Real-Time Depth of Field Simulation
From ShaderX2 – Shader Programming Tips and Tricks with DirectX 9
Introduction
Photorealistic rendering attempts to generate computer images with quality
approaching that of real-life images. Quite often, computer-rendered images look almost
photorealistic, but are missing something subtle - something that makes them look
synthetic or too perfect. Depth of field is one of those very important visual components
of real photography that makes images look "real". In "real-world" photography or
cinematography, the physical properties of the camera cause some parts of the scene to be
blurred, while maintaining sharpness in other areas. While blurriness can sometimes be
thought of as an imperfection or undesirable artifact that distorts original images and
hides some of the scene details, it can also be used as a tool to provide valuable visual
cues and guide the viewer's attention to important parts of the scene. Using depth of field
effectively can improve photorealism and add an artistic touch to rendered images. Figure
1 shows a simple scene rendered with and without depth of field.
[Figure: a pinhole camera model (screen and pinhole).]
The distance from the image plane to the object in focus can be expressed as:

z_focus = u + v

[Figure: the thin lens camera model, showing the screen, the lens, the focus plane, the distances u and v on either side of the lens, and the circle of confusion formed on the screen for points away from the focus plane.]
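For reference, the distances u and v are tied to the focal length by the standard thin lens relation (not restated in this excerpt): 1/u + 1/v = 1/f.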
The circle of confusion diameter b depends on the distance from the plane of focus
and the lens aperture setting a (also known as the f-stop). For a known focus distance and lens
parameters, the size of the circle of confusion can be calculated as:

b = D · f · (z_focus − z) / (z_focus · (z − f)), where D is the lens diameter, given by D = f / a.
Any circle of confusion larger than the smallest point the human eye can resolve
contributes to the blurriness of the image, which we perceive as depth of field.
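To make these formulas concrete, consider an illustrative set of values (not taken from the article): a lens with focal length f = 50 mm at aperture a = 2.8 has a diameter D = f / a ≈ 17.9 mm. With the camera focused at z_focus = 2 m, a point at z = 4 m yields b = (0.0179 · 0.05 · (2 − 4)) / (2 · (4 − 0.05)) ≈ −0.23 mm; only the magnitude of roughly 0.23 mm on the image plane matters, and the negative sign simply indicates the point lies behind the focus plane.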
allowing use of different surface formats. Guided by this requirement, we can pick the
D3DFMT_A8R8G8B8 format for the scene color output and the two-channel
D3DFMT_G16R16 format for the depth and blurriness factors. As shown in Figure 4, both
formats are 32 bits per pixel and provide enough space for the necessary
information at the desired precision.
[Figure 4: Render target layouts: a 32-bit A8R8G8B8 color target (R8 G8 B8 A8) and a 32-bit G16R16 target holding 16-bit depth and 16-bit blurriness.]
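A minimal sketch of how the two render targets might be created on the application side is shown below; the variable names (pSceneColorTex, pDepthBlurTex) and the direct use of IDirect3DDevice9::CreateTexture are illustrative assumptions, not code from the article.

// Create the two 32-bit render targets for the MRT scene pass (illustrative)
IDirect3DTexture9 *pSceneColorTex = NULL;   // RGBA color output
IDirect3DTexture9 *pDepthBlurTex  = NULL;   // 16-bit depth + 16-bit blurriness

pDevice->CreateTexture(dwRTWidth, dwRTHeight, 1, D3DUSAGE_RENDERTARGET,
                       D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &pSceneColorTex, NULL);
pDevice->CreateTexture(dwRTWidth, dwRTHeight, 1, D3DUSAGE_RENDERTARGET,
                       D3DFMT_G16R16, D3DPOOL_DEFAULT, &pDepthBlurTex, NULL);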
/////////////////////////////////////////////////////////////////////
struct VS_INPUT
{
float4 vPos: POSITION;
float3 vNorm: NORMAL;
float2 vTexCoord: TEXCOORD0;
};
struct VS_OUTPUT
{
float4 vPos: POSITION;
float4 vColor: COLOR0;
float fDepth: TEXCOORD0;
float2 vTexCoord: TEXCOORD1;
};
VS_OUTPUT scene_shader_vs(VS_INPUT v)
{
   VS_OUTPUT o = (VS_OUTPUT)0;
   float4 vPosWV;
   float3 vNorm;
   float3 vLightDir;
   // Transform position
   o.vPos = mul(v.vPos, matWorldViewProj);
   // View-space position (matWorldView is assumed to be declared with the other
   // transform constants, which fall outside this excerpt)
   vPosWV = mul(v.vPos, matWorldView);
   // Output view-space depth for the circle of confusion computation
   o.fDepth = vPosWV.z;
   // Propagate texture coordinates; the lighting computation that fills o.vColor
   // (using vNorm and vLightDir) is not reproduced here
   o.vTexCoord = v.vTexCoord;
   return o;
}
An example of a scene pixel shader that can be compiled to the ps_2_0 shader
model is shown below:
/////////////////////////////////////////////////////////////////////
float focalLen;
float Dlens;
float Zfocus;
float maxCoC;
float scale;
sampler TexSampler;
float sceneRange;
/////////////////////////////////////////////////////////////////////
struct PS_INPUT
{
float4 vColor: COLOR0;
float fDepth: TEXCOORD0;
float2 vTexCoord: TEXCOORD1;
};
struct PS_OUTPUT
{
float4 vColor: COLOR0;
float4 vDoF: COLOR1;
};
/////////////////////////////////////////////////////////////////////
PS_OUTPUT scene_shader_ps(PS_INPUT v)
{
   PS_OUTPUT o = (PS_OUTPUT)0;
   // Output color
   o.vColor = v.vColor * tex2D(TexSampler, v.vTexCoord);
   // Blurriness from the circle of confusion size, scaled and clamped to [0..1]
   float pixCoC = abs(Dlens * focalLen * (Zfocus - v.fDepth) /
                      (Zfocus * (v.fDepth - focalLen)));
   float blur = saturate(pixCoC * scale / maxCoC);
   // Depth (normalized by the scene range) and blurriness to the second target
   o.vDoF = float4(v.fDepth / sceneRange, blur, 0, 0);
   return o;
}
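To use this shader pair, both off-screen targets are bound before drawing the scene. A minimal sketch follows, assuming pSceneColorSurf and pDepthBlurSurf are the surfaces obtained from the render target textures via GetSurfaceLevel(0, ...); these names are placeholders, not identifiers from the article.

// Bind color to MRT slot 0 and depth/blurriness to slot 1
pDevice->SetRenderTarget(0, pSceneColorSurf);
pDevice->SetRenderTarget(1, pDepthBlurSurf);
// ... draw the scene geometry with scene_shader_vs / scene_shader_ps ...
// Unbind the second target before the post-processing pass
pDevice->SetRenderTarget(1, NULL);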
[Figure content: the quad's texture coordinates run from (0, 0) to (1, 1) while its screen positions run from (-0.5, -0.5) to (W-0.5, H-0.5), aligning texels with pixels.]
Figure 5. Texture coordinates and vertex positions for screen space quad.
This vertex shader is designed for the vs_1_1 compilation target.
float4 viewportScale;
float4 viewportBias;
struct VS_INPUT {
float4 vPos: POSITION;
float2 vTexCoord: TEXCOORD;
};
struct VS_OUTPUT {
float4 vPos: POSITION;
float2 vTexCoord: TEXCOORD0;
};
// (quad_dof_vs is a hypothetical name; the original function header is not shown
//  in this excerpt)
VS_OUTPUT quad_dof_vs(VS_INPUT v)
{
   VS_OUTPUT o = (VS_OUTPUT)0;
   // Scale and bias the screen-space quad position into clip space (see Figure 5)
   o.vPos = v.vPos * viewportScale + viewportBias;
   // Propagate texture coordinates
   o.vTexCoord = v.vTexCoord;
   return o;
}
[Figure content: the filter taps consist of a center sample and outer samples; at Blurriness = 0 the point is in focus and the taps collapse to a single pixel, while at Blurriness = 1 the point is blurred and the taps spread over the maximal CoC size.]
Figure 7. Relationship between blurriness and filter size.
The post-processing pixel shader computes the filter sample positions based on the 2D
offsets stored in the filterTaps array and the size of the circle of confusion. The 2D
offsets are the tap locations for a filter one pixel in diameter. The following code
shows how these values can be initialized by the application according to the render target
resolution.
void SetupFilterKernel()
{
   // Scale tap offsets based on render target size
   FLOAT dx = 0.5f / (FLOAT)dwRTWidth;
   FLOAT dy = 0.5f / (FLOAT)dwRTHeight;
   D3DXVECTOR4 v[12];
   v[0]  = D3DXVECTOR4(-0.326212f * dx, -0.40581f * dy, 0.0f, 0.0f);
   v[1]  = D3DXVECTOR4(-0.840144f * dx, -0.07358f * dy, 0.0f, 0.0f);
   v[2]  = D3DXVECTOR4(-0.695914f * dx, 0.457137f * dy, 0.0f, 0.0f);
   v[3]  = D3DXVECTOR4(-0.203345f * dx, 0.620716f * dy, 0.0f, 0.0f);
   v[4]  = D3DXVECTOR4(0.96234f * dx, -0.194983f * dy, 0.0f, 0.0f);
   v[5]  = D3DXVECTOR4(0.473434f * dx, -0.480026f * dy, 0.0f, 0.0f);
   v[6]  = D3DXVECTOR4(0.519456f * dx, 0.767022f * dy, 0.0f, 0.0f);
   v[7]  = D3DXVECTOR4(0.185461f * dx, -0.893124f * dy, 0.0f, 0.0f);
   v[8]  = D3DXVECTOR4(0.507431f * dx, 0.064425f * dy, 0.0f, 0.0f);
   v[9]  = D3DXVECTOR4(0.89642f * dx, 0.412458f * dy, 0.0f, 0.0f);
   v[10] = D3DXVECTOR4(-0.32194f * dx, -0.932615f * dy, 0.0f, 0.0f);
   v[11] = D3DXVECTOR4(-0.791559f * dx, -0.59771f * dy, 0.0f, 0.0f);
   // Upload the offsets to the filterTaps pixel shader constants
   // (pEffect is a hypothetical ID3DXEffect; the upload mechanism depends on
   //  how the application manages shader constants)
   pEffect->SetVectorArray("filterTaps", v, 12);
}
Once sample positions are computed, the filter averages the colors of its samples to
derive the blurred color. When the blurriness value is close to zero, all samples come
from the same pixel and no blurring happens. As the blurriness factor increases, the filter
starts sampling from more and more neighboring pixels, thus increasingly blurring the
image. All samples are fetched with D3DTEXF_LINEAR filtering. Linear filtering is
not very accurate on the edges of objects where depth can change abruptly; however, it
produces better overall image quality in practice.
One of the problems commonly associated with all post-filtering methods is
leaking of color from sharp objects onto blurry backgrounds. This results in faint
halos around sharp objects, as can be seen on the left side of Figure 8. The color leaking
happens because the filter for the blurry background samples color from the nearby sharp
object due to the large filter size. To solve this problem, we discard
the outer samples that can contribute to leaking according to the following criterion: if an
outer sample is in focus and lies in front of the blurry center sample, it should not
contribute to the blurred color. Applied strictly, this can introduce a minor popping effect
when objects go in or out of focus; to combat it, the outer sample's blurriness factor is
used as a sample weight to fade out its contribution gradually. The right side of Figure 8
shows the same scene fragment with the color leaking eliminated.
#define NUM_DOF_TAPS 12
float maxCoC;
float2 filterTaps[NUM_DOF_TAPS];
//////////////////////////////////////////////////////////////////////
struct PS_INPUT
{
float2 vTexCoord: TEXCOORD;
};
//////////////////////////////////////////////////////////////////////
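The body of the depth of field filter pixel shader is not reproduced in this excerpt; the following is a sketch of the technique described above. The samplers colorSampler and depthBlurSampler are hypothetical names for the color and depth/blurriness render targets. The sketch scales the tap offsets by the circle of confusion size and weights outer samples by their own blurriness to reduce color leaking.

sampler colorSampler;       // scene color render target (hypothetical name)
sampler depthBlurSampler;   // depth/blurriness render target (hypothetical name)

float4 dof_filter_ps(PS_INPUT v) : COLOR
{
   // Fetch the center sample's color, depth and blurriness
   float4 vCenterColor = tex2D(colorSampler, v.vTexCoord);
   float2 vCenterDB    = tex2D(depthBlurSampler, v.vTexCoord).rg;   // x = depth, y = blurriness
   float  fCenterDepth = vCenterDB.x;
   float  fCenterBlur  = vCenterDB.y;

   // Filter size (CoC diameter) grows with the center sample's blurriness
   float sizeCoC = fCenterBlur * maxCoC;

   float4 vColorSum  = vCenterColor;
   float  fWeightSum = 1.0f;

   for (int i = 0; i < NUM_DOF_TAPS; i++)
   {
      // Tap position scaled by the circle of confusion size
      float2 vTapCoord = v.vTexCoord + filterTaps[i] * sizeCoC;

      float4 vTapColor = tex2D(colorSampler, vTapCoord);
      float2 vTapDB    = tex2D(depthBlurSampler, vTapCoord).rg;

      // Prevent color leaking: if the tap lies in front of the center sample,
      // weight it by its own blurriness so in-focus foreground pixels fade out
      float fWeight = (vTapDB.x < fCenterDepth) ? vTapDB.y : 1.0f;

      vColorSum  += vTapColor * fWeight;
      fWeightSum += fWeight;
   }

   // Normalize to obtain the blurred color
   return vColorSum / fWeightSum;
}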
Now that we have discussed our implementation, which models the circle of
confusion with a variable-sized stochastic filter kernel, we will describe an
implementation based on a separable Gaussian filter.
This separable Gaussian filter approach differs from the previous approach to
simulating depth of field in two ways. First, it does not utilize multiple render targets for
outputting depth information. Second, to simulate the blurring that occurs in depth of
field, we apply a Gaussian filter during the post-processing stage instead of simulating
the circle of confusion of a physical camera lens.
Implementation Overview
In this method, we first render the scene at full resolution to an offscreen buffer,
outputting depth information for each pixel to the alpha channel of that buffer. We then
downsample this fully-rendered scene into an image ¼ size (½ in x and ½ in y) of the
original. Next, we blur the downsampled scene by running the image through two
passes of a separable Gaussian filter – first along the x axis and then along the y axis.
On the final pass, we blend between the original full resolution
rendering of our scene and the blurred post-processed image based on the distance of
each pixel from the specified focal plane stored in the downsampled image. The
intermediate filtering results are stored in 16-bit per channel integer format
(D3DFMT_A16B16G16R16) for extra precision. We will now discuss this method in more
detail, going step-by-step through the different rendering passes and shaders used.
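To summarize the flow, a frame might be structured as follows on the application side. This is only an illustrative outline; the helper functions (DrawScene, DrawFullScreenQuad) and the surface names are assumptions rather than code from the article.

// Illustrative pass sequence for the Gaussian-based method (D3D9 style)
pDevice->SetRenderTarget(0, pSceneSurf);     // 1. full-resolution scene, depth in alpha
DrawScene();                                 //    scene_shader_vs / scene pixel shader
pDevice->SetRenderTarget(0, pQuarterSurf);   // 2. downsample to 1/4 size
DrawFullScreenQuad();
pDevice->SetRenderTarget(0, pBlurXSurf);     // 3. horizontal Gaussian pass
DrawFullScreenQuad();                        //    filter_gaussian_x_vs / _ps
pDevice->SetRenderTarget(0, pBlurXYSurf);    // 4. vertical Gaussian pass
DrawFullScreenQuad();                        //    filter_gaussian_y_vs / _ps
pDevice->SetRenderTarget(0, pBackBuffer);    // 5. final blend of sharp and blurred images
DrawFullScreenQuad();                        //    final_pass_vs / final_pass_ps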
/////////////////////////////////////////////////////////////////////
struct VS_INPUT
{
float4 vPos: POSITION;
float3 vNorm: NORMAL;
float2 vTexCoord: TEXCOORD0;
};
struct VS_OUTPUT
{
float4 vPos: POSITION;
float4 vColor: COLOR0;
float fBlur: TEXCOORD0;
float2 vTexCoord: TEXCOORD1;
};
/////////////////////////////////////////////////////////////////////
VS_OUTPUT scene_shader_vs(VS_INPUT v)
{
VS_OUTPUT o = (VS_OUTPUT)0;
float4 vPosWV;
float3 vNorm;
float3 vLightDir;
   // Transform position
   o.vPos = mul(v.vPos, matWorldViewProj);
   // Propagate texture coordinates
   o.vTexCoord = v.vTexCoord;
   // The lighting color (o.vColor, using vNorm and vLightDir) and the per-vertex
   // blurriness factor (o.fBlur), derived from the vertex's distance to the focal
   // plane, are computed here; that part of the listing is not reproduced in this excerpt
   return o;
}
sampler TexSampler;
/////////////////////////////////////////////////////////////////////
struct PS_INPUT
{
float4 vColor: COLOR0;
float fBlur: TEXCOORD0;
float2 vTexCoord: TEXCOORD1;
};
/////////////////////////////////////////////////////////////////////
// (scene_gaussian_ps is a hypothetical name; the original function header is not
//  shown in this excerpt)
float4 scene_gaussian_ps(PS_INPUT v) : COLOR
{
   // Output color
   float4 vColor = v.vColor * tex2D(TexSampler, v.vTexCoord);
   // Store the blurriness factor in the alpha channel, where the downsampling
   // and compositing passes expect to find it
   vColor.a = v.fBlur;
   return vColor;
}
matrix matWorldViewProj;
//////////////////////////////////////////////////////////////////////
struct VS_OUTPUT
{
float4 vPos: POSITION;
float2 vTex: TEXCOORD0;
};
//////////////////////////////////////////////////////////////////////
// (downsample_quad_vs is a hypothetical name; the original function header and
//  input declaration are not shown in this excerpt)
VS_OUTPUT downsample_quad_vs(float4 vPos: POSITION, float2 vTex: TEXCOORD0)
{
   VS_OUTPUT o = (VS_OUTPUT)0;
   // Transform the quad and pass the texture coordinate through
   o.vPos = mul(vPos, matWorldViewProj);
   o.vTex = vTex;
   return o;
}
sampler renderTexture;
One of the most frequently used filters for performing smoothing of an image is
the Gaussian filter (see Figure 11). Typically, the filter is applied in the following way:

F = ( Σ_{i=1..n} Σ_{j=1..n} P_ij · C_ij ) / S,

where F is the filtered value of the target pixel, P_ij is a pixel in the 2D grid, C_ij is a
coefficient in the 2D Gaussian matrix, n is the vertical/horizontal dimension of the
matrix, and S is the sum of all values in the Gaussian matrix.
Once a suitable kernel has been calculated, Gaussian smoothing can be performed
using standard convolution methods. The convolution can in fact be performed fairly
quickly since the equation for the 2D isotropic Gaussian is separable into x and y
components. Thus, the 2D convolution can be performed by first convolving with a 1D
Gaussian in the x direction, and then convolving with another 1D Gaussian in the y
direction. This allows us to apply a larger size filter to the input image in two successive
passes of 1D filters. We will perform this operation by rendering into a temporary buffer
and sampling a line (or a column, for y axis filtering) of texels in each of the passes.
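As an illustration of the separability argument, the 1D weights for one of these passes can be generated on the application side as follows. The 13-tap count and the sigma parameter are assumptions made for the example, not values from the article.

#include <math.h>

// Compute normalized weights for a 13-tap 1D Gaussian pass (illustrative)
void ComputeGaussianWeights(float sigma, float weights[13])
{
    float sum = 0.0f;
    for (int i = 0; i < 13; i++)
    {
        float x = (float)(i - 6);                      // offsets -6..+6 around the center
        weights[i] = expf(-(x * x) / (2.0f * sigma * sigma));
        sum += weights[i];
    }
    for (int i = 0; i < 13; i++)                       // normalize so the weights sum to 1
        weights[i] /= sum;
}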
The size of the downsampled buffer determines the texel size used for computing the
sampling points of the Gaussian filter taps. The tap offsets can be precomputed and passed
to the shader as constants ahead of time. The following is an example of how the filter tap
offsets can be computed.
void SetupFilterKernel()
{
   // Scale tap offsets based on render target size
   FLOAT dx = 1.0f / (FLOAT)dwRTWidth;
   FLOAT dy = 1.0f / (FLOAT)dwRTHeight;
   D3DXVECTOR4 v[7];
   // Illustrative completion (the article's exact offsets are not shown here):
   // tap i sits i texels from the center; vertical offsets use dy the same way
   for (int i = 0; i < 7; i++)
      v[i] = D3DXVECTOR4(i * dx, 0.0f, 0.0f, 0.0f);
   // v is then uploaded to the horzTapOffs shader constants
}
The center sample and the inner taps of the filter use interpolated
texture coordinates computed in the vertex shader. To compute the offsets for
the first seven samples, we use the input texture coordinate and the precomputed tap offsets
based on the image resolution.
In the pixel shader we sample the image for the center tap and first 6 inner taps,
using nearest filtering for the center sample and bilinear sampling for the inner samples.
The pixel shader code derives the texture coordinates for the outer samples based
on pre-computed deltas from the location of the center sample. The outer samples are
fetched via dependent reads as texture coordinates are derived in the pixel shader itself.
All samples are weighted based on the predefined weight thresholds and
blurriness values and added together. This results in a weighted sum of 25 texels from the
source image, which is large enough to create a convincing blurring effect for
simulating depth of field without exceeding the instruction limits of the 2.0 pixel
shader model.
Note that the output of this pass is directed to a separate off-screen buffer. At this
point we have used three separate off-screen render targets: one for the results of the
full scene rendering, one for the results of the downsampling pass, and one for the
results of the Gaussian blurring.
float4 viewportScale;
float4 viewportBias;
// Offsets 0-3 used by vertex shader, 4-6 by pixel shader
float2 horzTapOffs[7];
struct VS_INPUT
{
float4 vPos: POSITION;
float2 vTexCoord: TEXCOORD;
};
struct VS_OUTPUT_TEX7
{
float4 vPos: POSITION;
float2 vTap0: TEXCOORD0;
float2 vTap1: TEXCOORD1;
float2 vTap2: TEXCOORD2;
float2 vTap3: TEXCOORD3;
float2 vTap1Neg: TEXCOORD4;
float2 vTap2Neg: TEXCOORD5;
float2 vTap3Neg: TEXCOORD6;
};
VS_OUTPUT_TEX7 filter_gaussian_x_vs(VS_INPUT v)
{
   VS_OUTPUT_TEX7 o = (VS_OUTPUT_TEX7)0;
   // Scale and bias the screen-space quad position into clip space
   o.vPos = v.vPos * viewportScale + viewportBias;
   // Tap coordinates for the center tap and the three inner tap pairs
   // (offsets 0-3 of horzTapOffs; the outer taps are handled in the pixel shader)
   o.vTap0    = v.vTexCoord + horzTapOffs[0];
   o.vTap1    = v.vTexCoord + horzTapOffs[1];
   o.vTap2    = v.vTexCoord + horzTapOffs[2];
   o.vTap3    = v.vTexCoord + horzTapOffs[3];
   o.vTap1Neg = v.vTexCoord - horzTapOffs[1];
   o.vTap2Neg = v.vTexCoord - horzTapOffs[2];
   o.vTap3Neg = v.vTexCoord - horzTapOffs[3];
   return o;
}
sampler renderTexture;
// Offsets 0-3 used by vertex shader, 4-6 by pixel shader
float2 horzTapOffs[7];
struct PS_INPUT_TEX7
{
float2 vTap0: TEXCOORD0;
float2 vTap1: TEXCOORD1;
float2 vTap2: TEXCOORD2;
float2 vTap3: TEXCOORD3;
float2 vTap1Neg: TEXCOORD4;
float2 vTap2Neg: TEXCOORD5;
float2 vTap3Neg: TEXCOORD6;
};
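The pixel shader body for the horizontal pass is not included in this excerpt. The following sketch illustrates the structure described above (interpolated inner taps plus dependent reads for the outer taps); the weights are illustrative placeholders, not the article's kernel values, and the original weighting also factors in blurriness thresholds.

float4 filter_gaussian_x_ps(PS_INPUT_TEX7 v) : COLOR
{
   // Illustrative normalized weights for the center, inner (1-3) and outer (4-6) taps
   const float vWeights[7] = { 0.14f, 0.13f, 0.115f, 0.085f, 0.055f, 0.03f, 0.015f };

   // Center tap
   float4 vColor = tex2D(renderTexture, v.vTap0) * vWeights[0];

   // Inner taps: coordinates interpolated by the vertex shader
   vColor += tex2D(renderTexture, v.vTap1)    * vWeights[1];
   vColor += tex2D(renderTexture, v.vTap1Neg) * vWeights[1];
   vColor += tex2D(renderTexture, v.vTap2)    * vWeights[2];
   vColor += tex2D(renderTexture, v.vTap2Neg) * vWeights[2];
   vColor += tex2D(renderTexture, v.vTap3)    * vWeights[3];
   vColor += tex2D(renderTexture, v.vTap3Neg) * vWeights[3];

   // Outer taps: coordinates derived in the pixel shader (dependent reads)
   for (int i = 4; i < 7; i++)
   {
      vColor += tex2D(renderTexture, v.vTap0 + horzTapOffs[i]) * vWeights[i];
      vColor += tex2D(renderTexture, v.vTap0 - horzTapOffs[i]) * vWeights[i];
   }
   return vColor;
}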
float4 viewportScale;
float4 viewportBias;
// Offsets 0-3 used by vertex shader, 4-6 by pixel shader
float2 vertTapOffs[7];
//////////////////////////////////////////////////////////////////////
struct VS_INPUT
{
float4 vPos: POSITION;
float2 vTexCoord: TEXCOORD;
};
struct VS_OUTPUT_TEX7
{
float4 vPos: POSITION;
float2 vTap0: TEXCOORD0;
float2 vTap1: TEXCOORD1;
float2 vTap2: TEXCOORD2;
float2 vTap3: TEXCOORD3;
float2 vTap1Neg: TEXCOORD4;
float2 vTap2Neg: TEXCOORD5;
float2 vTap3Neg: TEXCOORD6;
};
//////////////////////////////////////////////////////////////////////
VS_OUTPUT_TEX7 filter_gaussian_y_vs(VS_INPUT v)
{
   VS_OUTPUT_TEX7 o = (VS_OUTPUT_TEX7)0;
   // Scale and bias the screen-space quad position into clip space
   o.vPos = v.vPos * viewportScale + viewportBias;
   // Tap coordinates for the center tap and the three inner tap pairs
   // (offsets 0-3 of vertTapOffs; the outer taps are handled in the pixel shader)
   o.vTap0    = v.vTexCoord + vertTapOffs[0];
   o.vTap1    = v.vTexCoord + vertTapOffs[1];
   o.vTap2    = v.vTexCoord + vertTapOffs[2];
   o.vTap3    = v.vTexCoord + vertTapOffs[3];
   o.vTap1Neg = v.vTexCoord - vertTapOffs[1];
   o.vTap2Neg = v.vTexCoord - vertTapOffs[2];
   o.vTap3Neg = v.vTexCoord - vertTapOffs[3];
   return o;
}
sampler blurredXTexture;
// Offsets 0-3 used by vertex shader, 4-6 by pixel shader
float2 vertTapOffs[7];
//////////////////////////////////////////////////////////////////////
struct PS_INPUT_TEX7
{
float2 vTap0: TEXCOORD0;
float2 vTap1: TEXCOORD1;
float2 vTap2: TEXCOORD2;
float2 vTap3: TEXCOORD3;
float2 vTap1Neg: TEXCOORD4;
float2 vTap2Neg: TEXCOORD5;
float2 vTap3Neg: TEXCOORD6;
};
//////////////////////////////////////////////////////////////////////
Figure 11 shows the result of applying the 25×25 separable Gaussian to the
downsampled image.
In this vertex shader, we simply transform the vertices and propagate the texture
coordinate to the pixel shader. The vertex shader is designed to compile to the vs_1_1 target.
//////////////////////////////////////////////////////////////////////
float4 viewportScale;
float4 viewportBias;
//////////////////////////////////////////////////////////////////////
struct VS_INPUT
{
float4 vPos: POSITION;
float2 vTex: TEXCOORD;
};
struct VS_OUTPUT
{
float4 vPos: POSITION;
float2 vTex: TEXCOORD0;
};
//////////////////////////////////////////////////////////////////////
VS_OUTPUT final_pass_vs(VS_INPUT v)
{
   VS_OUTPUT o = (VS_OUTPUT)0;
   // Scale and bias the screen-space quad position into clip space
   o.vPos = v.vPos * viewportScale + viewportBias;
   // Propagate the texture coordinate
   o.vTex = v.vTex;
   return o;
}
In this pixel shader we composite the final image. We retrieve the depth falloff
distance stored in the downsampled image's alpha channel and use it as a blending
weight to blend between the post-processed, Gaussian-blurred image and the original
full-resolution scene rendering. This pixel shader is designed to compile to the ps_1_4
target or above.
//////////////////////////////////////////////////////////////////////
sampler blurredXYTexture;
sampler renderTexture;
//////////////////////////////////////////////////////////////////////
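The compositing pixel shader itself is not included in this excerpt; a minimal sketch of the blend it describes, assuming the blurred image's alpha channel carries the focal plane distance factor, might look like this:

float4 final_pass_ps(float2 vTex: TEXCOORD0) : COLOR
{
   // Blurred, downsampled image; alpha holds the distance-to-focal-plane factor
   float4 vBlurred = tex2D(blurredXYTexture, vTex);
   // Original full-resolution scene rendering
   float4 vFullRes = tex2D(renderTexture, vTex);
   // Blend between the sharp and blurred images based on that factor
   return lerp(vFullRes, vBlurred, vBlurred.a);
}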
The images in Figures 11 and 12 are taken from a screen saver in the ATI
RADEON 9700 demo suite, which can be downloaded here:
http://www.ati.com/developer/screensavers.html
Bokeh
It has been noticed that different lenses with the same apertures and focal
distances produce slightly different out-of-focus images. In photography the “quality” of
an out-of-focus or blurred image is described by the Japanese term “bokeh”. While this
term is mostly familiar to photographers, it is relatively new to computer graphics
professionals.
A perfect lens would have no spherical aberration and would focus incoming
rays into a perfect cone of light behind the lens. In such a camera, if the image is not in
focus, each blurred point is represented by a uniformly illuminated circle of confusion.
All real lenses have some degree of spherical aberration and always produce a non-uniform
distribution of light in the light cone, and thus in the circle of confusion. The lens's
diaphragm and the number of its aperture blades can also affect the shape of the
circle of confusion. The term "bokeh" comes from the Japanese word boke, meaning
blur or haze; it describes this phenomenon and is a subjective quality, meaning that there is
no objective way to measure people's reaction to it. What might be
considered "bad" bokeh under certain circumstances can be desirable for some artistic
effects, and vice versa.
To simulate different lens bokehs, one can use filters with different distributions
and weightings of filter taps. Figure 13 demonstrates part of the same scene processed
with blur filters of the same size but with different filter tap distributions and
weightings.
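As an illustration (not code from the article), the 12 taps of the first technique's filter could be redistributed, for example by placing them evenly on a ring, to produce a harder-edged, ring-like bokeh instead of the roughly uniform disc given by the stock distribution:

#include <d3dx9.h>
#include <math.h>

// Place the 12 DoF filter taps evenly on a ring (illustrative bokeh variation)
void SetupRingKernel(D3DXVECTOR4 v[12], float dx, float dy)
{
    const float PI = 3.14159265f;
    for (int i = 0; i < 12; i++)
    {
        float a = 2.0f * PI * (float)i / 12.0f;   // evenly spaced angles around the circle
        v[i] = D3DXVECTOR4(cosf(a) * dx, sinf(a) * dy, 0.0f, 0.0f);
    }
}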
Summary
Reference