
VkImageView textureImageView;
VkSampler textureSampler;

...

void createTextureSampler() {
    ...

    if (vkCreateSampler(device, &samplerInfo, nullptr, &textureSampler) != VK_SUCCESS) {
        throw std::runtime_error("failed to create texture sampler!");
    }
}

Note the sampler does not reference a VkImage anywhere. The sampler is a
distinct object that provides an interface to extract colors from a texture. It
can be applied to any image you want, whether it is 1D, 2D or 3D. This is
different from many older APIs, which combined texture images and filtering
into a single state.
Destroy the sampler at the end of the program when we’ll no longer be accessing
the image:
void cleanup() {
    cleanupSwapChain();

    vkDestroySampler(device, textureSampler, nullptr);
    vkDestroyImageView(device, textureImageView, nullptr);

    ...
}

Anisotropy device feature


If you run your program right now, you'll see a validation layer message complaining about anisotropic filtering. That's because anisotropic filtering is actually an optional device feature. We need to update the createLogicalDevice function to request it:
VkPhysicalDeviceFeatures deviceFeatures{};
deviceFeatures.samplerAnisotropy = VK_TRUE;

And even though it is very unlikely that a modern graphics card will not support
it, we should update isDeviceSuitable to check if it is available:
bool isDeviceSuitable(VkPhysicalDevice device) {
    ...

    VkPhysicalDeviceFeatures supportedFeatures;
    vkGetPhysicalDeviceFeatures(device, &supportedFeatures);

    return indices.isComplete() && extensionsSupported && swapChainAdequate && supportedFeatures.samplerAnisotropy;
}

The vkGetPhysicalDeviceFeatures function repurposes the VkPhysicalDeviceFeatures struct: instead of describing which features we request, its boolean members now indicate which features the device supports.
Instead of enforcing the availability of anisotropic filtering, it’s also possible to
simply not use it by conditionally setting:
samplerInfo.anisotropyEnable = VK_FALSE;
samplerInfo.maxAnisotropy = 1.0f;
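If you take that route, one option is to query the device at sampler creation time and fall back gracefully when anisotropy is unavailable. The following is only a sketch of that idea (it assumes physicalDevice is the handle selected earlier, and you would also have to skip requesting the feature in createLogicalDevice when it is unsupported):

// Sketch: enable anisotropic filtering only if the device supports it.
VkPhysicalDeviceFeatures supportedFeatures;
vkGetPhysicalDeviceFeatures(physicalDevice, &supportedFeatures);

VkPhysicalDeviceProperties properties{};
vkGetPhysicalDeviceProperties(physicalDevice, &properties);

if (supportedFeatures.samplerAnisotropy) {
    samplerInfo.anisotropyEnable = VK_TRUE;
    samplerInfo.maxAnisotropy = properties.limits.maxSamplerAnisotropy;
} else {
    samplerInfo.anisotropyEnable = VK_FALSE;
    samplerInfo.maxAnisotropy = 1.0f;
}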

In the next chapter we will expose the image and sampler objects to the shaders
to draw the texture onto the square.
C++ code / Vertex shader / Fragment shader

Combined image sampler


Introduction
We looked at descriptors for the first time in the uniform buffers part of the
tutorial. In this chapter we will look at a new type of descriptor: combined
image sampler. This descriptor makes it possible for shaders to access an im-
age resource through a sampler object like the one we created in the previous
chapter.
We’ll start by modifying the descriptor layout, descriptor pool and descriptor set
to include such a combined image sampler descriptor. After that, we’re going
to add texture coordinates to Vertex and modify the fragment shader to read
colors from the texture instead of just interpolating the vertex colors.

Updating the descriptors


Browse to the createDescriptorSetLayout function and add a VkDescriptorSetLayoutBinding
for a combined image sampler descriptor. We’ll simply put it in the binding
after the uniform buffer:

VkDescriptorSetLayoutBinding samplerLayoutBinding{};
samplerLayoutBinding.binding = 1;
samplerLayoutBinding.descriptorCount = 1;
samplerLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
samplerLayoutBinding.pImmutableSamplers = nullptr;
samplerLayoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

std::array<VkDescriptorSetLayoutBinding, 2> bindings = {uboLayoutBinding, samplerLayoutBinding};
VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
layoutInfo.pBindings = bindings.data();

Make sure to set the stageFlags to indicate that we intend to use the combined
image sampler descriptor in the fragment shader. That’s where the color of the
fragment is going to be determined. It is possible to use texture sampling in
the vertex shader, for example to dynamically deform a grid of vertices by a
heightmap.
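If you ever did want to sample the texture in the vertex shader as well, for example for such a heightmap, the only change needed in the layout binding would be to include both stages. This is purely illustrative and not part of this tutorial's code:

samplerLayoutBinding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;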
We must also create a larger descriptor pool to make room for the alloca-
tion of the combined image sampler by adding another VkPoolSize of type
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER to the VkDescriptorPoolCreateInfo.
Go to the createDescriptorPool function and modify it to include a
VkDescriptorPoolSize for this descriptor:
std::array<VkDescriptorPoolSize, 2> poolSizes{};
poolSizes[0].type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSizes[0].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);
poolSizes[1].type = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
poolSizes[1].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.poolSizeCount = static_cast<uint32_t>(poolSizes.size());
poolInfo.pPoolSizes = poolSizes.data();
poolInfo.maxSets = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

Inadequate descriptor pools are a good example of a problem that the validation layers will not catch: As of Vulkan 1.1, vkAllocateDescriptorSets may fail with the error code VK_ERROR_POOL_OUT_OF_MEMORY if the pool is not sufficiently large, but the driver may also try to solve the problem internally. This means that sometimes (depending on hardware, pool size and allocation size) the driver will let us get away with an allocation that exceeds the limits of our descriptor pool. Other times, vkAllocateDescriptorSets will fail and return VK_ERROR_POOL_OUT_OF_MEMORY. This can be particularly frustrating if the allocation succeeds on some machines, but fails on others.
Since Vulkan shifts the responsibility for the allocation to the driver, it is no longer a strict requirement to only allocate as many descriptors of a certain type (VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, etc.) as specified by the corresponding descriptorCount members for the creation of the descriptor pool. However, it remains best practice to do so, and in the future, VK_LAYER_KHRONOS_validation will warn about this type of problem if you enable Best Practice Validation.
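Because this failure mode is driver-dependent, it can be worth inspecting the allocation result explicitly instead of treating every non-success code the same way. A minimal sketch, assuming the allocInfo and descriptorSets variables from the descriptor set chapter:

VkResult result = vkAllocateDescriptorSets(device, &allocInfo, descriptorSets.data());
if (result == VK_ERROR_POOL_OUT_OF_MEMORY || result == VK_ERROR_FRAGMENTED_POOL) {
    // The pool is exhausted; a real application could create an additional pool and retry.
    throw std::runtime_error("descriptor pool is too small!");
} else if (result != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate descriptor sets!");
}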
The final step is to bind the actual image and sampler resources to the descrip-
tors in the descriptor set. Go to the createDescriptorSets function.
for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo bufferInfo{};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);

    VkDescriptorImageInfo imageInfo{};
    imageInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    imageInfo.imageView = textureImageView;
    imageInfo.sampler = textureSampler;

    ...
}

The resources for a combined image sampler descriptor must be specified in a VkDescriptorImageInfo struct, just like the buffer resource for a uniform buffer descriptor is specified in a VkDescriptorBufferInfo struct. This is where the objects from the previous chapter come together.
std::array<VkWriteDescriptorSet, 2> descriptorWrites{};

descriptorWrites[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[0].dstSet = descriptorSets[i];
descriptorWrites[0].dstBinding = 0;
descriptorWrites[0].dstArrayElement = 0;
descriptorWrites[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[0].descriptorCount = 1;
descriptorWrites[0].pBufferInfo = &bufferInfo;

descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[1].dstSet = descriptorSets[i];
descriptorWrites[1].dstBinding = 1;
descriptorWrites[1].dstArrayElement = 0;
descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[1].descriptorCount = 1;
descriptorWrites[1].pImageInfo = &imageInfo;

vkUpdateDescriptorSets(device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);

The descriptors must be updated with this image info, just like the buffer. This
time we’re using the pImageInfo array instead of pBufferInfo. The descriptors
are now ready to be used by the shaders!

Texture coordinates
There is one important ingredient for texture mapping that is still missing, and
that’s the actual coordinates for each vertex. The coordinates determine how
the image is actually mapped to the geometry.
struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDescription{};
        bindingDescription.binding = 0;
        bindingDescription.stride = sizeof(Vertex);
        bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

        return bindingDescription;
    }

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        attributeDescriptions[1].binding = 0;
        attributeDescriptions[1].location = 1;
        attributeDescriptions[1].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[1].offset = offsetof(Vertex, color);

        attributeDescriptions[2].binding = 0;
        attributeDescriptions[2].location = 2;
        attributeDescriptions[2].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[2].offset = offsetof(Vertex, texCoord);

        return attributeDescriptions;
    }
};

Modify the Vertex struct to include a vec2 for texture coordinates. Make sure to also add a VkVertexInputAttributeDescription so that we can access texture coordinates as input in the vertex shader. That is necessary to be able to pass them to the fragment shader for interpolation across the surface of the square.
const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}, {0.0f, 1.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}, {1.0f, 1.0f}}
};

In this tutorial, I will simply fill the square with the texture by using coordinates
from 0, 0 in the top-left corner to 1, 1 in the bottom-right corner. Feel free to
experiment with different coordinates. Try using coordinates below 0 or above
1 to see the addressing modes in action!
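For instance, scaling the coordinates in the vertices array beyond the [0, 1] range would tile the texture across the square when the sampler uses VK_SAMPLER_ADDRESS_MODE_REPEAT. This is just an illustrative variation, not part of the tutorial's code:

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {2.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}, {0.0f, 2.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}, {2.0f, 2.0f}}
};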

Shaders
The final step is modifying the shaders to sample colors from the texture. We
first need to modify the vertex shader to pass through the texture coordinates
to the fragment shader:
layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;
layout(location = 2) in vec2 inTexCoord;

layout(location = 0) out vec3 fragColor;
layout(location = 1) out vec2 fragTexCoord;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

Just like the per vertex colors, the fragTexCoord values will be smoothly inter-
polated across the area of the square by the rasterizer. We can visualize this by
having the fragment shader output the texture coordinates as colors:
#version 450

layout(location = 0) in vec3 fragColor;
layout(location = 1) in vec2 fragTexCoord;

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(fragTexCoord, 0.0, 1.0);
}

You should see something like the image below. Don’t forget to recompile the
shaders!

The red channel represents the horizontal coordinate and the green channel the vertical coordinate. The black and yellow corners confirm that the texture coordinates are correctly interpolated from 0, 0 to 1, 1 across the square. Visualizing data using colors is the shader programming equivalent of printf debugging, for lack of a better option!
A combined image sampler descriptor is represented in GLSL by a sampler
uniform. Add a reference to it in the fragment shader:
layout(binding = 1) uniform sampler2D texSampler;

There are equivalent sampler1D and sampler3D types for other types of images.
Make sure to use the correct binding here.
void main() {
    outColor = texture(texSampler, fragTexCoord);
}

Textures are sampled using the built-in texture function. It takes a sampler
and coordinate as arguments. The sampler automatically takes care of the
filtering and transformations in the background. You should now see the texture
on the square when you run the application:

Try experimenting with the addressing modes by scaling the texture coordinates
to values higher than 1. For example, the following fragment shader produces
the result in the image below when using VK_SAMPLER_ADDRESS_MODE_REPEAT:

void main() {
    outColor = texture(texSampler, fragTexCoord * 2.0);
}

You can also manipulate the texture colors using the vertex colors:
void main() {
    outColor = vec4(fragColor * texture(texSampler, fragTexCoord).rgb, 1.0);
}

I’ve separated the RGB and alpha channels here to not scale the alpha channel.

You now know how to access images in shaders! This is a very powerful technique
when combined with images that are also written to in framebuffers. You can
use these images as inputs to implement cool effects like post-processing and
camera displays within the 3D world.
C++ code / Vertex shader / Fragment shader

Depth buffering

Introduction
The geometry we’ve worked with so far is projected into 3D, but it’s still com-
pletely flat. In this chapter we’re going to add a Z coordinate to the position to
prepare for 3D meshes. We’ll use this third coordinate to place a square over
the current square to see a problem that arises when geometry is not sorted by
depth.

3D geometry
Change the Vertex struct to use a 3D vector for the position, and update the
format in the corresponding VkVertexInputAttributeDescription:
struct Vertex {
    glm::vec3 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    ...

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        ...
    }
};

Next, update the vertex shader to accept and transform 3D coordinates as input.
Don’t forget to recompile it afterwards!
layout(location = 0) in vec3 inPosition;

...

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

Lastly, update the vertices container to include Z coordinates:


const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

If you run your application now, then you should see exactly the same result as
before. It’s time to add some extra geometry to make the scene more interest-
ing, and to demonstrate the problem that we’re going to tackle in this chapter.
Duplicate the vertices to define positions for a square right under the current one. Use Z coordinates of -0.5f and add the appropriate indices for the extra square:
const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}},

    {{-0.5f, -0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, -0.5f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, -0.5f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

const std::vector<uint16_t> indices = {
    0, 1, 2, 2, 3, 0,
    4, 5, 6, 6, 7, 4
};

Run your program now and you’ll see something resembling an Escher illustra-
tion:

The problem is that the fragments of the lower square are drawn over the frag-
ments of the upper square, simply because it comes later in the index array.
There are two ways to solve this:
• Sort all of the draw calls by depth from back to front

• Use depth testing with a depth buffer
The first approach is commonly used for drawing transparent objects, because
order-independent transparency is a difficult challenge to solve. However, the
problem of ordering fragments by depth is much more commonly solved using a
depth buffer. A depth buffer is an additional attachment that stores the depth for
every position, just like the color attachment stores the color of every position.
Every time the rasterizer produces a fragment, the depth test will check if the
new fragment is closer than the previous one. If it isn’t, then the new fragment
is discarded. A fragment that passes the depth test writes its own depth to the
depth buffer. It is possible to manipulate this value from the fragment shader,
just like you can manipulate the color output.
#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

The perspective projection matrix generated by GLM will use the OpenGL
depth range of -1.0 to 1.0 by default. We need to configure it to use the Vulkan
range of 0.0 to 1.0 using the GLM_FORCE_DEPTH_ZERO_TO_ONE definition.

Depth image and view


A depth attachment is based on an image, just like the color attachment. The
difference is that the swap chain will not automatically create depth images for
us. We only need a single depth image, because only one draw operation is
running at once. The depth image will again require the trifecta of resources:
image, memory and image view.
VkImage depthImage;
VkDeviceMemory depthImageMemory;
VkImageView depthImageView;

Create a new function createDepthResources to set up these resources:


void initVulkan() {
    ...
    createCommandPool();
    createDepthResources();
    createTextureImage();
    ...
}

...

void createDepthResources() {

}

Creating a depth image is fairly straightforward. It should have the same resolu-
tion as the color attachment, defined by the swap chain extent, an image usage
appropriate for a depth attachment, optimal tiling and device local memory.
The only question is: what is the right format for a depth image? The format must contain a depth component, indicated by a _D??_ in the VK_FORMAT_ name. Unlike the texture image, we don't necessarily need a specific format, because we won't be directly accessing the texels from the program. It just needs to have reasonable accuracy; at least 24 bits is common in real-world applications. There are several formats that fit this requirement:
• VK_FORMAT_D32_SFLOAT: 32-bit signed float for depth
• VK_FORMAT_D32_SFLOAT_S8_UINT: 32-bit signed float for depth and 8-bit stencil component
• VK_FORMAT_D24_UNORM_S8_UINT: 24-bit unsigned normalized value for depth and 8-bit stencil component
The stencil component is used for stencil tests, which is an additional test that
can be combined with depth testing. We’ll look at this in a future chapter.
We could simply go for the VK_FORMAT_D32_SFLOAT format, because support
for it is extremely common (see the hardware database), but it’s nice to add
some extra flexibility to our application where possible. We’re going to write a
function findSupportedFormat that takes a list of candidate formats in order
from most desirable to least desirable, and checks which is the first one that is
supported:
VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {

}

The support of a format depends on the tiling mode and usage, so we must also
include these as parameters. The support of a format can be queried using the
vkGetPhysicalDeviceFormatProperties function:
for (VkFormat format : candidates) {
    VkFormatProperties props;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
}

The VkFormatProperties struct contains three fields:
• linearTilingFeatures: Use cases that are supported with linear tiling
• optimalTilingFeatures: Use cases that are supported with optimal tiling
• bufferFeatures: Use cases that are supported for buffers
Only the first two are relevant here, and the one we check depends on the tiling
parameter of the function:
if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
    return format;
} else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
    return format;
}

If none of the candidate formats support the desired usage, then we can either
return a special value or simply throw an exception:
VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {
    for (VkFormat format : candidates) {
        VkFormatProperties props;
        vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);

        if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
            return format;
        } else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
            return format;
        }
    }

    throw std::runtime_error("failed to find supported format!");
}

We’ll use this function now to create a findDepthFormat helper function to se-
lect a format with a depth component that supports usage as depth attachment:
VkFormat findDepthFormat() {
    return findSupportedFormat(
        {VK_FORMAT_D32_SFLOAT, VK_FORMAT_D32_SFLOAT_S8_UINT, VK_FORMAT_D24_UNORM_S8_UINT},
        VK_IMAGE_TILING_OPTIMAL,
        VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT
    );
}

Make sure to use the VK_FORMAT_FEATURE_ flag instead of VK_IMAGE_USAGE_ in this case. All of these candidate formats contain a depth component, but the latter two also contain a stencil component. We won't be using that yet, but we do need to take that into account when performing layout transitions on images with these formats. Add a simple helper function that tells us if the chosen depth format contains a stencil component:
bool hasStencilComponent(VkFormat format) {
    return format == VK_FORMAT_D32_SFLOAT_S8_UINT || format == VK_FORMAT_D24_UNORM_S8_UINT;
}

Call the function to find a depth format from createDepthResources:


VkFormat depthFormat = findDepthFormat();

We now have all the required information to invoke our createImage and
createImageView helper functions:
createImage(swapChainExtent.width, swapChainExtent.height, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
depthImageView = createImageView(depthImage, depthFormat);

However, the createImageView function currently assumes that the subresource is always the VK_IMAGE_ASPECT_COLOR_BIT, so we will need to turn that field into a parameter:
VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags) {
    ...
    viewInfo.subresourceRange.aspectMask = aspectFlags;
    ...
}

Update all calls to this function to use the right aspect:


swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT);

That’s it for creating the depth image. We don’t need to map it or copy another
image to it, because we’re going to clear it at the start of the render pass like
the color attachment.

Explicitly transitioning the depth image


We don’t need to explicitly transition the layout of the image to a depth at-
tachment because we’ll take care of this in the render pass. However, for com-
pleteness I’ll still describe the process in this section. You may skip it if you
like.
Make a call to transitionImageLayout at the end of the createDepthResources
function like so:
transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL);

The undefined layout can be used as initial layout, because there are no existing
depth image contents that matter. We need to update some of the logic in
transitionImageLayout to use the right subresource aspect:
if (newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;

    if (hasStencilComponent(format)) {
        barrier.subresourceRange.aspectMask |= VK_IMAGE_ASPECT_STENCIL_BIT;
    }
} else {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
}

Although we’re not using the stencil component, we do need to include it in the
layout transitions of the depth image.
Finally, add the correct access masks and pipeline stages:
if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
} else {
    throw std::invalid_argument("unsupported layout transition!");
}

The depth buffer will be read from to perform depth tests to see if a fragment
is visible, and will be written to when a new fragment is drawn. The reading
happens in the VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT stage and the
writing in the VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT. You should
pick the earliest pipeline stage that matches the specified operations, so that it
is ready for usage as depth attachment when it needs to be.

Render pass
We’re now going to modify createRenderPass to include a depth attachment.
First specify the VkAttachmentDescription:
VkAttachmentDescription depthAttachment{};
depthAttachment.format = findDepthFormat();
depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

The format should be the same as the depth image itself. This time we don’t
care about storing the depth data (storeOp), because it will not be used after
drawing has finished. This may allow the hardware to perform additional op-
timizations. Just like the color buffer, we don’t care about the previous depth
contents, so we can use VK_IMAGE_LAYOUT_UNDEFINED as initialLayout.

VkAttachmentReference depthAttachmentRef{};
depthAttachmentRef.attachment = 1;
depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

Add a reference to the attachment for the first (and only) subpass:
VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;
subpass.pDepthStencilAttachment = &depthAttachmentRef;

Unlike color attachments, a subpass can only use a single depth (+stencil) at-
tachment. It wouldn’t really make any sense to do depth tests on multiple
buffers.
std::array<VkAttachmentDescription, 2> attachments = {colorAttachment, depthAttachment};
VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
renderPassInfo.pAttachments = attachments.data();
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

Next, update the VkSubpassDependency struct to refer to both attachments.


dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

Finally, we need to extend our subpass dependencies to make sure that there
is no conflict between the transitioning of the depth image and it being cleared
as part of its load operation. The depth image is first accessed in the early
fragment test pipeline stage and because we have a load operation that clears,
we should specify the access mask for writes.
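Putting those pieces together, the complete dependency now looks roughly like this. The srcSubpass, dstSubpass and srcAccessMask values are carried over unchanged from the render pass chapter; treat this as a consolidated sketch rather than new code:

VkSubpassDependency dependency{};
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.srcAccessMask = 0;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;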

Framebuffer
The next step is to modify the framebuffer creation to bind the depth image
to the depth attachment. Go to createFramebuffers and specify the depth
image view as second attachment:
std::array<VkImageView, 2> attachments = {
    swapChainImageViews[i],
    depthImageView
};

VkFramebufferCreateInfo framebufferInfo{};
framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
framebufferInfo.renderPass = renderPass;
framebufferInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
framebufferInfo.pAttachments = attachments.data();
framebufferInfo.width = swapChainExtent.width;
framebufferInfo.height = swapChainExtent.height;
framebufferInfo.layers = 1;

The color attachment differs for every swap chain image, but the same depth
image can be used by all of them because only a single subpass is running at
the same time due to our semaphores.
You’ll also need to move the call to createFramebuffers to make sure that it
is called after the depth image view has actually been created:
void initVulkan() {
    ...
    createDepthResources();
    createFramebuffers();
    ...
}

Clear values
Because we now have multiple attachments with VK_ATTACHMENT_LOAD_OP_CLEAR,
we also need to specify multiple clear values. Go to recordCommandBuffer and
create an array of VkClearValue structs:
std::array<VkClearValue, 2> clearValues{};
clearValues[0].color = {{0.0f, 0.0f, 0.0f, 1.0f}};
clearValues[1].depthStencil = {1.0f, 0};

renderPassInfo.clearValueCount = static_cast<uint32_t>(clearValues.size());
renderPassInfo.pClearValues = clearValues.data();

The range of depths in the depth buffer is 0.0 to 1.0 in Vulkan, where 1.0 lies
at the far view plane and 0.0 at the near view plane. The initial value at each
point in the depth buffer should be the furthest possible depth, which is 1.0.
Note that the order of clearValues should be identical to the order of your
attachments.

Depth and stencil state


The depth attachment is ready to be used now, but depth testing still
needs to be enabled in the graphics pipeline. It is configured through the
VkPipelineDepthStencilStateCreateInfo struct:
VkPipelineDepthStencilStateCreateInfo depthStencil{};
depthStencil.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
depthStencil.depthTestEnable = VK_TRUE;
depthStencil.depthWriteEnable = VK_TRUE;

The depthTestEnable field specifies if the depth of new fragments should be compared to the depth buffer to see if they should be discarded. The depthWriteEnable field specifies if the new depth of fragments that pass the depth test should actually be written to the depth buffer.
depthStencil.depthCompareOp = VK_COMPARE_OP_LESS;

The depthCompareOp field specifies the comparison that is performed to keep or discard fragments. We're sticking to the convention of lower depth = closer, so the depth of new fragments should be less.
depthStencil.depthBoundsTestEnable = VK_FALSE;
depthStencil.minDepthBounds = 0.0f; // Optional
depthStencil.maxDepthBounds = 1.0f; // Optional

The depthBoundsTestEnable, minDepthBounds and maxDepthBounds fields are used for the optional depth bounds test. Basically, this allows you to only keep fragments that fall within the specified depth range. We won't be using this functionality.
depthStencil.stencilTestEnable = VK_FALSE;
depthStencil.front = {}; // Optional
depthStencil.back = {}; // Optional

The last three fields configure stencil buffer operations, which we also won’t
be using in this tutorial. If you want to use these operations, then you will
have to make sure that the format of the depth/stencil image contains a stencil
component.

pipelineInfo.pDepthStencilState = &depthStencil;

Update the VkGraphicsPipelineCreateInfo struct to reference the depth stencil state we just filled in. A depth stencil state must always be specified if the render pass contains a depth stencil attachment.
If you run your program now, then you should see that the fragments of the
geometry are now correctly ordered:

Handling window resize


The resolution of the depth buffer should change when the window is resized to
match the new color attachment resolution. Extend the recreateSwapChain
function to recreate the depth resources in that case:
void recreateSwapChain() {
    int width = 0, height = 0;
    while (width == 0 || height == 0) {
        glfwGetFramebufferSize(window, &width, &height);
        glfwWaitEvents();
    }

    vkDeviceWaitIdle(device);

    cleanupSwapChain();

    createSwapChain();
    createImageViews();
    createDepthResources();
    createFramebuffers();
}

The cleanup operations should happen in the swap chain cleanup function:
void cleanupSwapChain() {
    vkDestroyImageView(device, depthImageView, nullptr);
    vkDestroyImage(device, depthImage, nullptr);
    vkFreeMemory(device, depthImageMemory, nullptr);

    ...
}

Congratulations, your application is now finally ready to render arbitrary 3D geometry and have it look right. We're going to try this out in the next chapter by drawing a textured model!
C++ code / Vertex shader / Fragment shader

Loading models

Introduction
Your program is now ready to render textured 3D meshes, but the current
geometry in the vertices and indices arrays is not very interesting yet. In
this chapter we’re going to extend the program to load the vertices and indices
from an actual model file to make the graphics card actually do some work.
Many graphics API tutorials have the reader write their own OBJ loader in a
chapter like this. The problem with this is that any remotely interesting 3D
application will soon require features that are not supported by this file format,
like skeletal animation. We will load mesh data from an OBJ model in this
chapter, but we’ll focus more on integrating the mesh data with the program
itself rather than the details of loading it from a file.

Library
We will use the tinyobjloader library to load vertices and faces from an OBJ file.
It’s fast and it’s easy to integrate because it’s a single file library like stb_image.
Go to the repository linked above and download the tiny_obj_loader.h file to
a folder in your library directory.
Visual Studio
Add the directory with tiny_obj_loader.h in it to the Additional Include
Directories paths.

Makefile
Add the directory with tiny_obj_loader.h to the include directories for GCC:
VULKAN_SDK_PATH = /home/user/VulkanSDK/x.x.x.x/x86_64
STB_INCLUDE_PATH = /home/user/libraries/stb
TINYOBJ_INCLUDE_PATH = /home/user/libraries/tinyobjloader

...

CFLAGS = -std=c++17 -I$(VULKAN_SDK_PATH)/include -I$(STB_INCLUDE_PATH) -I$(TINYOBJ_INCLUDE_PATH)

Sample mesh
In this chapter we won’t be enabling lighting yet, so it helps to use a sample
model that has lighting baked into the texture. An easy way to find such
models is to look for 3D scans on Sketchfab. Many of the models on that site
are available in OBJ format with a permissive license.
For this tutorial I’ve decided to go with the Viking room model by nigelgoh (CC
BY 4.0). I tweaked the size and orientation of the model to use it as a drop in
replacement for the current geometry:
• viking_room.obj
• viking_room.png
Feel free to use your own model, but make sure that it only consists of one material and that it has dimensions of about 1.5 x 1.5 x 1.5 units. If it is larger than that, then you'll have to change the view matrix. Put the model file in a new models directory next to shaders and textures, and put the texture image in the textures directory.
Put two new configuration variables in your program to define the model and
texture paths:
const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

const std::string MODEL_PATH = "models/viking_room.obj";
const std::string TEXTURE_PATH = "textures/viking_room.png";

And update createTextureImage to use this path variable:


stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);

Loading vertices and indices


We're going to load the vertices and indices from the model file now, so you should remove the global vertices and indices arrays. Replace them with non-const containers as class members:
std::vector<Vertex> vertices;
std::vector<uint32_t> indices;
VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;

You should change the type of the indices from uint16_t to uint32_t, because
there are going to be a lot more vertices than 65535. Remember to also change
the vkCmdBindIndexBuffer parameter:
vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT32);

The tinyobjloader library is included in the same way as STB libraries. Include
the tiny_obj_loader.h file and make sure to define TINYOBJLOADER_IMPLEMENTATION
in one source file to include the function bodies and avoid linker errors:
#define TINYOBJLOADER_IMPLEMENTATION
#include <tiny_obj_loader.h>

We’re now going to write a loadModel function that uses this library to populate
the vertices and indices containers with the vertex data from the mesh. It
should be called somewhere before the vertex and index buffers are created:
void initVulkan() {
    ...
    loadModel();
    createVertexBuffer();
    createIndexBuffer();
    ...
}

...

void loadModel() {

}

A model is loaded into the library's data structures by calling the tinyobj::LoadObj function:
void loadModel() {
    tinyobj::attrib_t attrib;
    std::vector<tinyobj::shape_t> shapes;
    std::vector<tinyobj::material_t> materials;
    std::string warn, err;

    if (!tinyobj::LoadObj(&attrib, &shapes, &materials, &warn, &err, MODEL_PATH.c_str())) {
        throw std::runtime_error(warn + err);
    }
}

An OBJ file consists of positions, normals, texture coordinates and faces. Faces consist of an arbitrary number of vertices, where each vertex refers to a position, normal and/or texture coordinate by index. This makes it possible not just to reuse entire vertices, but also individual attributes.
The attrib container holds all of the positions, normals and texture coordinates
in its attrib.vertices, attrib.normals and attrib.texcoords vectors. The
shapes container contains all of the separate objects and their faces. Each face
consists of an array of vertices, and each vertex contains the indices of the
position, normal and texture coordinate attributes. OBJ models can also define
a material and texture per face, but we will be ignoring those.
The err string contains errors and the warn string contains warnings that oc-
curred while loading the file, like a missing material definition. Loading only
really failed if the LoadObj function returns false. As mentioned above, faces
in OBJ files can actually contain an arbitrary number of vertices, whereas our
application can only render triangles. Luckily the LoadObj has an optional
parameter to automatically triangulate such faces, which is enabled by default.
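Since loading only fails when LoadObj returns false, warnings are easy to miss. If you would like to surface them, you could print warn before continuing; this is purely optional, not part of the tutorial's code, and assumes <iostream> is included:

if (!warn.empty()) {
    std::cout << "tinyobjloader warning: " << warn << std::endl;
}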
We’re going to combine all of the faces in the file into a single model, so just
iterate over all of the shapes:
for (const auto& shape : shapes) {

}

The triangulation feature has already made sure that there are three vertices per
face, so we can now directly iterate over the vertices and dump them straight
into our vertices vector:
for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        vertices.push_back(vertex);
        indices.push_back(indices.size());
    }
}

For simplicity, we will assume that every vertex is unique for now, hence the sim-
ple auto-increment indices. The index variable is of type tinyobj::index_t,
which contains the vertex_index, normal_index and texcoord_index mem-
bers. We need to use these indices to look up the actual vertex attributes in the
attrib arrays:
vertex.pos = {
    attrib.vertices[3 * index.vertex_index + 0],
    attrib.vertices[3 * index.vertex_index + 1],
    attrib.vertices[3 * index.vertex_index + 2]
};

vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    attrib.texcoords[2 * index.texcoord_index + 1]
};

vertex.color = {1.0f, 1.0f, 1.0f};

Unfortunately the attrib.vertices array is an array of float values instead of something like glm::vec3, so you need to multiply the index by 3. Similarly, there are two texture coordinate components per entry. The offsets of 0, 1 and 2 are used to access the X, Y and Z components, or the U and V components in the case of texture coordinates.
Run your program now with optimization enabled (e.g. Release mode in Visual Studio and with the -O3 compiler flag for GCC). This is necessary, because otherwise loading the model will be very slow. You should see something like the following:

Great, the geometry looks correct, but what’s going on with the texture? The
OBJ format assumes a coordinate system where a vertical coordinate of 0 means
the bottom of the image, however we’ve uploaded our image into Vulkan in a
top to bottom orientation where 0 means the top of the image. Solve this by
flipping the vertical component of the texture coordinates:
vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    1.0f - attrib.texcoords[2 * index.texcoord_index + 1]
};

When you run your program again, you should now see the correct result:

All that hard work is finally beginning to pay off with a demo like this!
As the model rotates you may notice that the rear (backside of the
walls) looks a bit funny. This is normal and is simply because the
model is not really designed to be viewed from that side.

Vertex deduplication
Unfortunately we’re not really taking advantage of the index buffer yet. The
vertices vector contains a lot of duplicated vertex data, because many vertices
are included in multiple triangles. We should keep only the unique vertices and
use the index buffer to reuse them whenever they come up. A straightforward
way to implement this is to use a map or unordered_map to keep track of the
unique vertices and respective indices:
#include <unordered_map>

...

std::unordered_map<Vertex, uint32_t> uniqueVertices{};

for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        ...

        if (uniqueVertices.count(vertex) == 0) {
            uniqueVertices[vertex] = static_cast<uint32_t>(vertices.size());
            vertices.push_back(vertex);
        }

        indices.push_back(uniqueVertices[vertex]);
    }
}

Every time we read a vertex from the OBJ file, we check if we’ve already seen a
vertex with the exact same position and texture coordinates before. If not, we
add it to vertices and store its index in the uniqueVertices container. After
that we add the index of the new vertex to indices. If we’ve seen the exact
same vertex before, then we look up its index in uniqueVertices and store that
index in indices.
The program will fail to compile right now, because using a user-defined type
like our Vertex struct as key in a hash table requires us to implement two
functions: equality test and hash calculation. The former is easy to implement
by overriding the == operator in the Vertex struct:
bool operator==(const Vertex& other) const {
    return pos == other.pos && color == other.color && texCoord == other.texCoord;
}

A hash function for Vertex is implemented by specifying a template specialization for std::hash<T>. Hash functions are a complex topic, but cppreference.com recommends the following approach combining the fields of a struct to create a decent quality hash function:
namespace std {
    template<> struct hash<Vertex> {
        size_t operator()(Vertex const& vertex) const {
            return ((hash<glm::vec3>()(vertex.pos) ^
                    (hash<glm::vec3>()(vertex.color) << 1)) >> 1) ^
                    (hash<glm::vec2>()(vertex.texCoord) << 1);
        }
    };
}

This code should be placed outside the Vertex struct. The hash functions for
the GLM types need to be included using the following header:

#define GLM_ENABLE_EXPERIMENTAL
#include <glm/gtx/hash.hpp>

The hash functions are defined in the gtx folder, which means that it is tech-
nically still an experimental extension to GLM. Therefore you need to define
GLM_ENABLE_EXPERIMENTAL to use it. It means that the API could change with
a new version of GLM in the future, but in practice the API is very stable.
You should now be able to successfully compile and run your program. If
you check the size of vertices, then you’ll see that it has shrunk down from
1,500,000 to 265,645! That means that each vertex is reused in an average
number of ~6 triangles. This definitely saves us a lot of GPU memory.
C++ code / Vertex shader / Fragment shader

Generating Mipmaps

Introduction
Our program can now load and render 3D models. In this chapter, we will add
one more feature, mipmap generation. Mipmaps are widely used in games and
rendering software, and Vulkan gives us complete control over how they are
created.
Mipmaps are precalculated, downscaled versions of an image. Each new image
is half the width and height of the previous one. Mipmaps are used as a form of
Level of Detail or LOD. Objects that are far away from the camera will sample
their textures from the smaller mip images. Using smaller images increases the
rendering speed and avoids artifacts such as Moiré patterns. An example of
what mipmaps look like:

Image creation
In Vulkan, each of the mip images is stored in different mip levels of a VkImage.
Mip level 0 is the original image, and the mip levels after level 0 are commonly
referred to as the mip chain.
The number of mip levels is specified when the VkImage is created. Up until
now, we have always set this value to one. We need to calculate the number of
mip levels from the dimensions of the image. First, add a class member to store
this number:
...
uint32_t mipLevels;
VkImage textureImage;
...

The value for mipLevels can be found once we’ve loaded the texture in
createTextureImage:
int texWidth, texHeight, texChannels;
stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
...
mipLevels = static_cast<uint32_t>(std::floor(std::log2(std::max(texWidth, texHeight)))) + 1;

This calculates the number of levels in the mip chain. The max function se-
lects the largest dimension. The log2 function calculates how many times that
dimension can be divided by 2. The floor function handles cases where the
largest dimension is not a power of 2. 1 is added so that the original image has
a mip level.
To use this value, we need to change the createImage, createImageView, and
transitionImageLayout functions to allow us to specify the number of mip
levels. Add a mipLevels parameter to the functions:
void createImage(uint32_t width, uint32_t height, uint32_t mipLevels, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    ...
    imageInfo.mipLevels = mipLevels;
    ...
}

VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags, uint32_t mipLevels) {
    ...
    viewInfo.subresourceRange.levelCount = mipLevels;
    ...

void transitionImageLayout(VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout, uint32_t mipLevels) {
    ...
    barrier.subresourceRange.levelCount = mipLevels;
    ...

Update all calls to these functions to use the right values:


createImage(swapChainExtent.width, swapChainExtent.height, 1, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);

swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT, 1);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT, 1);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT, mipLevels);

transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, 1);
...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);

Generating Mipmaps
Our texture image now has multiple mip levels, but the staging buffer can only
be used to fill mip level 0. The other levels are still undefined. To fill these
levels we need to generate the data from the single level that we have. We will

use the vkCmdBlitImage command. This command performs copying, scaling,
and filtering operations. We will call this multiple times to blit data to each
level of our texture image.
vkCmdBlitImage is considered a transfer operation, so we must inform Vulkan
that we intend to use the texture image as both the source and destination
of a transfer. Add VK_IMAGE_USAGE_TRANSFER_SRC_BIT to the texture image’s
usage flags in createTextureImage:
...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
...

Like other image operations, vkCmdBlitImage depends on the layout of the image it operates on. We could transition the entire image to VK_IMAGE_LAYOUT_GENERAL, but this will most likely be slow. For optimal performance, the source image should be in VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination image should be in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Vulkan allows us to transition each mip level of an image independently. Each blit will only deal with two mip levels at a time, so we can transition each level into the optimal layout between blit commands.
transitionImageLayout only performs layout transitions on the entire
image, so we’ll need to write a few more pipeline barrier commands. Remove
the existing transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL in
createTextureImage:
...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
//transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while generating mipmaps
...

This will leave each level of the texture image in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Each level will be transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the blit command reading from it is finished.
We’re now going to write the function that generates the mipmaps:

void generateMipmaps(VkImage image, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.image = image;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.baseArrayLayer = 0;
    barrier.subresourceRange.layerCount = 1;
    barrier.subresourceRange.levelCount = 1;

    endSingleTimeCommands(commandBuffer);
}

We're going to make several transitions, so we'll reuse this VkImageMemoryBarrier. The fields set above will remain the same for all barriers. subresourceRange.baseMipLevel, oldLayout, newLayout, srcAccessMask, and dstAccessMask will be changed for each transition.
int32_t mipWidth = texWidth;
int32_t mipHeight = texHeight;

for (uint32_t i = 1; i < mipLevels; i++) {

}

This loop will record each of the vkCmdBlitImage commands. Note that the loop variable starts at 1, not 0.
barrier.subresourceRange.baseMipLevel = i - 1;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;

vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
    0, nullptr,
    0, nullptr,
    1, &barrier);

First, we transition level i - 1 to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL. This transition will wait for level i - 1 to be filled, either from the previous blit command, or from vkCmdCopyBufferToImage. The current blit command will wait on this transition.
VkImageBlit blit{};
blit.srcOffsets[0] = { 0, 0, 0 };
blit.srcOffsets[1] = { mipWidth, mipHeight, 1 };
blit.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.srcSubresource.mipLevel = i - 1;
blit.srcSubresource.baseArrayLayer = 0;
blit.srcSubresource.layerCount = 1;
blit.dstOffsets[0] = { 0, 0, 0 };
blit.dstOffsets[1] = { mipWidth > 1 ? mipWidth / 2 : 1, mipHeight > 1 ? mipHeight / 2 : 1, 1 };
blit.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.dstSubresource.mipLevel = i;
blit.dstSubresource.baseArrayLayer = 0;
blit.dstSubresource.layerCount = 1;

Next, we specify the regions that will be used in the blit operation. The source
mip level is i - 1 and the destination mip level is i. The two elements of
the srcOffsets array determine the 3D region that data will be blitted from.
dstOffsets determines the region that data will be blitted to. The X and Y
dimensions of the dstOffsets[1] are divided by two since each mip level is
half the size of the previous level. The Z dimension of srcOffsets[1] and
dstOffsets[1] must be 1, since a 2D image has a depth of 1.
vkCmdBlitImage(commandBuffer,
    image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1, &blit,
    VK_FILTER_LINEAR);

Now, we record the blit command. Note that textureImage is used for both
the srcImage and dstImage parameter. This is because we’re blitting between
different levels of the same image. The source mip level was just transitioned
to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination level is still
in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL from createTextureImage.
Beware if you are using a dedicated transfer queue (as suggested in Vertex
buffers): vkCmdBlitImage must be submitted to a queue with graphics capabil-
ity.
The last parameter allows us to specify a VkFilter to use in the blit. We have
the same filtering options here that we had when making the VkSampler. We
use VK_FILTER_LINEAR to enable interpolation.
1 barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
2 barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
3 barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
4 barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
5
6 vkCmdPipelineBarrier(commandBuffer,
7 VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
8 0, nullptr,
9 0, nullptr,
10 1, &barrier);

This barrier transitions mip level i - 1 to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.


This transition waits on the current blit command to finish. All sampling
operations will wait on this transition to finish.
1 ...
2 if (mipWidth > 1) mipWidth /= 2;
3 if (mipHeight > 1) mipHeight /= 2;
4 }

At the end of the loop, we divide the current mip dimensions by two. We check
each dimension before the division to ensure that dimension never becomes 0.
This handles cases where the image is not square, since one of the mip dimensions
would reach 1 before the other dimension. When this happens, that dimension
should remain 1 for all remaining levels.
1 barrier.subresourceRange.baseMipLevel = mipLevels - 1;
2 barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
3 barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
4 barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
5 barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
6
7 vkCmdPipelineBarrier(commandBuffer,
8 VK_PIPELINE_STAGE_TRANSFER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
9 0, nullptr,
10 0, nullptr,
11 1, &barrier);
12
13 endSingleTimeCommands(commandBuffer);
14 }

Before we end the command buffer, we insert one more pipeline barrier. This bar-
rier transitions the last mip level from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. This wasn’t handled by
the loop, since the last mip level is never blitted from.
Finally, add the call to generateMipmaps in createTextureImage:
1 transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB,
VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
mipLevels);
2 copyBufferToImage(stagingBuffer, textureImage,
static_cast<uint32_t>(texWidth),
static_cast<uint32_t>(texHeight));
3 //transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while
generating mipmaps
4 ...
5 generateMipmaps(textureImage, texWidth, texHeight, mipLevels);

Our texture image’s mipmaps are now completely filled.

Linear filtering support


It is very convenient to use a built-in function like vkCmdBlitImage to generate
all the mip levels, but unfortunately it is not guaranteed to be supported on all
platforms. It requires the texture image format we use to support linear filter-
ing, which can be checked with the vkGetPhysicalDeviceFormatProperties
function. We will add a check to the generateMipmaps function for this.
First add an additional parameter that specifies the image format:
1 void createTextureImage() {
2 ...
3
4 generateMipmaps(textureImage, VK_FORMAT_R8G8B8A8_SRGB, texWidth,
texHeight, mipLevels);
5 }
6
7 void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t
texWidth, int32_t texHeight, uint32_t mipLevels) {
8
9 ...
10 }

In the generateMipmaps function, use vkGetPhysicalDeviceFormatProperties
to request the properties of the texture image format:
1 void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t
texWidth, int32_t texHeight, uint32_t mipLevels) {
2
3 // Check if image format supports linear blitting
4 VkFormatProperties formatProperties;
5 vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat,
&formatProperties);
6
7 ...

The VkFormatProperties struct has three fields named linearTilingFeatures,
optimalTilingFeatures and bufferFeatures that each describe how
the format can be used depending on the way it is used. We create
a texture image with the optimal tiling format, so we need to check
optimalTilingFeatures. Support for the linear filtering feature can be
checked with the VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT:
1 if (!(formatProperties.optimalTilingFeatures &
VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT)) {
2 throw std::runtime_error("texture image format does not support
linear blitting!");
3 }

There are two alternatives in this case. You could implement a function that
searches common texture image formats for one that does support linear blitting,
or you could implement the mipmap generation in software with a library like
stb_image_resize. Each mip level can then be loaded into the image in the
same way that you loaded the original image.
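If you go with the first alternative, the search could be a small helper along these
lines. This is only a sketch: the candidate list and the function name are made up
for illustration and are not part of the tutorial's code.
1 // Sketch: return the first candidate format whose optimal tiling
2 // features include linear filtering, so that vkCmdBlitImage with
3 // VK_FILTER_LINEAR is legal for images created with that format.
4 VkFormat findBlittableFormat(const std::vector<VkFormat>& candidates) {
5     for (VkFormat format : candidates) {
6         VkFormatProperties props;
7         vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
8         if (props.optimalTilingFeatures &
9             VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT) {
10             return format;
11         }
12     }
13     throw std::runtime_error("no candidate format supports linear blitting!");
14 }
Keep in mind that switching to a different texture format also means converting
the pixel data you upload, so the software resizing route may end up being simpler.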
It should be noted that it is uncommon in practice to generate the mipmap
levels at runtime anyway. Usually they are pregenerated and stored in the
texture file alongside the base level to improve loading speed. Implementing
resizing in software and loading multiple levels from a file is left as an exercise
to the reader.

Sampler
While the VkImage holds the mipmap data, VkSampler controls how that data is
read while rendering. Vulkan allows us to specify minLod, maxLod, mipLodBias,
and mipmapMode (“Lod” means “Level of Detail”). When a texture is sampled,
the sampler selects a mip level according to the following pseudocode:
1 lod = getLodLevelFromScreenSize(); //smaller when the object is
close, may be negative
2 lod = clamp(lod + mipLodBias, minLod, maxLod);
3
4 level = clamp(floor(lod), 0, texture.mipLevels - 1); //clamped to
the number of mip levels in the texture
5
6 if (mipmapMode == VK_SAMPLER_MIPMAP_MODE_NEAREST) {
7 color = sample(level);
8 } else {
9 color = blend(sample(level), sample(level + 1));
10 }

If samplerInfo.mipmapMode is VK_SAMPLER_MIPMAP_MODE_NEAREST, lod selects
the mip level to sample from. If the mipmap mode is VK_SAMPLER_MIPMAP_MODE_LINEAR,
lod is used to select two mip levels to be sampled. Those levels are sampled
and the results are linearly blended.
The sample operation is also affected by lod:
1 if (lod <= 0) {
2 color = readTexture(uv, magFilter);
3 } else {
4 color = readTexture(uv, minFilter);
5 }

If the object is close to the camera, magFilter is used as the filter. If the object
is further from the camera, minFilter is used. Normally, lod is non-negative,
and is only 0 when close to the camera. mipLodBias lets us force Vulkan to use
a lower lod and level than it would normally use.
To see the results of this chapter, we need to choose values for our
textureSampler. We’ve already set the minFilter and magFilter to
use VK_FILTER_LINEAR. We just need to choose values for minLod, maxLod,
mipLodBias, and mipmapMode.
1 void createTextureSampler() {
2 ...
3 samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
4 samplerInfo.minLod = 0.0f; // Optional
5 samplerInfo.maxLod = static_cast<float>(mipLevels);
6 samplerInfo.mipLodBias = 0.0f; // Optional
7 ...
8 }

To allow the full range of mip levels to be used, we set minLod to 0.0f, and
maxLod to the number of mip levels. We have no reason to change the lod value,
so we set mipLodBias to 0.0f.
Now run your program and you should see the following:

It’s not a dramatic difference, since our scene is so simple. There are subtle
differences if you look closely.

The most noticeable difference is the writing on the papers. With mipmaps, the
writing has been smoothed. Without mipmaps, the writing has harsh edges and
gaps from Moiré artifacts.
You can play around with the sampler settings to see how they affect mipmap-
ping. For example, by changing minLod, you can force the sampler to not use
the lowest mip levels:
1 samplerInfo.minLod = static_cast<float>(mipLevels / 2);

These settings will produce this image:

This is how higher mip levels will be used when objects are further away from
the camera.
C++ code / Vertex shader / Fragment shader

Multisampling

Introduction
Our program can now load multiple levels of detail for textures which fixes
artifacts when rendering objects far away from the viewer. The image is now
a lot smoother, however on closer inspection you will notice jagged saw-like
patterns along the edges of drawn geometric shapes. This is especially visible
in one of our early programs when we rendered a quad:

This undesired effect is called “aliasing” and it’s a result of the limited number
of pixels that are available for rendering. Since there are no displays out there
with unlimited resolution, it will always be visible to some extent. There are a
number of ways to fix this, and in this chapter we’ll focus on one of the more
popular ones: Multisample anti-aliasing (MSAA).
In ordinary rendering, the pixel color is determined based on a single sample
point which in most cases is the center of the target pixel on screen. If part
of the drawn line passes through a certain pixel but doesn’t cover the sample
point, that pixel will be left blank, leading to the jagged “staircase” effect.

What MSAA does is use multiple sample points per pixel (hence the name)
to determine its final color. As one might expect, more samples lead to better
results, but they are also more computationally expensive.

In our implementation, we will focus on using the maximum available sample
count. Depending on your application this may not always be the best approach,
and it might be better to use fewer samples for the sake of higher performance if
the final result meets your quality demands.

Getting available sample count


Let’s start off by determining how many samples our hardware can use. Most
modern GPUs support at least 8 samples but this number is not guaranteed to
be the same everywhere. We’ll keep track of it by adding a new class member:
1 ...
2 VkSampleCountFlagBits msaaSamples = VK_SAMPLE_COUNT_1_BIT;
3 ...

By default we’ll be using only one sample per pixel which is equivalent to no mul-
tisampling, in which case the final image will remain unchanged. The exact max-
imum number of samples can be extracted from VkPhysicalDeviceProperties
associated with our selected physical device. We’re using a depth buffer, so we
have to take into account the sample count for both color and depth. The high-
est sample count that is supported by both (&) will be the maximum we can
support. Add a function that will fetch this information for us:
1 VkSampleCountFlagBits getMaxUsableSampleCount() {
2 VkPhysicalDeviceProperties physicalDeviceProperties;
3 vkGetPhysicalDeviceProperties(physicalDevice,
&physicalDeviceProperties);
4
5 VkSampleCountFlags counts =
physicalDeviceProperties.limits.framebufferColorSampleCounts
&
physicalDeviceProperties.limits.framebufferDepthSampleCounts;
6 if (counts & VK_SAMPLE_COUNT_64_BIT) { return
VK_SAMPLE_COUNT_64_BIT; }
7 if (counts & VK_SAMPLE_COUNT_32_BIT) { return
VK_SAMPLE_COUNT_32_BIT; }
8 if (counts & VK_SAMPLE_COUNT_16_BIT) { return
VK_SAMPLE_COUNT_16_BIT; }
9 if (counts & VK_SAMPLE_COUNT_8_BIT) { return
VK_SAMPLE_COUNT_8_BIT; }
10 if (counts & VK_SAMPLE_COUNT_4_BIT) { return
VK_SAMPLE_COUNT_4_BIT; }
11 if (counts & VK_SAMPLE_COUNT_2_BIT) { return
VK_SAMPLE_COUNT_2_BIT; }
12
13 return VK_SAMPLE_COUNT_1_BIT;
14 }

We will now use this function to set the msaaSamples variable during the
physical device selection process. For this, we have to slightly modify the
pickPhysicalDevice function:
1 void pickPhysicalDevice() {
2 ...
3 for (const auto& device : devices) {
4 if (isDeviceSuitable(device)) {
5 physicalDevice = device;
6 msaaSamples = getMaxUsableSampleCount();
7 break;
8 }
9 }
10 ...
11 }
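If the maximum turns out to be more than you need, you could also cap the value
here instead of always taking the highest supported count. A small sketch (the
preferred count of 4 is just an example, and std::min requires <algorithm>):
1 // Sketch: clamp the reported maximum to a preferred sample count. The
2 // VkSampleCountFlagBits values are plain powers of two, so a numeric
3 // comparison picks the smaller of the two.
4 VkSampleCountFlagBits preferred = VK_SAMPLE_COUNT_4_BIT;
5 msaaSamples = std::min(getMaxUsableSampleCount(), preferred);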

Setting up a render target


In MSAA, each pixel is sampled in an offscreen buffer which is then rendered
to the screen. This new buffer is slightly different from the regular images we’ve
been rendering to - it has to be able to store more than one sample per
pixel. Once a multisampled buffer is created, it has to be resolved to the default
framebuffer (which stores only a single sample per pixel). This is why we have
to create an additional render target and modify our current drawing process.
We only need one render target since only one drawing operation is active at a
time, just like with the depth buffer. Add the following class members:
1 ...
2 VkImage colorImage;
3 VkDeviceMemory colorImageMemory;
4 VkImageView colorImageView;
5 ...

This new image will have to store the desired number of samples per pixel, so
we need to pass this number to VkImageCreateInfo during the image creation
process. Modify the createImage function by adding a numSamples parameter:
1 void createImage(uint32_t width, uint32_t height, uint32_t
mipLevels, VkSampleCountFlagBits numSamples, VkFormat format,
VkImageTiling tiling, VkImageUsageFlags usage,
VkMemoryPropertyFlags properties, VkImage& image,
VkDeviceMemory& imageMemory) {
2 ...
3 imageInfo.samples = numSamples;
4 ...

For now, update all calls to this function using VK_SAMPLE_COUNT_1_BIT - we
will be replacing this with proper values as we progress with implementation:
1 createImage(swapChainExtent.width, swapChainExtent.height, 1,
VK_SAMPLE_COUNT_1_BIT, depthFormat, VK_IMAGE_TILING_OPTIMAL,
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage,
depthImageMemory);
2 ...
3 createImage(texWidth, texHeight, mipLevels, VK_SAMPLE_COUNT_1_BIT,
VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL,
VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage,
textureImageMemory);

We will now create a multisampled color buffer. Add a createColorResources
function and note that we’re using msaaSamples here as a function parameter
to createImage. We’re also using only one mip level, since this is enforced by
the Vulkan specification in case of images with more than one sample per pixel.
Also, this color buffer doesn’t need mipmaps since it’s not going to be used as
a texture:
1 void createColorResources() {
2 VkFormat colorFormat = swapChainImageFormat;
3
4 createImage(swapChainExtent.width, swapChainExtent.height, 1,
msaaSamples, colorFormat, VK_IMAGE_TILING_OPTIMAL,
VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT |
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, colorImage,
colorImageMemory);
5 colorImageView = createImageView(colorImage, colorFormat,
VK_IMAGE_ASPECT_COLOR_BIT, 1);
6 }

For consistency, call the function right before createDepthResources:


1 void initVulkan() {
2 ...
3 createColorResources();
4 createDepthResources();
5 ...
6 }

Now that we have a multisampled color buffer in place it’s time to take care
of depth. Modify createDepthResources and update the number of samples
used by the depth buffer:

1 void createDepthResources() {
2 ...
3 createImage(swapChainExtent.width, swapChainExtent.height, 1,
msaaSamples, depthFormat, VK_IMAGE_TILING_OPTIMAL,
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage,
depthImageMemory);
4 ...
5 }

We have now created a couple of new Vulkan resources, so let’s not forget to
release them when necessary:
1 void cleanupSwapChain() {
2 vkDestroyImageView(device, colorImageView, nullptr);
3 vkDestroyImage(device, colorImage, nullptr);
4 vkFreeMemory(device, colorImageMemory, nullptr);
5 ...
6 }

And update the recreateSwapChain so that the new color image can be recre-
ated in the correct resolution when the window is resized:
1 void recreateSwapChain() {
2 ...
3 createImageViews();
4 createColorResources();
5 createDepthResources();
6 ...
7 }

We made it past the initial MSAA setup; now we need to start using this new
resource in our graphics pipeline, framebuffer and render pass and see the results!

Adding new attachments


Let’s take care of the render pass first. Modify createRenderPass and update
color and depth attachment creation info structs:
1 void createRenderPass() {
2 ...
3 colorAttachment.samples = msaaSamples;
4 colorAttachment.finalLayout =
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
5 ...
6 depthAttachment.samples = msaaSamples;
7 ...

You’ll notice that we have changed the finalLayout from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. That’s because multisam-
pled images cannot be presented directly. We first need to resolve them to a
regular image. This requirement does not apply to the depth buffer, since it
won’t be presented at any point. Therefore we will have to add only one new
attachment for color which is a so-called resolve attachment:
1 ...
2 VkAttachmentDescription colorAttachmentResolve{};
3 colorAttachmentResolve.format = swapChainImageFormat;
4 colorAttachmentResolve.samples = VK_SAMPLE_COUNT_1_BIT;
5 colorAttachmentResolve.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
6 colorAttachmentResolve.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
7 colorAttachmentResolve.stencilLoadOp =
VK_ATTACHMENT_LOAD_OP_DONT_CARE;
8 colorAttachmentResolve.stencilStoreOp =
VK_ATTACHMENT_STORE_OP_DONT_CARE;
9 colorAttachmentResolve.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
10 colorAttachmentResolve.finalLayout =
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
11 ...

The render pass now has to be instructed to resolve the multisampled color image
into a regular attachment. Create a new attachment reference that will point to
the color buffer which will serve as the resolve target:
1 ...
2 VkAttachmentReference colorAttachmentResolveRef{};
3 colorAttachmentResolveRef.attachment = 2;
4 colorAttachmentResolveRef.layout =
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
5 ...

Set the pResolveAttachments subpass struct member to point to the newly
created attachment reference. This is enough to let the render pass define a
multisample resolve operation which will let us render the image to screen:
1 ...
2 subpass.pResolveAttachments = &colorAttachmentResolveRef;
3 ...

Now update render pass info struct with the new color attachment:
1 ...
2 std::array<VkAttachmentDescription, 3> attachments =
{colorAttachment, depthAttachment, colorAttachmentResolve};
3 ...

With the render pass in place, modify createFramebuffers and add the new
image view to the list:
1 void createFramebuffers() {
2 ...
3 std::array<VkImageView, 3> attachments = {
4 colorImageView,
5 depthImageView,
6 swapChainImageViews[i]
7 };
8 ...
9 }

Finally, tell the newly created pipeline to use more than one sample by modifying
createGraphicsPipeline:
1 void createGraphicsPipeline() {
2 ...
3 multisampling.rasterizationSamples = msaaSamples;
4 ...
5 }

Now run your program and you should see the following:

Just like with mipmapping, the difference may not be apparent straight away.
On a closer look you’ll notice that the edges are not as jagged anymore and the
whole image seems a bit smoother compared to the original.

The difference is more noticeable when looking up close at one of the edges:

Quality improvements
There are certain limitations of our current MSAA implementation which may
impact the quality of the output image in more detailed scenes. For exam-
ple, we’re currently not solving potential problems caused by shader aliasing,

262
i.e. MSAA only smoothens out the edges of geometry but not the interior filling.
This may lead to a situation when you get a smooth polygon rendered on screen
but the applied texture will still look aliased if it contains high contrasting col-
ors. One way to approach this problem is to enable Sample Shading which will
improve the image quality even further, though at an additional performance
cost:
1 void createLogicalDevice() {
2 ...
3 deviceFeatures.sampleRateShading = VK_TRUE; // enable sample
shading feature for the device
4 ...
5 }
6
7 void createGraphicsPipeline() {
8 ...
9 multisampling.sampleShadingEnable = VK_TRUE; // enable sample
shading in the pipeline
10 multisampling.minSampleShading = .2f; // min fraction for sample
shading; closer to one is smoother
11 ...
12 }

In this example we’ll leave sample shading disabled but in certain scenarios the
quality improvement may be noticeable:

Conclusion
It has taken a lot of work to get to this point, but now you finally have a good
base for a Vulkan program. The knowledge of the basic principles of Vulkan
that you now possess should be sufficient to start exploring more of the features,
like:
• Push constants
• Instanced rendering
• Dynamic uniforms
• Separate images and sampler descriptors
• Pipeline cache
• Multi-threaded command buffer generation
• Multiple subpasses
• Compute shaders
The current program can be extended in many ways, like adding Blinn-Phong
lighting, post-processing effects and shadow mapping. You should be able to
learn how these effects work from tutorials for other APIs, because despite
Vulkan’s explicitness, many concepts still work the same.
C++ code / Vertex shader / Fragment shader

Compute Shader

Introduction
In this bonus chapter we’ll take a look at compute shaders. Up until now
all previous chapters dealt with the traditional graphics part of the Vulkan
pipeline. But unlike older APIs like OpenGL, compute shader support in Vulkan
is mandatory. This means that you can use compute shaders on every Vulkan
implementation available, no matter if it’s a high-end desktop GPU or a low-
powered embedded device.
This opens up the world of general purpose computing on graphics processor
units (GPGPU), no matter where your application is running. GPGPU means
that you can do general computations on your GPU, something that has tra-
ditionally been a domain of CPUs. But with GPUs having become more and
more powerful and more flexible, many workloads that would require the general
purpose capabilities of a CPU can now be done on the GPU in realtime.
A few examples of where the compute capabilities of a GPU can be used are
image manipulation, visibility testing, post processing, advanced lighting calcu-
lations, animations, physics (e.g. for a particle system) and much more. And it’s
even possible to use compute for non-visual computational only work that does
not require any graphics output, e.g. number crunching or AI related things.
This is called “headless compute”.

Advantages
Doing computationally expensive calculations on the GPU has several advan-
tages. The most obvious one is offloading work from the CPU. Another one
is not requiring moving data between the CPU’s main memory and the GPU’s
memory. All of the data can stay on the GPU without having to wait for slow
transfers from main memory.
Aside from these, GPUs are heavily parallelized with some of them having tens
of thousands of small compute units. This often makes them a better fit for
highly parallel workflows than a CPU with a few large compute units.

The Vulkan pipeline
It’s important to know that compute is completely separated from the graphics
part of the pipeline. This is visible in the following block diagram of the Vulkan
pipeline from the official specification:

In this diagram we can see the traditional graphics part of the pipeline on the
left, and several stages on the right that are not part of this graphics pipeline,
including the compute shader (stage). With the compute shader stage being
detached from the graphics pipeline we’ll be able to use it anywhere we see fit.
This is very different from e.g. the fragment shader which is always applied to
the transformed output of the vertex shader.
The center of the diagram also shows that e.g. descriptor sets are also used by
compute, so everything we learned about descriptor layouts, descriptor sets
and descriptors also applies here.

An example
An easy to understand example that we will implement in this chapter is a
GPU based particle system. Such systems are used in many games and often
consist of thousands of particles that need to be updated at interactive frame
rates. Rendering such a system requires 2 main components: vertices, passed
as vertex buffers, and a way to update them based on some equation.
The “classical” CPU based particle system would store particle data in the sys-
tem’s main memory and then use the CPU to update them. After the update,
the vertices need to be transferred to the GPU’s memory again so it can dis-
play the updated particles in the next frame. The most straight-forward way
would be recreating the vertex buffer with the new data for each frame. This
is obviously very costly. Depending on your implementation, there are other
options like mapping GPU memory so it can be written by the CPU (called
“resizable BAR” on desktop systems, or unified memory on integrated GPUs)
or just using a host local buffer (which would be the slowest method due to
PCI-E bandwidth). But no matter what buffer update method you choose, you
always require a “round-trip” to the CPU to update the particles.
With a GPU based particle system, this round-trip is no longer required. Ver-
tices are only uploaded to the GPU at the start and all updates are done in the
GPU’s memory using compute shaders. One of the main reasons why this is
faster is the much higher bandwidth between the GPU and its local memory.
In a CPU based scenario, you’d be limited by main memory and PCI-express
bandwidth, which is often just a fraction of the GPU’s memory bandwidth.
When doing this on a GPU with a dedicated compute queue, you can update
particles in parallel to the rendering part of the graphics pipeline. This is called
“async compute”, and is an advanced topic not covered in this tutorial.
Here is a screenshot from this chapter’s code. The particles shown here are up-
dated by a compute shader directly on the GPU, without any CPU interaction:

Data manipulation
In this tutorial we already learned about different buffer types like vertex and
index buffers for passing primitives and uniform buffers for passing data to a
shader. And we also used images to do texture mapping. But up until now, we
always wrote data using the CPU and only did reads on the GPU.
An important concept introduced with compute shaders is the ability to arbi-
trarily read from and write to buffers. For this, Vulkan offers two dedicated
storage types.

Shader storage buffer objects (SSBO)


A shader storage buffer (SSBO) allows shaders to read from and write to a buffer.
Using these is similar to using uniform buffer objects. The biggest differences are
that you can alias other buffer types to SSBOs and that they can be arbitrarily
large.
Going back to the GPU based particle system, you might now wonder how to
deal with vertices being updated (written) by the compute shader and read
(drawn) by the vertex shader, as both usages would seemingly require different
buffer types.
But that’s not the case. In Vulkan you can specify multiple usages for buffers
and images. So for the particle vertex buffer to be used as a vertex buffer (in
the graphics pass) and as a storage buffer (in the compute pass), you simply
create the buffer with those two usage flags:
1 VkBufferCreateInfo bufferInfo{};
2 ...
3 bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT |
VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
VK_BUFFER_USAGE_TRANSFER_DST_BIT;
4 ...
5
6 if (vkCreateBuffer(device, &bufferInfo, nullptr,
&shaderStorageBuffers[i]) != VK_SUCCESS) {
7 throw std::runtime_error("failed to create vertex buffer!");
8 }

The two flags VK_BUFFER_USAGE_VERTEX_BUFFER_BIT and VK_BUFFER_USAGE_STORAGE_BUFFER_BIT
set with bufferInfo.usage tell the implementation that we want to use this
buffer for two different scenarios: as a vertex buffer in the vertex shader and as a
storage buffer. Note that we also added the VK_BUFFER_USAGE_TRANSFER_DST_BIT
flag here so we can transfer data from the host to the GPU. This is crucial:
since we want the shader storage buffer to stay in GPU memory only
(VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT), we need to transfer data from
the host to this buffer.

Here is the same code using the createBuffer helper function:
1 createBuffer(bufferSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT |
VK_BUFFER_USAGE_TRANSFER_DST_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, shaderStorageBuffers[i],
shaderStorageBuffersMemory[i]);

The GLSL shader declaration for accessing such a buffer looks like this:
1 struct Particle {
2 vec2 position;
3 vec2 velocity;
4 vec4 color;
5 };
6
7 layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
8 Particle particlesIn[ ];
9 };
10
11 layout(std140, binding = 2) buffer ParticleSSBOOut {
12 Particle particlesOut[ ];
13 };

In this example we have a typed SSBO with each particle having a position and
velocity value (see the Particle struct). The SSBO then contains an unbound
number of particles as marked by the []. Not having to specify the number of
elements in an SSBO is one of the advantages over e.g. uniform buffers. std140
is a memory layout qualifier that determines how the member elements of the
shader storage buffer are aligned in memory. This gives us certain guarantees,
required to map the buffers between the host and the GPU.
Writing to such a storage buffer object in the compute shader is straight-forward
and similar to how you’d write to the buffer on the C++ side:
1 particlesOut[index].position = particlesIn[index].position +
particlesIn[index].velocity.xy * ubo.deltaTime;

Storage images
Note that we won’t be doing image manipulation in this chapter. This paragraph
is here to make readers aware that compute shaders can also be used for image
manipulation.
A storage image allows you to read from and write to an image. Typical use cases
are applying image effects to textures, doing post processing (which in turn is
very similar) or generating mip-maps.
This is similar for images:

1 VkImageCreateInfo imageInfo {};
2 ...
3 imageInfo.usage = VK_IMAGE_USAGE_SAMPLED_BIT |
VK_IMAGE_USAGE_STORAGE_BIT;
4 ...
5
6 if (vkCreateImage(device, &imageInfo, nullptr, &textureImage) !=
VK_SUCCESS) {
7 throw std::runtime_error("failed to create image!");
8 }

The two flags VK_IMAGE_USAGE_SAMPLED_BIT and VK_IMAGE_USAGE_STORAGE_BIT
set with imageInfo.usage tell the implementation that we want to use this
image for two different scenarios: as an image sampled in the fragment shader
and as a storage image in the compute shader.
The GLSL shader declaration for a storage image looks similar to that of sampled
images used e.g. in the fragment shader:
1 layout (binding = 0, rgba8) uniform readonly image2D inputImage;
2 layout (binding = 1, rgba8) uniform writeonly image2D outputImage;

A few differences here are additional attributes like rgba8 for the format of the
image, the readonly and writeonly qualifiers, telling the implementation that
we will only read from the input image and write to the output image. And last
but not least we need to use the image2D type to declare a storage image.
Reading from and writing to storage images in the compute shader is then done
using imageLoad and imageStore:
1 vec3 pixel = imageLoad(inputImage,
ivec2(gl_GlobalInvocationID.xy)).rgb;
2 imageStore(outputImage, ivec2(gl_GlobalInvocationID.xy), pixel);

Compute queue families


In the physical device and queue families chapter we already learned about queue
families and how to select a graphics queue family. Compute uses the queue
family properties flag bit VK_QUEUE_COMPUTE_BIT. So if we want to do compute
work, we need to get a queue from a queue family that supports compute.
Note that Vulkan requires an implementation which supports graphics opera-
tions to have at least one queue family that supports both graphics and compute
operations, but it’s also possible that implementations offer a dedicated com-
pute queue. This dedicated compute queue (that does not have the graphics
bit) hints at an asynchronous compute queue. To keep this tutorial beginner
friendly though, we’ll use a queue that can do both graphics and compute opera-
tions. This will also save us from dealing with several advanced synchronization
mechanisms.
For our compute sample we need to change the device creation code a bit:
1 uint32_t queueFamilyCount = 0;
2 vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount,
nullptr);
3
4 std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
5 vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount,
queueFamilies.data());
6
7 int i = 0;
8 for (const auto& queueFamily : queueFamilies) {
9 if ((queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT) &&
(queueFamily.queueFlags & VK_QUEUE_COMPUTE_BIT)) {
10 indices.graphicsAndComputeFamily = i;
11 }
12
13 i++;
14 }

The changed queue family index selection code will now try to find a queue
family that supports both graphics and compute.
We can then get a compute queue from this queue family in createLogicalDevice:
1 vkGetDeviceQueue(device, indices.graphicsAndComputeFamily.value(),
0, &computeQueue);

The compute shader stage


In the graphics samples we have used different pipeline stages to load shaders
and access descriptors. Compute shaders are accessed in a similar way by
using the VK_SHADER_STAGE_COMPUTE_BIT shader stage. So loading a compute
shader is just the same as loading a vertex shader, but with a different shader
stage. We’ll talk about this in detail in the next paragraphs. Compute also
introduces a new binding point type for descriptors and pipelines named
VK_PIPELINE_BIND_POINT_COMPUTE that we’ll have to use later on.

Loading compute shaders


Loading compute shaders in our application is the same as loading any
other shader. The only real difference is that we’ll need to use the
VK_SHADER_STAGE_COMPUTE_BIT mentioned above.

1 auto computeShaderCode = readFile("shaders/compute.spv");
2
3 VkShaderModule computeShaderModule =
createShaderModule(computeShaderCode);
4
5 VkPipelineShaderStageCreateInfo computeShaderStageInfo{};
6 computeShaderStageInfo.sType =
VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
7 computeShaderStageInfo.stage = VK_SHADER_STAGE_COMPUTE_BIT;
8 computeShaderStageInfo.module = computeShaderModule;
9 computeShaderStageInfo.pName = "main";
10 ...

Preparing the shader storage buffers


Earlier on we learned that we can use shader storage buffers to pass arbitrary
data to compute shaders. For this example we will upload an array of particles
to the GPU, so we can manipulate it directly in the GPU’s memory.
In the frames in flight chapter we talked about duplicating resources per frame
in flight, so we can keep the CPU and the GPU busy. First we declare a vector
for the buffer object and the device memory backing it up:
1 std::vector<VkBuffer> shaderStorageBuffers;
2 std::vector<VkDeviceMemory> shaderStorageBuffersMemory;

In createShaderStorageBuffers we then resize those vectors to match the
maximum number of frames in flight:
1 shaderStorageBuffers.resize(MAX_FRAMES_IN_FLIGHT);
2 shaderStorageBuffersMemory.resize(MAX_FRAMES_IN_FLIGHT);

With this setup in place we can start to move the initial particle information to
the GPU. We first initialize a vector of particles on the host side:
1 // Initialize particles
2 std::default_random_engine rndEngine((unsigned)time(nullptr));
3 std::uniform_real_distribution<float> rndDist(0.0f, 1.0f);
4
5 // Initial particle positions on a circle
6 std::vector<Particle> particles(PARTICLE_COUNT);
7 for (auto& particle : particles) {
8 float r = 0.25f * sqrt(rndDist(rndEngine));
9 float theta = rndDist(rndEngine) * 2 *
3.14159265358979323846;
10 float x = r * cos(theta) * HEIGHT / WIDTH;
11 float y = r * sin(theta);
12 particle.position = glm::vec2(x, y);
13 particle.velocity = glm::normalize(glm::vec2(x,y)) *
0.00025f;
14 particle.color = glm::vec4(rndDist(rndEngine),
rndDist(rndEngine), rndDist(rndEngine), 1.0f);
15 }

We then create a staging buffer in the host’s memory to hold the initial particle
properties:
1 VkDeviceSize bufferSize = sizeof(Particle) * PARTICLE_COUNT;
2
3 VkBuffer stagingBuffer;
4 VkDeviceMemory stagingBufferMemory;
5 createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer,
stagingBufferMemory);
6
7 void* data;
8 vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0,
&data);
9 memcpy(data, particles.data(), (size_t)bufferSize);
10 vkUnmapMemory(device, stagingBufferMemory);

Using this staging buffer as a source we then create the per-frame shader storage
buffers and copy the particle properties from the staging buffer to each of these:
1 for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
2 createBuffer(bufferSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT
| VK_BUFFER_USAGE_VERTEX_BUFFER_BIT |
VK_BUFFER_USAGE_TRANSFER_DST_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
shaderStorageBuffers[i], shaderStorageBuffersMemory[i]);
3 // Copy data from the staging buffer (host) to the shader
storage buffer (GPU)
4 copyBuffer(stagingBuffer, shaderStorageBuffers[i],
bufferSize);
5 }
6 }
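As in the earlier buffer chapters, the staging buffer is only needed for this initial
upload, so it can be destroyed once the per-frame copies are done. A sketch of the
two calls, placed at the end of createShaderStorageBuffers:
1 // Sketch: release the staging resources now that every per-frame shader
2 // storage buffer holds its own copy of the initial particle data.
3 vkDestroyBuffer(device, stagingBuffer, nullptr);
4 vkFreeMemory(device, stagingBufferMemory, nullptr);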

Descriptors
Setting up descriptors for compute is almost identical to graphics. The only
difference is that descriptors need to have the VK_SHADER_STAGE_COMPUTE_BIT
set to make them accessible by the compute stage:

1 std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
2 layoutBindings[0].binding = 0;
3 layoutBindings[0].descriptorCount = 1;
4 layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
5 layoutBindings[0].pImmutableSamplers = nullptr;
6 layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
7 ...

Note that you can combine shader stages here, so if you want the descriptor to
be accessible from the vertex and compute stage, e.g. for a uniform buffer with
parameters shared across them, you simply set the bits for both stages:
1 layoutBindings[0].stageFlags = VK_SHADER_STAGE_VERTEX_BIT |
VK_SHADER_STAGE_COMPUTE_BIT;

Here is the descriptor setup for our sample. The layout looks like this:
1 std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
2 layoutBindings[0].binding = 0;
3 layoutBindings[0].descriptorCount = 1;
4 layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
5 layoutBindings[0].pImmutableSamplers = nullptr;
6 layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
7
8 layoutBindings[1].binding = 1;
9 layoutBindings[1].descriptorCount = 1;
10 layoutBindings[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
11 layoutBindings[1].pImmutableSamplers = nullptr;
12 layoutBindings[1].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
13
14 layoutBindings[2].binding = 2;
15 layoutBindings[2].descriptorCount = 1;
16 layoutBindings[2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
17 layoutBindings[2].pImmutableSamplers = nullptr;
18 layoutBindings[2].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
19
20 VkDescriptorSetLayoutCreateInfo layoutInfo{};
21 layoutInfo.sType =
VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
22 layoutInfo.bindingCount = 3;
23 layoutInfo.pBindings = layoutBindings.data();
24
25 if (vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr,
&computeDescriptorSetLayout) != VK_SUCCESS) {
26 throw std::runtime_error("failed to create compute descriptor
set layout!");
27 }

Looking at this setup, you might wonder why we have two layout bindings for
shader storage buffer objects, even though we’ll only render a single particle
system. This is because the particle positions are updated frame by frame
based on a delta time. This means that each frame needs to know about the
last frame’s particle positions, so it can update them with a new delta time and
write them to its own SSBO:

For that, the compute shader needs to have access to the last and cur-
rent frame’s SSBOs. This is done by passing both to the compute
shader in our descriptor setup. See the storageBufferInfoLastFrame
and storageBufferInfoCurrentFrame:
1 for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
2 VkDescriptorBufferInfo uniformBufferInfo{};
3 uniformBufferInfo.buffer = uniformBuffers[i];
4 uniformBufferInfo.offset = 0;
5 uniformBufferInfo.range = sizeof(UniformBufferObject);
6
7 std::array<VkWriteDescriptorSet, 3> descriptorWrites{};
8 ...
9
10 VkDescriptorBufferInfo storageBufferInfoLastFrame{};
11 storageBufferInfoLastFrame.buffer = shaderStorageBuffers[(i - 1)
% MAX_FRAMES_IN_FLIGHT];
12 storageBufferInfoLastFrame.offset = 0;
13 storageBufferInfoLastFrame.range = sizeof(Particle) *
PARTICLE_COUNT;
14
15 descriptorWrites[1].sType =
VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
16 descriptorWrites[1].dstSet = computeDescriptorSets[i];
17 descriptorWrites[1].dstBinding = 1;
18 descriptorWrites[1].dstArrayElement = 0;
19 descriptorWrites[1].descriptorType =
VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
20 descriptorWrites[1].descriptorCount = 1;
21 descriptorWrites[1].pBufferInfo = &storageBufferInfoLastFrame;
22
23 VkDescriptorBufferInfo storageBufferInfoCurrentFrame{};
24 storageBufferInfoCurrentFrame.buffer = shaderStorageBuffers[i];
25 storageBufferInfoCurrentFrame.offset = 0;
26 storageBufferInfoCurrentFrame.range = sizeof(Particle) *
PARTICLE_COUNT;
27
28 descriptorWrites[2].sType =
VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
29 descriptorWrites[2].dstSet = computeDescriptorSets[i];
30 descriptorWrites[2].dstBinding = 2;
31 descriptorWrites[2].dstArrayElement = 0;
32 descriptorWrites[2].descriptorType =
VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
33 descriptorWrites[2].descriptorCount = 1;
34 descriptorWrites[2].pBufferInfo = &storageBufferInfoCurrentFrame;
35
36 vkUpdateDescriptorSets(device, 3, descriptorWrites.data(), 0,
nullptr);
37 }

Remember that we also have to request the descriptor types for the SSBOs from
our descriptor pool:
1 std::array<VkDescriptorPoolSize, 2> poolSizes{};
2 ...
3
4 poolSizes[1].type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
5 poolSizes[1].descriptorCount =
static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT) * 2;

We need to double the number of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER types
requested from the pool because our sets reference the SSBOs of the last
and current frame.

Compute pipelines
As compute is not a part of the graphics pipeline, we can’t use vkCreateGraphicsPipelines.
Instead we need to create a dedicated compute pipeline with vkCreateComputePipelines
for running our compute commands. Since a compute pipeline does not touch
any of the rasterization state, it has a lot less state than a graphics pipeline:
1 VkComputePipelineCreateInfo pipelineInfo{};
2 pipelineInfo.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
3 pipelineInfo.layout = computePipelineLayout;
4 pipelineInfo.stage = computeShaderStageInfo;
5
6 if (vkCreateComputePipelines(device, VK_NULL_HANDLE, 1,
&pipelineInfo, nullptr, &computePipeline) != VK_SUCCESS) {
7 throw std::runtime_error("failed to create compute pipeline!");
8 }

The setup is a lot simpler, as we only require one shader stage and a pipeline
layout. The pipeline layout works the same as with the graphics pipeline:
1 VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
2 pipelineLayoutInfo.sType =
VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
3 pipelineLayoutInfo.setLayoutCount = 1;
4 pipelineLayoutInfo.pSetLayouts = &computeDescriptorSetLayout;
5
6 if (vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr,
&computePipelineLayout) != VK_SUCCESS) {
7 throw std::runtime_error("failed to create compute pipeline
layout!");
8 }

Compute space
Before we get into how a compute shader works and how we submit compute
workloads to the GPU, we need to talk about two important compute concepts:
work groups and invocations. They define an abstract execution model for
how compute workloads are processed by the compute hardware of the GPU in
three dimensions (x, y, and z).
Work groups define how the compute workloads are formed and processed by
the compute hardware of the GPU. You can think of them as work items the
GPU has to work through. Work group dimensions are set by the application
at command buffer time using a dispatch command.
And each work group then is a collection of invocations that execute the same
compute shader. Invocations can potentially run in parallel and their dimensions
are set in the compute shader. Invocations within a single workgroup have access
to shared memory.
This image shows the relation between these two in three dimensions:

The number of dimensions used for the work groups (defined by vkCmdDispatch)
and for the invocations (defined by the local sizes in the compute shader) depends
on how the input data is structured. If you e.g. work on a one-dimensional array,
like we do in this chapter, you only have to specify the x dimension for both.
As an example: if we dispatch a work group count of [64, 1, 1] with a compute
shader local size of [32, 32, 1], our compute shader will be invoked 64 x 32 x 32
= 65,536 times.
Note that the maximum count for work groups and local sizes differs from
implementation to implementation, so you should always check the compute
related maxComputeWorkGroupCount, maxComputeWorkGroupInvocations and
maxComputeWorkGroupSize limits in VkPhysicalDeviceLimits.
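As a quick sketch of what such a check could look like on the host side (the 256
here matches the local size we will use later in this chapter):
1 // Sketch: read the compute-related limits and verify that a local size of
2 // 256 invocations in the x dimension is actually allowed on this device.
3 VkPhysicalDeviceProperties properties;
4 vkGetPhysicalDeviceProperties(physicalDevice, &properties);
5
6 const VkPhysicalDeviceLimits& limits = properties.limits;
7 if (limits.maxComputeWorkGroupSize[0] < 256 ||
8     limits.maxComputeWorkGroupInvocations < 256) {
9     throw std::runtime_error("device does not support a local size of 256!");
10 }
11 // limits.maxComputeWorkGroupCount[0] bounds the x value passed to vkCmdDispatch.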

Compute shaders
Now that we have learned about all the parts required to setup a compute
shader pipeline, it’s time to take a look at compute shaders. All of the things
we learned about using GLSL shaders e.g. for vertex and fragment shaders also
applies to compute shaders. The syntax is the same, and many concepts like
passing data between the application and the shader are the same. But there
are some important differences.
A very basic compute shader for updating a linear array of particles may look
like this:
1 #version 450
2
3 layout (binding = 0) uniform ParameterUBO {
4 float deltaTime;
5 } ubo;
6
7 struct Particle {
8 vec2 position;
9 vec2 velocity;
10 vec4 color;
11 };
12
13 layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
14 Particle particlesIn[ ];
15 };
16
17 layout(std140, binding = 2) buffer ParticleSSBOOut {
18 Particle particlesOut[ ];
19 };
20
21 layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;
22
23 void main()
24 {
25 uint index = gl_GlobalInvocationID.x;
26
27 Particle particleIn = particlesIn[index];
28
29 particlesOut[index].position = particleIn.position +
particleIn.velocity.xy * ubo.deltaTime;
30 particlesOut[index].velocity = particleIn.velocity;
31 ...
32 }

The top part of the shader contains the declarations for the shader’s input.
First is a uniform buffer object at binding 0, something we already learned
about in this tutorial. Below we declare our Particle structure that matches the
declaration in the C++ code. Binding 1 then refers to the shader storage buffer
object with the particle data from the last frame (see the descriptor setup), and
binding 2 points to the SSBO for the current frame, which is the one we’ll be
updating with this shader.
An interesting thing is this compute-only declaration related to the compute
space:
1 layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

This defines the number of invocations of this compute shader in the current work
group. As noted earlier, this is the local part of the compute space. Hence the
local_ prefix. As we work on a linear 1D array of particles, we only need to
specify a number for the x dimension in local_size_x.
The main function then reads from the last frame’s SSBO and writes the updated
particle position to the SSBO for the current frame. Similar to other shader
types, compute shaders have their own set of builtin input variables. Built-ins
are always prefixed with gl_. One such built-in is gl_GlobalInvocationID, a
variable that uniquely identifies the current compute shader invocation across
the current dispatch. We use this to index into our particle array.

Running compute commands


Dispatch
Now it’s time to actually tell the GPU to do some compute. This is done by
calling vkCmdDispatch inside a command buffer. While not perfectly true, a
dispatch is to compute what a draw call like vkCmdDraw is to graphics. This
dispatches a given number of compute work items in at max. three dimensions.
1 VkCommandBufferBeginInfo beginInfo{};
2 beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
3
4 if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
5 throw std::runtime_error("failed to begin recording command
buffer!");
6 }
7
8 ...
9
10 vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE,
computePipeline);
11 vkCmdBindDescriptorSets(commandBuffer,
VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 0, 1,
&computeDescriptorSets[i], 0, 0);
12
13 vkCmdDispatch(computeCommandBuffer, PARTICLE_COUNT / 256, 1, 1);
14
15 ...
16
17 if (vkEndCommandBuffer(commandBuffer) != VK_SUCCESS) {
18 throw std::runtime_error("failed to record command buffer!");
19 }

The vkCmdDispatch will dispatch PARTICLE_COUNT / 256 local work groups in
the x dimension. As our particles array is linear, we leave the other two dimen-
sions at one, resulting in a one-dimensional dispatch. But why do we divide
the number of particles (in our array) by 256? That’s because in the previous
paragraph we defined that every work group will run 256 compute shader
invocations. So if we were to have 4096 particles, we would dispatch 16 work
groups, with each work group running 256 compute shader invocations. Getting
the two numbers right usually takes some tinkering and profiling, depending
on your workload and the hardware you’re running on. If your particle count
is dynamic and can’t always be divided by e.g. 256, you can always use
gl_GlobalInvocationID at the start of your compute shader and return from
it if the global invocation index is greater than the number of your particles.
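For example, the dispatch could round the group count up so that every particle
is covered, with the shader-side bounds check handling the overshoot. A sketch:
1 // Sketch: integer round-up so a particle count that is not a multiple of the
2 // local size (256 here) still gets a work group for its remainder. The compute
3 // shader then returns early when gl_GlobalInvocationID.x >= PARTICLE_COUNT.
4 uint32_t groupCountX = (PARTICLE_COUNT + 256 - 1) / 256;
5 vkCmdDispatch(computeCommandBuffer, groupCountX, 1, 1);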
And just as was the case for the compute pipeline, a compute command buffer
contains a lot less state than a graphics command buffer. There’s no need to
start a render pass or set a viewport.

Submitting work
As our sample does both compute and graphics operations, we’ll be doing two
submits per frame, one to the compute queue and one to the graphics queue (see
the drawFrame function):
1 ...
2 if (vkQueueSubmit(computeQueue, 1, &submitInfo, nullptr) !=
VK_SUCCESS) {
3 throw std::runtime_error("failed to submit compute command
buffer!");
4 };
5 ...
6 if (vkQueueSubmit(graphicsQueue, 1, &submitInfo,
inFlightFences[currentFrame]) != VK_SUCCESS) {
7 throw std::runtime_error("failed to submit draw command
buffer!");
8 }

The first submit to the compute queue updates the particle positions using the
compute shader, and the second submit will then use that updated data to draw
the particle system.

Synchronizing graphics and compute


Synchronization is an important part of Vulkan, even more so when doing com-
pute in conjunction with graphics. Wrong or lacking synchronization may result
in the vertex stage starting to draw (=read) particles while the compute shader
hasn’t finished updating (=write) them (read-after-write hazard), or the com-
pute shader could start updating particles that are still in use by the vertex part
of the pipeline (write-after-read hazard).
So we must make sure that those cases don’t happen by properly synchronizing
the graphics and the compute load. There are different ways of doing so, de-
pending on how you submit your compute workload but in our case with two
separate submits, we’ll be using semaphores and fences to ensure that the ver-
tex shader won’t start fetching vertices until the compute shader has finished
updating them.

This is necessary as even though the two submits are ordered one-after-another,
there is no guarantee that they execute on the GPU in this order. Adding in
wait and signal semaphores ensures this execution order.
So we first add a new set of synchronization primitives for the compute work
in createSyncObjects. The compute fences, just like the graphics fences, are
created in the signaled state because otherwise, the first draw would time out
while waiting for the fences to be signaled as detailed here:
1 std::vector<VkFence> computeInFlightFences;
2 std::vector<VkSemaphore> computeFinishedSemaphores;
3 ...
4 computeInFlightFences.resize(MAX_FRAMES_IN_FLIGHT);
5 computeFinishedSemaphores.resize(MAX_FRAMES_IN_FLIGHT);
6
7 VkSemaphoreCreateInfo semaphoreInfo{};
8 semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;
9
10 VkFenceCreateInfo fenceInfo{};
11 fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
12 fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
13
14 for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
15 ...
16 if (vkCreateSemaphore(device, &semaphoreInfo, nullptr,
&computeFinishedSemaphores[i]) != VK_SUCCESS ||
17 vkCreateFence(device, &fenceInfo, nullptr,
&computeInFlightFences[i]) != VK_SUCCESS) {
18 throw std::runtime_error("failed to create compute
synchronization objects for a frame!");
19 }
20 }

We then use these to synchronize the compute buffer submission with the graph-
ics submission:
1 // Compute submission
2 vkWaitForFences(device, 1, &computeInFlightFences[currentFrame],
VK_TRUE, UINT64_MAX);
3
4 updateUniformBuffer(currentFrame);
5
6 vkResetFences(device, 1, &computeInFlightFences[currentFrame]);
7
8 vkResetCommandBuffer(computeCommandBuffers[currentFrame],
/*VkCommandBufferResetFlagBits*/ 0);
9 recordComputeCommandBuffer(computeCommandBuffers[currentFrame]);
10
11 submitInfo.commandBufferCount = 1;
12 submitInfo.pCommandBuffers = &computeCommandBuffers[currentFrame];
13 submitInfo.signalSemaphoreCount = 1;
14 submitInfo.pSignalSemaphores =
&computeFinishedSemaphores[currentFrame];
15
16 if (vkQueueSubmit(computeQueue, 1, &submitInfo,
computeInFlightFences[currentFrame]) != VK_SUCCESS) {
17 throw std::runtime_error("failed to submit compute command
buffer!");
18 };
19
20 // Graphics submission
21 vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE,
UINT64_MAX);
22
23 ...
24
25 vkResetFences(device, 1, &inFlightFences[currentFrame]);
26
27 vkResetCommandBuffer(commandBuffers[currentFrame],
/*VkCommandBufferResetFlagBits*/ 0);
28 recordCommandBuffer(commandBuffers[currentFrame], imageIndex);
29
30 VkSemaphore waitSemaphores[] = {
computeFinishedSemaphores[currentFrame],
imageAvailableSemaphores[currentFrame] };
31 VkPipelineStageFlags waitStages[] = {
VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
32 submitInfo = {};
33 submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
34
35 submitInfo.waitSemaphoreCount = 2;
36 submitInfo.pWaitSemaphores = waitSemaphores;
37 submitInfo.pWaitDstStageMask = waitStages;
38 submitInfo.commandBufferCount = 1;
39 submitInfo.pCommandBuffers = &commandBuffers[currentFrame];
40 submitInfo.signalSemaphoreCount = 1;
41 submitInfo.pSignalSemaphores =
&renderFinishedSemaphores[currentFrame];
42
43 if (vkQueueSubmit(graphicsQueue, 1, &submitInfo,
inFlightFences[currentFrame]) != VK_SUCCESS) {
44 throw std::runtime_error("failed to submit draw command
buffer!");
45 }

Similar to the sample in the semaphores chapter, this setup will immediately
run the compute shader, as we haven’t specified any wait semaphores. This is
fine, as we already wait with the vkWaitForFences call for the current frame’s
compute command buffer to finish executing before recording and submitting it
again.
The graphics submission on the other hand needs to wait for the compute
work to finish so it doesn’t start fetching vertices while the compute buffer
is still updating them. So we wait on the computeFinishedSemaphores
for the current frame and have the graphics submission wait on the
VK_PIPELINE_STAGE_VERTEX_INPUT_BIT stage, where vertices are consumed.
But it also needs to wait for presentation so the fragment shader won’t
output to the color attachments until the image has been presented. So we
also wait on the imageAvailableSemaphores on the current frame at the
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage.

Drawing the particle system


Earlier on, we learned that buffers in Vulkan can have multiple use-cases and so
we created the shader storage buffer that contains our particles with both the
shader storage buffer bit and the vertex buffer bit. This means that we can use
the shader storage buffer for drawing just as we used “pure” vertex buffers in
the previous chapters.
We first set up the vertex input state to match our particle structure:
1 struct Particle {
2 ...
3
4 static std::array<VkVertexInputAttributeDescription, 2>
getAttributeDescriptions() {
5 std::array<VkVertexInputAttributeDescription, 2>
attributeDescriptions{};
6
7 attributeDescriptions[0].binding = 0;
8 attributeDescriptions[0].location = 0;
9 attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
10 attributeDescriptions[0].offset = offsetof(Particle,
position);
11
12 attributeDescriptions[1].binding = 0;
13 attributeDescriptions[1].location = 1;
14 attributeDescriptions[1].format =
VK_FORMAT_R32G32B32A32_SFLOAT;
15 attributeDescriptions[1].offset = offsetof(Particle, color);
16
17 return attributeDescriptions;
18 }
19 };

Note that we don’t add velocity to the vertex input attributes, as this is only
used by the compute shader.
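The corresponding binding description follows the same pattern as for a regular
vertex buffer. A sketch of what it could look like (the tutorial’s own version is
elided by the “...” in the struct above):
1 // Sketch: one binding of tightly packed Particle structs, advanced per vertex.
2 static VkVertexInputBindingDescription getBindingDescription() {
3     VkVertexInputBindingDescription bindingDescription{};
4     bindingDescription.binding = 0;
5     bindingDescription.stride = sizeof(Particle);
6     bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;
7     return bindingDescription;
8 }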
We then bind and draw it like we would with any vertex buffer:
1 VkDeviceSize offsets[] = { 0 };
2 vkCmdBindVertexBuffers(commandBuffer, 0, 1,
&shaderStorageBuffers[currentFrame], offsets);
3
4 vkCmdDraw(commandBuffer, PARTICLE_COUNT, 1, 0, 0);

Conclusion
In this chapter, we learned how to use compute shaders to offload work from the
CPU to the GPU. Without compute shaders, many effects in modern games and
applications would either not be possible or would run a lot slower. But even
more than graphics, compute has a lot of use-cases, and this chapter only gives
you a glimpse of what’s possible. So now that you know how to use compute
shaders, you may want to take a look at some advanced compute topics like:
• Shared memory
• Asynchronous compute
• Atomic operations
• Subgroups
You can find some advanced compute samples in the official Khronos Vulkan
Samples repository.
C++ code / Vertex shader / Fragment shader / Compute shader

FAQ

This page lists solutions to common problems that you may encounter while
developing Vulkan applications.

I get an access violation error in the core validation layer
Make sure that MSI Afterburner / RivaTuner Statistics Server is not running,
because it has some compatibility problems with Vulkan.

I don’t see any messages from the validation lay-


ers / Validation layers are not available
First make sure that the validation layers get a chance to print errors by keeping
the terminal open after your program exits. You can do this from Visual Studio
by running your program with Ctrl-F5 instead of F5, and on Linux by executing
your program from a terminal window. If there are still no messages and you
are sure that validation layers are turned on, then you should ensure that your
Vulkan SDK is correctly installed by following the “Verify the Installation” in-
structions on this page. Also ensure that your SDK version is at least 1.1.106.0
to support the VK_LAYER_KHRONOS_validation layer.

vkCreateSwapchainKHR triggers an error in SteamOverlayVulkanLayer64.dll
This appears to be a compatibility problem in the Steam client beta. There
are a few possible workarounds:
• Opt out of the Steam beta program.
• Set the DISABLE_VK_LAYER_VALVE_steam_overlay_1 environment variable to 1.
• Delete the Steam overlay Vulkan layer entry in the registry under
HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ImplicitLayers.
Example:

vkCreateInstance fails with VK_ERROR_INCOMPATIBLE_DRIVER
If you are using MacOS with the latest MoltenVK SDK then vkCreateInstance
may return the VK_ERROR_INCOMPATIBLE_DRIVER error. This is be-
cause Vulkan SDK version 1.3.216 or newer requires you to enable the
VK_KHR_PORTABILITY_subset extension to use MoltenVK, because it is
currently not fully conformant.
You have to add the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR
flag to your VkInstanceCreateInfo and add VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME
to your instance extension list.
Code example:
1 ...
2
3 std::vector<const char*> requiredExtensions;
4
5 for(uint32_t i = 0; i < glfwExtensionCount; i++) {
6 requiredExtensions.emplace_back(glfwExtensions[i]);
7 }
8
9 requiredExtensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
10
11 createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;
12
13 createInfo.enabledExtensionCount = (uint32_t)
requiredExtensions.size();
14 createInfo.ppEnabledExtensionNames = requiredExtensions.data();
15
16 if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS)
{
17 throw std::runtime_error("failed to create instance!");
18 }
