Direct3D 11 Performance Tips & Tricks

Holger Gruen, AMD ISV Relations
Cem Cebenoyan, NVIDIA ISV Relations

Agenda

• Introduction
• Shader Model 5
• Resources and Resource Views
• Multithreading
• Miscellaneous
• Q&A
Introduction

• Direct3D 11 has numerous new features
• However, these new features need to be used wisely for good performance
• For generic optimization advice, please refer to last year's talk:
  http://developer.amd.com/gpu_assets/The A to Z of DX10 Performance.pps
Shader Model 5 (1)

• Use Gather*()/GatherCmp*() for fast multi-channel texture fetches
  - Use a smaller number of RTs while still fetching efficiently
    - Store depth to FP16 alpha for SSAO
    - Use Gather*() for region fetch of alpha/depth
  - Fetch 4 RGB values in just three ops
    - Image post processing
Fetch 4 RGB values in just three texture ops

[Figure: a 2x2 texel footprint with texels 0 and 1 on the top row and texels 2 and 3 on the bottom row. Four point samples (SampleOp0 to SampleOp3) each return red, green, blue and alpha of one texel. Three gathers return the same RGB data:]

GatherRed    red2   red3   red1   red0
GatherGreen  green2 green3 green1 green0
GatherBlue   blue2  blue3  blue1  blue0
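
The component order in the figure can be checked with a small CPU-side sketch. This is C++ that only emulates the fetch pattern, not shader code; in HLSL the corresponding calls would be GatherRed(), GatherGreen() and GatherBlue() on a texture object. The names Texel, Footprint, gatherChannel and fetchRgbWithThreeGathers are made up for this illustration.

```cpp
#include <array>
#include <cassert>

struct Texel { float r, g, b, a; };

// 2x2 footprint numbered as on the slide:
// 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
using Footprint = std::array<Texel, 4>;
using Float4 = std::array<float, 4>;

// Emulates one gather of a single channel. The returned components
// follow the order shown on the slide: texels 2, 3, 1, 0.
inline Float4 gatherChannel(const Footprint& t, float Texel::*channel) {
    return Float4{{t[2].*channel, t[3].*channel, t[1].*channel, t[0].*channel}};
}

// Rebuilds the RGB of all four texels from three gathers.
// Alpha is never fetched, so it is left at zero in this sketch.
inline Footprint fetchRgbWithThreeGathers(const Footprint& t) {
    const Float4 r = gatherChannel(t, &Texel::r);
    const Float4 g = gatherChannel(t, &Texel::g);
    const Float4 b = gatherChannel(t, &Texel::b);
    const int texelOfComponent[4] = {2, 3, 1, 0};
    Footprint out{};
    for (int c = 0; c < 4; ++c) {
        out[texelOfComponent[c]] = Texel{r[c], g[c], b[c], 0.0f};
    }
    return out;
}
```

Three fetches replace four because each gather returns one channel of all four texels, which is why the trick only pays off when alpha is not needed.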
Shader Model 5 (2)

• Use 'Conservative Depth' to keep early depth rejection active for fast depth sprites
  - Output SV_DepthGreaterEqual or SV_DepthLessEqual instead of SV_Depth from your PS
  - Keeps early depth rejection active even with shader-modified Z
  - The hardware/driver will enforce legal behavior
  - If you write an invalid depth value, it will be clamped to the rasterized value
Depth Sprites under Direct3D 11

[Figure: scene geometry is drawn first; a depth sprite for a sphere is drawn behind it.]

Direct3D 11 can fully cull this depth sprite if SV_DepthGreaterEqual is output by the PS
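
Why the sprite can be culled before the pixel shader runs can be sketched on the CPU. The C++ below is an emulation under two assumptions that are not stated on the slides: the depth test is LESS, and the clamp is modeled as a simple max(). The function names are invented for this example.

```cpp
#include <algorithm>
#include <cassert>

// Depth test LESS: a fragment passes if it is closer than the stored depth.
inline bool passesLess(float fragmentDepth, float storedDepth) {
    return fragmentDepth < storedDepth;
}

// Models SV_DepthGreaterEqual: a shader output smaller than the
// rasterized depth is invalid and is clamped to the rasterized value.
inline float clampDepthGreaterEqual(float rasterizedDepth, float shaderDepth) {
    return std::max(rasterizedDepth, shaderDepth);
}

// If the rasterized depth already fails the LESS test, every legal
// shader output (>= rasterized depth) fails too, so the fragment can
// be rejected without running the pixel shader.
inline bool canRejectEarly(float rasterizedDepth, float storedDepth) {
    return !passesLess(rasterizedDepth, storedDepth);
}
```

With plain SV_Depth the shader could output any value, so no such guarantee exists and early rejection has to be switched off.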
Shader Model 5 (3)

• Use EvaluateAttribute*() for fast shader AA without super sampling
  - Call EvaluateAttribute*() at subpixel positions
  - Allows shader AA for procedural materials
  - Input SV_Coverage to compute a color for each covered subsample and write the average color
  - Slightly better image quality than pure MSAA
• Output SV_Coverage for MSAA alpha-test
  - This feature has been around since 10.1
  - EvaluateAttribute*() makes implementation simpler
  - But check whether alpha to coverage already gives you what you need, as it should be faster
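
The "one color per covered subsample, then average" idea can be sketched on the CPU as follows. This is a C++ emulation, not shader code: in HLSL the per-sample attribute would come from EvaluateAttributeAtSample() and the mask from SV_Coverage. The sample offsets, the proceduralShade() material and all other names here are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };

// Hypothetical procedural material: a hard vertical edge at x = 0.
inline float proceduralShade(Float2 p) {
    return p.x < 0.0f ? 0.0f : 1.0f;
}

// Shades every subsample whose bit is set in the coverage mask and
// returns the average. Returns 0 if no sample is covered.
inline float shadeCoveredSamples(Float2 pixelCenter, const Float2* sampleOffsets,
                                 int sampleCount, std::uint32_t coverage) {
    float sum = 0.0f;
    int covered = 0;
    for (int i = 0; i < sampleCount; ++i) {
        if ((coverage >> i) & 1u) {
            const Float2 p{pixelCenter.x + sampleOffsets[i].x,
                           pixelCenter.y + sampleOffsets[i].y};
            sum += proceduralShade(p);
            ++covered;
        }
    }
    return covered > 0 ? sum / static_cast<float>(covered) : 0.0f;
}
```

A pixel straddling the edge gets an intermediate value instead of a hard 0 or 1, which is the anti-aliasing effect the slide describes.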
Shader Model 5 (4)

• A quick refresher on UAVs and atomics
  - Use PS scattering and UAVs wisely
  - Use Interlocked*() operations wisely
  - See the DirectCompute performance presentation!
Shader Model 5 (5)

• Reduce stream out passes
  - Addressable stream output
  - Output to up to 4 streams in one pass
  - All streams can have multiple elements
• Write simpler code using geometry shader instancing
  - Use SV_GSInstanceID instead of a loop
Shader Model 5 (6)

• Force early depth-stencil testing for your PS using [earlydepthstencil]
  - Can give a significant speedup, especially when writing to UAVs or append buffers
  - AMD's OIT demo uses this
  - Put '[earlydepthstencil]' above your pixel shader function declaration to enable it
Early Depth Stencil and OIT

[Figure: a projection plane with opaque geometry drawn first and transparent geometry drawn after all opaque geometry; the visible part of the transparent geometry is marked in purple.]

An '[earlydepthstencil]' pixel shader that writes OIT color layers to a UAV only will cull all pixels outside the purple area!
Shader Model 5 (7)

• Use the numerous new intrinsics for faster shaders
  - Fast bitops: countbits(), reversebits() (needed in FFTs), etc.
  - Conversion instructions: fp16 to fp32 and vice versa (f16tof32() and f32tof16())
    - Faster packing/unpacking
  - Fast coarse derivatives (ddx_coarse()/ddy_coarse())
  - ...
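
To make the FFT remark concrete, here is a CPU-side C++ sketch of what countbits() and reversebits() compute and how a bit reversal yields the FFT input permutation. The function names are illustrative, and the loops are plain reference implementations, not the single hardware instructions the slide refers to.

```cpp
#include <cassert>
#include <cstdint>

// Reference for countbits(): number of set bits in a 32-bit value.
inline std::uint32_t countBits(std::uint32_t v) {
    std::uint32_t n = 0;
    while (v != 0) {
        v &= v - 1;  // clears the lowest set bit
        ++n;
    }
    return n;
}

// Reference for reversebits(): reverses the order of all 32 bits.
inline std::uint32_t reverseBits(std::uint32_t v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

// Bit-reversal permutation index for an FFT of size 2^log2N.
// Assumes 1 <= log2N <= 32 and i < 2^log2N.
inline std::uint32_t fftBitReversedIndex(std::uint32_t i, int log2N) {
    return reverseBits(i) >> (32 - log2N);
}
```

For an 8-point FFT (log2N = 3), index 1 maps to 4 and index 3 maps to 6. With a native reversebits() this permutation costs one instruction plus a shift instead of a loop.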
Shader Model 5 (8)

• Use dynamic shader linkage of subroutines wisely
  - Subroutines are not free
  - No optimizations across function boundaries
  - Only use dynamic linkage for large subroutines
  - Avoid using a lot of small subroutines

Resources and Resource Views (1)

• Reduce memory size and bandwidth for more performance
  - BC6 and BC7 provide new capabilities
  - Very high quality, and HDR support
  - All static textures should now be compressible
BC7 image quality

[Figure: three images side by side: original image, BC1 compressed, BC7 compressed.]
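
The memory side of this advice is simple arithmetic, sketched below in C++. It relies on the block sizes of the formats (BC1 stores a 4x4 block in 8 bytes, BC6H and BC7 in 16 bytes); the Format enum and mipBytes() are names invented for this example, and only a single mip level is computed.

```cpp
#include <cassert>
#include <cstddef>

enum class Format { RGBA8, RGBA16F, BC1, BC6H, BC7 };

// Size in bytes of one mip level of the given format.
// Block-compressed formats round the dimensions up to whole 4x4 blocks.
inline std::size_t mipBytes(Format f, std::size_t width, std::size_t height) {
    switch (f) {
        case Format::RGBA8:   return width * height * 4;
        case Format::RGBA16F: return width * height * 8;
        default: break;
    }
    const std::size_t blocksX = (width + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    const std::size_t bytesPerBlock = (f == Format::BC1) ? 8 : 16;
    return blocksX * blocksY * bytesPerBlock;
}
```

For a 1024x1024 level this gives 4 MiB for RGBA8 against 1 MiB for BC7, and 8 MiB for an RGBA16F HDR texture against 1 MiB for BC6H.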
Resources and Resource Views (2)

• Use read-only depth buffers to avoid copying the depth buffer
  - Direct3D 11 allows the sampling of a depth buffer that is still bound for depth testing
  - Useful for deferred lighting if depth is part of the g-buffer
  - Useful for soft particles
• AMD: Using a depth buffer as an SRV may trigger a decompression step
  - Do it as late in the frame as possible
Free Threaded Resource Creation

• Use fast Direct3D 11 asynchronous resource creation
  - In general it should just be faster and more parallel
• Do not destroy a resource in a frame in which it's used
  - Destroying resources would most likely cause synchronizing events
• Avoid create-render-destroy sequences
Display Lists (aka command lists created from a deferred context)

• First make sure your app is well multi-threaded
• Only use display lists if command construction is a large enough bottleneck
• Then consider display lists to express parallelism in GPU command construction
  - Avoid fine-grained command lists
• Drivers are already multi-threaded
Deferred Contexts

• On deferred contexts, Map() or UpdateSubresource() will use extra memory
  - Remember, all initial Maps need to use DISCARD semantics
• Note that on a single-core system a deferred context will be slower than just using the immediate context
  - For dual core, it is also probably best to just use the immediate context
• Don't use deferred contexts unless there is significant parallelism
Miscellaneous

• Use DrawIndirect to further lower your CPU overhead
  - Kick off instanced draw calls/dispatches using args from a GPU-written buffer
  - Could use the GPU for limited scene traversal and culling
• Use Append/Consume buffers for fast 'stream out'
  - Faster than GS as there are no input ordering constraints
  - One-pass SO with 'unlimited' data amplification
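
What "args from a GPU-written buffer" means can be sketched in C++. The structs mirror the argument layouts read by DrawInstancedIndirect(), DrawIndexedInstancedIndirect() and DispatchIndirect(); the struct names, the Sphere type and buildArgsAfterCulling() are assumptions made for this illustration, and the culling loop is a CPU stand-in for what a compute shader would write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Argument layouts as they sit in the indirect args buffer.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
struct DrawIndexedInstancedIndirectArgs {
    std::uint32_t indexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startIndexLocation;
    std::int32_t  baseVertexLocation;
    std::uint32_t startInstanceLocation;
};
struct DispatchIndirectArgs {
    std::uint32_t threadGroupCountX, threadGroupCountY, threadGroupCountZ;
};

// Hypothetical bounding sphere in view space (only depth is tested here).
struct Sphere { float z; float radius; };

// CPU stand-in for a GPU culling pass: counts the spheres that overlap
// the [nearZ, farZ] depth range and writes the draw arguments.
// A real pass would also compact the per-instance data of the survivors.
inline DrawInstancedIndirectArgs buildArgsAfterCulling(
        const Sphere* spheres, std::size_t count,
        float nearZ, float farZ, std::uint32_t vertexCountPerInstance) {
    std::uint32_t visible = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const Sphere& s = spheres[i];
        if (s.z + s.radius >= nearZ && s.z - s.radius <= farZ) {
            ++visible;
        }
    }
    return DrawInstancedIndirectArgs{vertexCountPerInstance, visible, 0u, 0u};
}
```

Because the instance count never travels back to the CPU, the application can issue the draw without a readback or a stall.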
Questions?
