Stuttering in Game Graphics: Detection and Solutions
Cem Cebenoyan
Director of Developer Technology, NVIDIA Corporation
gameworks.nvidia.com
Stuttering – A Killer to Game Experience
• What people tell you:
– “Every few seconds, the game hitches…”
– “The framerate is high, but it doesn’t feel smooth…”
– “The animation’s choppy…”
– “The response to input lags constantly…”
– …
In this talk,
• We are covering:
– Top stuttering situations in the graphics pipeline
– Methods to identify the root causes
– Mitigation plans
• Not covering:
– Stutters caused by disk/network I/O, sound,
and other things outside graphics
Agenda
• A quick glimpse into the top stuttering
causes
• Stutter diagnosis
A Quick Glimpse into the
Top Stuttering Causes
The Many Faces of Stutter
• Framerate hitching
– Appearance: every so often, the framerate freezes and then resumes
– Possible causes: shader compilation, resource updates and/or video memory paging
• Micro-stuttering
– Appearance: the framerate is high, but the game feels laggy
– Possible causes: highly uneven frame durations
• Timing discrepancy
– Appearance: the framerate is fine, but animation and simulation are choppy
– Possible causes: incorrectly measured time intervals and frame queuing
Top 5 Stuttering Causes
1. Shader compilation
– The driver translates D3D assembly into machine-level instructions, which can cause stalls
2. Video memory oversubscription
3. Resource management
– Creating, destroying & updating resources can hurt performance
4. Queued frames
– Uneven workloads between CPU & GPU require buffering, which can also cause timing issues
5. Improper queries
– Event & occlusion queries may change the default driver behavior and sometimes block the pipeline
Stutter Diagnosis
Identify Stuttering
• Identifying stuttering is hard
– It may only reproduce on some hardware under
certain conditions
– There is no convenient way to capture data for analysis
Preliminary: CPU/GPU Communication
[Diagram: App 0 and App 1 each submit command buffers to their own command queue (driver); the OS graphics scheduler submits them to the GPU hardware queue (driver)]
• Frame latency
– With no queued frames, the GPU works 1 frame behind the CPU
Preliminary: WDDM
• Windows Display Driver Model
– Introduced in Windows Vista
– Virtualized video memory, better fault tolerance, OS-scheduled graphics tasks, …
Tools for Stutter Diagnosing
• Fraps
– Framerate recording
– Quick stats
• NVIDIA® Nsight™
– Static & dynamic analysis
– GPU pipeline inspection
• GPUView
– In-depth analysis
Framerate Hitching Diagnosing
• Appearance
– Every so often, the framerate freezes and resumes
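A recorded frame-time log makes hitches easy to spot: they are the frames that take several times longer than a typical frame. The sketch below is a minimal illustration under stated assumptions. The function name `find_hitches`, the use of the median as the "typical" frame, and the 3x threshold are all choices made for this example, not values from the talk.

```python
from statistics import median


def find_hitches(frame_times_ms, factor=3.0):
    """Return (index, frame_time) pairs for frames much slower than the median.

    `factor` is an assumed threshold: a frame counts as a hitch when it
    takes more than `factor` times the median frame time.
    """
    if not frame_times_ms:
        return []
    typical = median(frame_times_ms)
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > factor * typical]


# Hypothetical log: steady ~16.7 ms frames with two long stalls.
log = [16.7, 16.5, 16.9, 120.0, 16.6, 16.8, 16.7, 95.0, 16.6]
hitches = find_hitches(log)  # frames 3 and 7 are flagged
```

The median is used rather than the mean so that the hitches themselves do not inflate the baseline they are compared against.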
Framerate Hitching Diagnosing (cont. 3)
Courtesy of Matt Fisher
Micro-stuttering Diagnosing
• Appearance
– The framerate is high, but the game feels laggy
[Plot: Frame Time (s) versus Time (s)]
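An average framerate hides micro-stuttering because it ignores how frame durations are distributed. The sketch below compares two hypothetical logs with the same average framerate; the helper names and the "mean frame-to-frame delta" metric are assumptions chosen for illustration, not a metric defined in the talk.

```python
def average_fps(frame_times_ms):
    """Average framerate over the whole log, in frames per second."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def mean_frame_to_frame_delta(frame_times_ms):
    """Mean absolute difference between consecutive frame times, in ms."""
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return sum(deltas) / len(deltas) if deltas else 0.0


smooth = [10.0] * 8         # every frame takes 10 ms
uneven = [5.0, 15.0] * 4    # alternating short and long frames

# Both logs average 100 fps, but only `uneven` has a nonzero delta (10 ms).
fps_pair = (average_fps(smooth), average_fps(uneven))
delta_pair = (mean_frame_to_frame_delta(smooth), mean_frame_to_frame_delta(uneven))
```

A framerate counter would report the same number for both logs, which is why frame-time recordings are more useful than framerate readouts for this class of stutter.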
Micro-stuttering Diagnosing (cont. 1)
Micro-stuttering Diagnosing (cont. 2)
Micro-stuttering Diagnosing (cont. 3)
Timing Discrepancy Diagnosing
• Appearance
– The framerate is fine, but animation and
simulation are choppy
• Possible causes
– The game engine uses an incorrect time interval
for scene updates (camera, animation,
simulation, etc.)
Timing Discrepancy Diagnosing (cont.)
Causes & Solutions
Scenarios
• Recall the top 5 causes:
1. Shader compilation
2. Video memory oversubscription
3. Resource management
4. Queued frames
5. Improper queries
Shader Compilation Basics
• Why compile shaders at runtime?
– The driver needs to translate D3D assembly into
machine instructions
– Each GPU generation has a drastically different
instruction set
Shader Compilation Basics (cont.)
State Dependent Recompile (cont.)
Shader Compilation: Mitigation
• Old methods are still good:
– At loading time, create all shaders that will be used in
the level
– Render everything in the scene with at least 1
primitive per mesh
– For dynamic streaming, render a hidden object that
uses the most common materials at streaming time
Shader Compilation: Mitigation (cont.)
• For state dependent recompile:
– Group objects by the states that can trigger a recompile
– Avoid or reduce changes to those states
– Ensure the shader is created and first rendered under
the states it is used with most
Resource Management Basics (cont. 1)
Resource Management Basics (cont. 3)
Resource Management: Mitigation
• General guidance 1
– Use the DISCARD flag when locking/mapping
resources
• General guidance 2
– Avoid creating/destroying resources at
runtime
– Try allocating buffers at startup and reusing
them at runtime
– Before reusing a resource, issue a query to
check whether the GPU has finished using it
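The reuse pattern in the second guidance can be sketched as a small pool that hands out a buffer only when the GPU is known to be finished with it. This is a simulation, not D3D code: `BufferPool`, `acquire`, and the `gpu_completed_frame` argument are hypothetical names, and `gpu_completed_frame` merely stands in for whatever a real query reports.

```python
class BufferPool:
    """Preallocated buffers that are reused only once the GPU is done with them.

    `gpu_completed_frame` stands in for the result of a real GPU query
    (for example an event query issued after the frame's draw calls).
    """

    def __init__(self, count):
        # Allocate everything up front; nothing is created at runtime.
        self._last_used = [None] * count  # frame index of last use, per buffer

    def acquire(self, current_frame, gpu_completed_frame):
        """Return the index of a buffer that is safe to overwrite, or None."""
        for index, used_in in enumerate(self._last_used):
            if used_in is None or used_in <= gpu_completed_frame:
                self._last_used[index] = current_frame
                return index
        return None  # every buffer is still in flight on the GPU


pool = BufferPool(count=2)
first = pool.acquire(current_frame=0, gpu_completed_frame=-1)   # buffer 0
second = pool.acquire(current_frame=1, gpu_completed_frame=-1)  # buffer 1
third = pool.acquire(current_frame=2, gpu_completed_frame=-1)   # None: all busy
fourth = pool.acquire(current_frame=2, gpu_completed_frame=0)   # buffer 0 again
```

Returning `None` instead of waiting leaves the decision to the caller, which can size the pool so that a free buffer is normally available rather than stalling on the GPU.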
Resource Management: Mitigation (cont. 2)
Oversubscription: Mitigation (cont. 1)
Oversubscription: Mitigation (cont. 2)
Queued Frames
• The necessity of frame queuing
– Why does the driver always try to buffer more frames?
Dilemmas in Queued Frames
• Dilemma #1
– Limiting buffered frames to 1 can shorten input latency
– But it increases the chance of micro-stuttering, and idle
bubbles in GPU processing, meaning lower performance
• Dilemma #2
– Not limiting buffered frames can help smooth framerate
– But a bad sync point will hurt more than no buffering
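The first dilemma can be illustrated with a toy pipeline model. Everything here is an assumption made for the sketch: the function name, the `max_ahead` parameter (how many frames the CPU may run ahead of the GPU, which does not map exactly onto any driver setting), and the workload numbers.

```python
def display_intervals(cpu_ms, gpu_ms, max_ahead):
    """Toy model of CPU/GPU pipelining.

    The CPU may start frame i only after the GPU has finished frame
    i - max_ahead; the GPU starts a frame once the CPU has submitted it and
    the previous frame is done. Returns the time between finished frames.
    """
    cpu_done, gpu_done = [], []
    for i, (c, g) in enumerate(zip(cpu_ms, gpu_ms)):
        cpu_free = cpu_done[-1] if cpu_done else 0
        gate = gpu_done[i - max_ahead] if i >= max_ahead else 0
        cpu_done.append(max(cpu_free, gate) + c)
        gpu_free = gpu_done[-1] if gpu_done else 0
        gpu_done.append(max(gpu_free, cpu_done[-1]) + g)
    return [b - a for a, b in zip(gpu_done, gpu_done[1:])]


# Hypothetical workload: CPU cost alternates, GPU cost is constant.
cpu = [2, 8, 2, 8, 2, 8]
gpu = [6, 6, 6, 6, 6, 6]

serialized = display_intervals(cpu, gpu, max_ahead=1)  # [14, 8, 14, 8, 14]
buffered = display_intervals(cpu, gpu, max_ahead=3)    # [8, 6, 6, 6, 6]
```

In this run, the tightly limited case exposes the uneven CPU cost as alternating 14 ms and 8 ms frames and leaves the GPU idle while it waits, whereas the buffered case settles at a steady 6 ms per frame. The model does not capture the second dilemma, where a sync point forces the whole queue to drain.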
Queued Frames: Mitigation
• Experiment on queued frames
– Adjust the “maximum pre-rendered frames”
setting in the NVIDIA Control Panel
– Is the stuttering getting better?
Queued Frames: Mitigation (cont.)
Timing Issues
• CPU frametime vs. GPU frametime
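[Timeline: frames reach the screen at T0, T1, T2 with intervals ∆T0, ∆T1; the CPU calls Present at t0, t1, t2 with intervals ∆t0, ∆t1]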
– The game engine invokes Present at t0, t1, t2, …
– The user sees the frames at T0, T1, T2, …
– ∆t0, ∆t1 cannot be used as elapsed time for updating
since they are not the same values as ∆T0, ∆T1
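A small worked example shows why this matters. Assume an object moving at constant speed, an engine that advances the simulation by the CPU-side interval between Present calls, and frames that reach the screen at an even cadence. All numbers and the function name `on_screen_steps` are hypothetical.

```python
def on_screen_steps(present_times_ms, speed_units_per_ms):
    """Distance an object moves between consecutive frames when the engine
    advances the simulation by the CPU-side interval between Present calls."""
    cpu_deltas = [b - a for a, b in zip(present_times_ms, present_times_ms[1:])]
    return [speed_units_per_ms * dt for dt in cpu_deltas]


# Present is called at uneven times t0..t3 (CPU deltas of 10, 20, 30 ms) ...
present_times = [0, 10, 30, 60]
# ... while the frames reach the screen every 20 ms (T0..T3).
display_times = [20, 40, 60, 80]

steps = on_screen_steps(present_times, speed_units_per_ms=1)
display_deltas = [b - a for a, b in zip(display_times, display_times[1:])]
# steps is [10, 20, 30] while display_deltas is [20, 20, 20]
```

The viewer sees a new frame every 20 ms, yet the object jumps by 10, 20, then 30 units instead of 20 each time, which reads as choppy motion at a steady framerate.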
Timing Issues (cont.)
• A couple of situations
– Using CPU frametime is fine if the frame is CPU bound
[Timeline: display intervals ∆T0, ∆T1 compared with CPU intervals ∆t0, ∆t1]
Timing Issues: Mitigation
• Frametime estimation
– Straightforward way: averaging frametimes over the
past several frames
– More advanced way: comparing CPU frametimes to
GPU timestamps to see whether the frame is CPU bound or
GPU bound, and computing a weighted result
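The straightforward approach can be sketched as a moving average over recent frame times. The class name `FrameTimeSmoother` and the window size of 4 are assumptions for illustration; the more advanced CPU/GPU weighted estimate is not shown.

```python
from collections import deque


class FrameTimeSmoother:
    """Estimate the elapsed time for the next update as the mean of recent frames."""

    def __init__(self, window=4):
        # `window` (how many past frames to average) is an assumed tuning knob.
        self._samples = deque(maxlen=window)

    def add(self, frame_time_ms):
        """Record a measured frame time and return the smoothed estimate."""
        self._samples.append(frame_time_ms)
        return sum(self._samples) / len(self._samples)


smoother = FrameTimeSmoother(window=4)
estimates = [smoother.add(t) for t in [16.0, 16.0, 32.0, 16.0, 16.0]]
```

The single 32 ms spike is spread over several updates instead of producing one large simulation step. A larger window smooths more but reacts more slowly to a genuine change in framerate.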
Query Basics
• Asynchronous queries in D3D
– Asynchronous queries were introduced in D3D9
because the GPU works behind the CPU
– Spinning while retrieving a query result can
cause a pipeline bubble
Event Queries
• Event queries can be used to eliminate
queued frames in the driver
– This helps to reduce input latency, but…
– It also exposes unbalanced frame-to-frame
CPU workload -> micro-stuttering
– The CPU has to wait for the query to return, so
the parallelism between CPU & GPU drops
-> lower performance
– The driver is unable to perform certain
optimizations without knowledge of multiple
queued frames
Occlusion Queries
• Occlusion queries tend to have high
latency
– The result may return after 1 to 3 frames
– Avoid spinning on GetData, which can cause
much worse stalls
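One way to avoid spinning is to act on the newest result that has already arrived and accept that it is a few frames old. The sketch below simulates this; it does not call any graphics API. `OcclusionTracker`, `issue`, `poll`, and the fixed two-frame latency are hypothetical and exist only to show the control flow.

```python
class OcclusionTracker:
    """Uses the newest finished query result instead of waiting for one.

    Queries are simulated: each result becomes available `latency` frames
    after it was issued, standing in for a non-blocking GetData-style poll.
    """

    def __init__(self, latency=2):
        self._latency = latency
        self._pending = []               # (frame_issued, visible), in issue order
        self.last_known_visible = True   # conservative default: draw the object

    def issue(self, frame, visible):
        self._pending.append((frame, visible))

    def poll(self, frame):
        """Consume every result that is ready by `frame`; never block."""
        while self._pending and self._pending[0][0] + self._latency <= frame:
            _, self.last_known_visible = self._pending.pop(0)
        return self.last_known_visible


tracker = OcclusionTracker(latency=2)
history = []
for frame, truly_visible in enumerate([True, False, False, False, True]):
    history.append(tracker.poll(frame))  # decide with whatever is ready
    tracker.issue(frame, truly_visible)  # result arrives 2 frames later
```

The decisions lag the true visibility by the query latency, so the object is drawn for two extra frames after it becomes hidden. That is the price paid for never stalling the CPU on a result.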
Queries: Mitigation
• Be cautious when using queries
– Make sure your use of queries is optimal and
does not introduce bubbles in the pipeline
– Ideally, with optimized resource
management and a high framerate, you should
not need to limit queued frames with event
queries
– Using occlusion queries efficiently requires a
complicated non-blocking system (not covered
in this talk)
Check Your Middleware
• Middleware is generally written in a
vacuum
– What works best in a small environment
might not scale well
• Especially check for CPU-GPU sync
points
Vsync, SLI & Many Other Things
Vsync
• Vsync is a source of micro-stuttering
– Framerate fluctuates between vsync points:
60fps, 30fps, 20fps, …
– Applications can implement a custom
frame-limiting system to avoid sudden
framerate changes
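The 60/30/20 pattern follows from rounding each frame up to a whole number of refresh intervals. The sketch below assumes a 60 Hz display and a simple double-buffered swap chain, where a frame that misses a refresh waits for the next one; the function name `vsync_fps` is made up for this example.

```python
import math


def vsync_fps(render_ms, refresh_hz=60):
    """Effective framerate with vsync on, assuming double buffering.

    A frame that misses a refresh waits for the next one, so its duration
    is rounded up to a whole number of refresh intervals.
    """
    refresh_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals


fast = vsync_fps(16.0)      # fits in one refresh interval: 60 fps
just_late = vsync_fps(17.0) # barely misses, waits a second interval: 30 fps
slow = vsync_fps(34.0)      # needs three intervals: 20 fps
```

A render time that moves from 16 ms to 17 ms halves the framerate, which is why a workload hovering near a vsync boundary feels so uneven.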
SLI
• Micro-stuttering is much easier to trigger
in a multi-GPU environment
– Two or more GPUs may present their rendering results
at uneven cadences
– Sync points caused by resource updates and queries are
harder to hide
– Inter-GPU data transfers add further sync points
Q&A