© Copyright Khronos Group 2014 - Page 1
OpenGL Efficiency: AZDO
Cass Everitt
OpenGL Engineer, NVIDIA
GDC, San Francisco, March 2014
AZDO?
• Approaching Zero Driver Overhead
Why do you care about driver overhead?
• Because driver overhead == cost
• Costs
- CPU cycles from app
- CPU cache from app
- power / battery
- GPU throughput
OpenGL Fallacy: Old and Inefficient
• Immediate Mode
• Fixed Function
• Ancient crufty stuff
• Feedback
• Selection
• Evaluators
• Display Lists
• Selectors
OpenGL Reality: Modern & Efficient
• Bindless (ARB)
• SSBO (GL4.3)
• Multi-Draw Indirect (GL4.3)
• UBO (GL3.1)
• Texture Arrays (GL3.0)
• Buffer Storage (GL4.4)
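The version tags on this slide can be read as a small lookup table. The sketch below is only an illustration of that reading: the table and the helper name `core_features` are hypothetical, and Bindless is mapped to `None` because the slide lists it as an ARB extension rather than a core version.

```python
# Minimum core OpenGL version for each feature, as listed on the slide.
# Bindless is tagged "ARB" (an extension), so it has no core version here.
FEATURE_MIN_VERSION = {
    "Texture Arrays": (3, 0),
    "UBO": (3, 1),
    "SSBO": (4, 3),
    "Multi-Draw Indirect": (4, 3),
    "Buffer Storage": (4, 4),
    "Bindless": None,
}


def core_features(context_version):
    """Return the features that are core at the given (major, minor) version."""
    return sorted(
        name
        for name, minimum in FEATURE_MIN_VERSION.items()
        if minimum is not None and context_version >= minimum
    )
```

For example, `core_features((4, 2))` returns only Texture Arrays and UBO, so on a GL4.2 context the remaining features would have to come from extensions.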
Plus, OpenGL has all the features
• Compute
• Tessellation
• Geometry Shaders
• Sparse Textures
• Image Load/Store
Classic OpenGL Model
[Diagram: the CPU sends direct drawing commands (cmd, cmd, cmd, ...) to the GPU via the command FIFO. Memory holds buffer objects, texture objects, an indirect draw buffer object, and a render target.]
Classic Model Pros / Cons
• Pros
- Very stable – 20+ year old code still “just works”
- Simple: driver handles hazards, sync, allocation
- Empowered the GPU revolution
- Many classes of applications well served
• Cons
- Demanding apps (intense games, VR) are not so well served
- Doesn’t scale with high scene complexity
- Threading model
- Hardware abstraction showing age
Aspirational Goal
• Can we address the cons within the framework of the existing API?
- That is, can we fix the cons without tossing the pros?
• Good question!
- As it turns out, Smart People in Khronos have actually been working on this question for a while now
- And they’ve developed an efficient, modern OpenGL that gives amazing perf improvements and lives within the existing framework
• And here’s what it looks like…
Efficient OpenGL Model
[Diagram: four CPUs and the GPU around a shared memory holding indirect draw buffer objects, buffer objects, texture objects, and a render target.]
CPU and GPU decoupled
[Diagram: four CPUs, the GPU, and memory.]
CPU Writes Memory – multi-threaded (no API)!
[Diagram: four CPUs write into memory holding indirect draw buffer objects, buffer objects, texture objects, and a render target.]
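A minimal sketch of the "cores just write to memory" idea, under stated assumptions: a plain `bytearray` stands in for a mapped buffer object, the 20-byte record is the indexed indirect draw layout from the OpenGL spec, and the helper names are hypothetical. Each worker owns a disjoint range of records, so no locks and no API calls are needed while filling. Python threads will not show a real speedup here; the point is only the write pattern.

```python
import struct
import threading

# One indexed indirect draw record:
# count, instanceCount, firstIndex, baseVertex (signed), baseInstance.
DRAW_CMD = struct.Struct("<IIIiI")


def fill_commands(shared, first, draws):
    """Write draws into the shared buffer starting at record index `first`."""
    for i, draw in enumerate(draws):
        DRAW_CMD.pack_into(shared, (first + i) * DRAW_CMD.size, *draw)


def build_in_parallel(per_thread_draws):
    """Give each worker a disjoint range of one buffer and fill it concurrently."""
    total = sum(len(draws) for draws in per_thread_draws)
    shared = bytearray(total * DRAW_CMD.size)  # stand-in for a mapped buffer object
    workers, first = [], 0
    for draws in per_thread_draws:
        workers.append(
            threading.Thread(target=fill_commands, args=(shared, first, draws))
        )
        first += len(draws)
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared
```

Because the ranges never overlap, the only synchronization point is the final join before the buffer is handed to the consumer.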
And/Or GPU Writes Memory
[Diagram: four CPUs and the GPU around memory holding indirect draw buffer objects, buffer objects, texture objects, and a render target; here the GPU writes into memory.]
GPU Work Creation
Still no API – the magic of communicating through memory…
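GPU work creation can be emulated on the CPU to show what such a job would write. This is a sketch, not the presenter's method: the object fields (`index_count`, `first_index`, `base_vertex`), the visibility predicate, and the choice to store the object index in `baseInstance` are all assumptions made for illustration.

```python
import struct

# count, instanceCount, firstIndex, baseVertex (signed), baseInstance
DRAW_CMD = struct.Struct("<IIIiI")


def build_visible_draws(objects, is_visible):
    """Emulate a GPU job that appends one draw record per visible object.

    `objects` is a list of dicts with hypothetical keys index_count,
    first_index and base_vertex; `is_visible` is a caller-supplied predicate.
    Returns (command_bytes, draw_count).
    """
    out = bytearray(DRAW_CMD.size * len(objects))  # worst case: all visible
    draw_count = 0  # a real GPU job would bump an atomic counter instead
    for object_id, obj in enumerate(objects):
        if not is_visible(obj):
            continue
        DRAW_CMD.pack_into(
            out,
            draw_count * DRAW_CMD.size,
            obj["index_count"],
            1,                   # one instance per object
            obj["first_index"],
            obj["base_vertex"],
            object_id,           # assumption: baseInstance carries the object id
        )
        draw_count += 1
    return bytes(out[: draw_count * DRAW_CMD.size]), draw_count
```

The output is the same kind of record buffer a CPU thread would have written, which is why neither producer needs a dedicated API.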
GPU Reads Commands from Memory
[Diagram: the GPU reads from memory holding indirect draw buffer objects, buffer objects, texture objects, and a render target; the four CPUs are not in the path.]
Minimal CPU / driver involvement…
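To make "commands in memory" concrete, the sketch below packs and re-reads indexed indirect draw records using the layout the OpenGL spec gives for `DrawElementsIndirectCommand` (five 32-bit fields). The helper names are hypothetical, and `read_draws` only imitates how a consumer would step through the buffer; it is not driver code.

```python
import struct

# count, instanceCount, firstIndex, baseVertex (signed), baseInstance
DRAW_CMD = struct.Struct("<IIIiI")


def pack_draws(draws):
    """Pack (count, instance_count, first_index, base_vertex, base_instance) tuples tightly."""
    buf = bytearray(DRAW_CMD.size * len(draws))
    for i, draw in enumerate(draws):
        DRAW_CMD.pack_into(buf, i * DRAW_CMD.size, *draw)
    return buf


def read_draws(buf, draw_count, stride=0):
    """Step through draw_count records, `stride` bytes apart (0 means tightly packed)."""
    step = stride or DRAW_CMD.size
    return [DRAW_CMD.unpack_from(buf, i * step) for i in range(draw_count)]
```

With a real context, a buffer like this would be bound to `GL_DRAW_INDIRECT_BUFFER` and consumed by a single `glMultiDrawElementsIndirect(mode, type, indirect, drawcount, stride)` call, so the per-draw CPU cost reduces to writing 20 bytes.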
Results
• Integer multiple speedups ~5x – ~15x
- This is not a typo
- In driver-limited cases, obviously
• Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT
Bonuses
• Enables scalable multi-threading with no new API
- Cores just write to memory
• Enables GPU Work Creation
- Compute job or similar
- Builds buffers, constructs MDI (Multi-Draw Indirect) commands
• Does not require a new object model
• Does not require breaking existing applications
