DirectX 11 Rendering in Battlefield 3
Johan Andersson, Rendering Architect, DICE
Agenda
  • Overview
  • Feature:
      • Deferred Shading
      • Compute Shader Tile-Based Lighting
      • Terrain Displacement Mapping
      • Direct Stereo 3D rendering
  • Quality:
      • Antialiasing: MSAA
      • Antialiasing: FXAA
      • Antialiasing: SRAA
      • Transparency Supersampling
  • Performance:
      • Instancing
      • Parallel dispatch
      • Multi-GPU
      • Resource Streaming
  • Conclusions
  • Q & A
overview
Battlefield 3
  • FPS
  • Fall 2011
  • DX11, Xbox 360 and PS3
  • Frostbite 2 engine
Frostbite 2
  • Developed for Battlefield 3 and future DICE/EA games
  • Massive focus on creating simple to use & powerful workflows
  • Major pushes in animation, destruction, streaming, rendering, lighting and landscapes
DX11
  • DX11 API only
      • Requires a DX10 or DX11 GPU
      • Requires Vista SP1 or Windows 7
      • No Windows XP!
  • Why?
      • CPU performance win
      • GPU performance & quality win
      • Ease of development - no legacy
      • Future proof
  • BF3 is a big title - will drive OS & HW adoption
      • Which is good for your game as well!
Options for rendering
  • Switched to Deferred Shading in FB2
      • Rich mix of Outdoor + Indoor + Urban environments in BF3
      • Wanted lots more light sources
  • Why not Forward Rendering?
      • Light culling / shader permutations not efficient for us
      • Expensive & more difficult decaling / destruction masking
  • Why not Light Pre-pass?
      • 2x geometry pass too expensive on both CPU & GPU for us
      • Was able to generalize our BRDF enough to just a few variations
      • Saw major potential in full tile-based deferred shading
  • See also: Nicolas Thibieroz’s talk "Deferred Shading Optimizations"
Deferred Shading
  • Weaknesses with traditional deferred lighting/shading:
      • Massive overdraw & ROP cost when having lots of big light sources
      • Expensive to have multiple per-pixel materials in light shaders
      • MSAA lighting can be slow (non-coherent, extra BW)
features
Tile-based Deferred Shading
  1. Divide screen into tiles and determine which lights affect which tiles
  2. Only apply the visible light sources on pixels
      • Custom shader with multiple lights
      • Reduced bandwidth & setup cost
  • How can we do this best in DX11?
Lighting with Compute Shader
  • Tile-based Deferred Shading using Compute Shaders
  • Primarily for analytical light sources
      • Point lights, cone lights, line lights
      • No shadows
      • Requires Compute Shader 5.0
  • Hybrid Graphics/Compute shading pipeline:
      • Graphics pipeline rasterizes gbuffers for opaque surfaces
      • Compute pipeline uses gbuffers, culls lights, computes lighting & combines with shading
      • Graphics pipeline renders transparent surfaces on top
CS requirements & setup
  • 1 thread per pixel, 16x16 thread groups (aka tile)
  • Input: gbuffers, depth buffer & list of lights
  • Output: fully composited & lit HDR texture
  [Figure: gbuffer inputs labelled Normal, Roughness, Diffuse Albedo, Specular Albedo]

    Texture2D<float4> gbufferTexture0 : register(t0);
    Texture2D<float4> gbufferTexture1 : register(t1);
    Texture2D<float4> gbufferTexture2 : register(t2);
    Texture2D<float4> depthTexture : register(t3);

    RWTexture2D<float4> outputTexture : register(u0);

    #define BLOCK_SIZE 16

    [numthreads(BLOCK_SIZE,BLOCK_SIZE,1)]
    void csMain(
        uint3 groupId : SV_GroupID,
        uint3 groupThreadId : SV_GroupThreadID,
        uint groupIndex: SV_GroupIndex,
        uint3 dispatchThreadId : SV_DispatchThreadID)
    {
        ...
    }
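As a rough host-side illustration of the 16x16 tiling above, the number of thread groups to dispatch can be found by rounding the render target size up to whole tiles. This is a C++ sketch rather than shader code, and `TileGrid` and `computeTileGrid` are hypothetical names that do not appear in the talk.

```cpp
#include <cassert>
#include <cstdint>

// Tile edge length in pixels; matches BLOCK_SIZE in the shader above.
constexpr uint32_t kBlockSize = 16;

// Hypothetical helper type: thread group counts in X and Y.
struct TileGrid {
    uint32_t groupsX;
    uint32_t groupsY;
};

// Round the resolution up so that partially covered tiles at the
// right and bottom edges still get a thread group.
inline TileGrid computeTileGrid(uint32_t width, uint32_t height) {
    return { (width + kBlockSize - 1) / kBlockSize,
             (height + kBlockSize - 1) / kBlockSize };
}
```

For 1920x1080 this gives 120 x 68 groups. The last row of tiles is only half covered, so a shader dispatched this way would presumably need to guard against out-of-range pixels.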
CS steps 1-2
  1. Load gbuffers & depth
  2. Calculate min & max z in threadgroup / tile
      • Using InterlockedMin/Max on groupshared variable
      • Atomics only work on ints
      • Can cast float to int (z is always +)

    groupshared uint minDepthInt;
    groupshared uint maxDepthInt;

    // --- globals above, function below -------

    float depth = depthTexture.Load(uint3(texCoord, 0)).r;
    uint depthInt = asuint(depth);   // reinterpret the float bits as a uint

    // reset the shared bounds, then wait for the whole group
    minDepthInt = 0xFFFFFFFF;
    maxDepthInt = 0;
    GroupMemoryBarrierWithGroupSync();

    // every thread folds its own depth into the tile bounds
    InterlockedMin(minDepthInt, depthInt);
    InterlockedMax(maxDepthInt, depthInt);
    GroupMemoryBarrierWithGroupSync();

    float minGroupDepth = asfloat(minDepthInt);
    float maxGroupDepth = asfloat(maxDepthInt);
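The "cast float to int" trick works because non-negative IEEE 754 floats sort in the same order as their bit patterns read as unsigned integers. The following C++ sketch emulates the tile reduction serially to show this; `asUint`, `asFloat`, `DepthBounds` and `tileDepthBounds` are hypothetical CPU stand-ins, not code from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU stand-ins for HLSL asuint()/asfloat(): copy the bits, no numeric conversion.
inline uint32_t asUint(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
inline float asFloat(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

struct DepthBounds {
    float minDepth;
    float maxDepth;
};

// Serial emulation of the InterlockedMin/InterlockedMax reduction for one tile.
// Assumes every depth is non-negative and not NaN, as the slide requires.
inline DepthBounds tileDepthBounds(const std::vector<float>& depths) {
    uint32_t minBits = 0xFFFFFFFFu;
    uint32_t maxBits = 0u;
    for (float d : depths) {
        const uint32_t bits = asUint(d);
        minBits = std::min(minBits, bits);
        maxBits = std::max(maxBits, bits);
    }
    return { asFloat(minBits), asFloat(maxBits) };
}
```

Negative depths would break the ordering, since the sign bit makes their bit patterns compare larger than any positive value.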
CS step 3 – Culling
  • Determine visible light sources for each tile
      • Cull all light sources against tile frustum
  • Input (global):
      • Light list, frustum & SW occlusion culled
  • Output (per tile):
      • # of visible light sources
      • Index list of visible light sources
  [Figure: per-tile visible light count (black = 0 lights, white = 40)]
CS step 3 – Impl
  3a. Each thread switches to process lights instead of pixels
      • Wow, parallelism switcharoo!
      • 256 light sources in parallel
      • Multiple iterations for >256 lights
  3b. Intersect light and tile
      • Multiple variants – accuracy vs perf
      • Tile min & max z is used as a "depth bounds" test
  3c. Append visible light indices to list
      • Atomic add to threadgroup shared memory
      • "inlined stream compaction"
  3d. Switch back to processing pixels
      • Synchronize the thread group
      • We now know which light sources affect the tile

    struct Light {
        float3 pos; float sqrRadius;
        float3 color; float invSqrRadius;
    };
    int lightCount;
    StructuredBuffer<Light> lights;

    groupshared uint visibleLightCount = 0;
    groupshared uint visibleLightIndices[1024];

    // --- globals above, cont. function below ---

    uint threadCount = BLOCK_SIZE*BLOCK_SIZE;
    uint passCount = (lightCount+threadCount-1) / threadCount;

    for (uint passIt = 0; passIt < passCount; ++passIt)
    {
        uint lightIndex = passIt*threadCount + groupIndex;

        // prevent overrun by clamping to a last "null" light
        lightIndex = min(lightIndex, lightCount);

        if (intersects(lights[lightIndex], tile))
        {
            uint offset;
            InterlockedAdd(visibleLightCount, 1, offset);
            visibleLightIndices[offset] = lightIndex;
        }
    }

    GroupMemoryBarrierWithGroupSync();
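Steps 3b and 3c can be sketched on the CPU as a serial loop. The slides do not define `intersects()`, so the version below tests only the tile's depth bounds for a point light and leaves out the frustum side planes. The `Light` subset, the view-space interpretation of `pos`, and the names `intersectsDepthRange` and `cullLights` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Subset of the slide's Light struct; pos is assumed to be in view space.
struct Light {
    float pos[3];
    float sqrRadius;
};

// Hypothetical simplification of intersects(): sphere against the tile's
// [minZ, maxZ] depth range only (the "depth bounds" part of the test).
inline bool intersectsDepthRange(const Light& l, float minZ, float maxZ) {
    const float r = std::sqrt(l.sqrRadius);
    return l.pos[2] + r >= minZ && l.pos[2] - r <= maxZ;
}

// Serial stand-in for the per-thread append: push_back plays the role of
// InterlockedAdd on the counter followed by the indexed store.
inline std::vector<uint32_t> cullLights(const std::vector<Light>& lights,
                                        float minZ, float maxZ) {
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < lights.size(); ++i) {
        if (intersectsDepthRange(lights[i], minZ, maxZ))
            visible.push_back(i);
    }
    return visible;
}
```

The serial loop produces indices in ascending order. On the GPU the order depends on which thread wins each atomic add, which should not matter for additive light accumulation.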
CS deferred shading final steps
  4. For each pixel, accumulate lighting from visible lights
      • Read from tile visible light index list in groupshared memory
  5. Combine lighting & shading albedos
      • Output is non-MSAA HDR texture
      • Render transparent surfaces on top
  [Figure: computed lighting]

    float3 color = 0;

    for (uint lightIt = 0; lightIt < visibleLightCount; ++lightIt)
    {
        uint lightIndex = visibleLightIndices[lightIt];
        Light light = lights[lightIndex];

        color += diffuseAlbedo * evaluateLightDiffuse(light, gbuffer);
        color += specularAlbedo * evaluateLightSpecular(light, gbuffer);
    }
Massive lighting test scene - 1000 large point lights
GDC 2011 test content
MSAA Compute Shader Lighting
  • Only edge pixels need full per-sample lighting
      • But edges have bad screen-space coherency! Inefficient
  • Compute Shader can build efficient coherent pixel list
      • Evaluate lighting for each pixel (sample 0)
      • Determine if pixel requires per-sample lighting
      • If so, add to atomic list in shared memory
      • When all pixels are done, synchronize
      • Go through and light sample 1-3 for pixels in list
  • Major performance improvement!
      • Described in detail in [Lauritzen10]
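The list-building step can be illustrated with a serial C++ sketch. The slide does not say how a pixel is classified as needing per-sample lighting, so the depth-discontinuity criterion below is purely an assumption, as are the names `PixelDepths`, `needsPerSampleLighting` and `buildPerSamplePixelList`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kSamples = 4;                       // 4x MSAA
using PixelDepths = std::array<float, kSamples>;  // one depth per sample

// Assumed edge criterion: any sample whose depth differs from sample 0
// by more than a threshold marks the pixel as an edge pixel.
inline bool needsPerSampleLighting(const PixelDepths& d, float threshold) {
    for (int s = 1; s < kSamples; ++s) {
        if (std::fabs(d[s] - d[0]) > threshold)
            return true;
    }
    return false;
}

// Serial stand-in for the atomic append into shared memory: collect the
// indices of the pixels in a tile that need samples 1-3 lit as well.
inline std::vector<uint32_t> buildPerSamplePixelList(
        const std::vector<PixelDepths>& tile, float threshold) {
    std::vector<uint32_t> list;
    for (uint32_t p = 0; p < tile.size(); ++p) {
        if (needsPerSampleLighting(tile[p], threshold))
            list.push_back(p);
    }
    return list;
}
```

After the list is built, the threads of the group can walk it densely, which is what restores coherency for the scattered edge pixels.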
Terrain rendering
Terrain rendering
  • Battlefield 3 terrains
      • Huge area & massive view distances
      • Dynamic destructible heightfields
      • Procedural virtual texturing
      • Streamed heightfields, colormaps, masks
      • Full details at a later conference
  • We stream in source heightfield data at close to pixel ratio
      • Derive high-quality per-pixel normals in shader
  • How can we increase detail even further on DX11?
      • Create better silhouettes and improved depth
      • Keep small-scale detail (dynamic craters & slopes)
Normal mapped terrain
Displacement mapped terrain
Terrain Displacement Mapping
  • Straight high-res heightfields, no procedural detail
      • Lots of data in the heightfields
      • Pragmatic & simple choice
      • No changes in physics/collisions
      • No content changes, artists see the true detail they created
  • Uses DX11 fixed edge tessellation factors
      • Stable, no swimming vertices
      • Though can be wasteful
  • Height morphing for streaming by fetching 2 heightfields in domain shader & blending based on patch CLOD factor
  • More work left to figure out optimal tessellation scheme for our use case
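The height morphing bullet amounts to a linear blend between two heightfield samples. A minimal C++ sketch follows, assuming a CLOD factor of 0 selects the coarser heightfield and 1 the finer one; the slide does not state the direction, and `morphHeight` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Blend two heightfield samples by the patch CLOD factor.
// Assumption: factor 0 returns the coarse height, factor 1 the fine height.
inline float morphHeight(float coarseHeight, float fineHeight, float clodFactor) {
    // Clamp so an out-of-range factor cannot extrapolate past either sample.
    const float t = std::max(0.0f, std::min(1.0f, clodFactor));
    return coarseHeight + (fineHeight - coarseHeight) * t;
}
```

Blending this way lets a newly streamed heightfield fade in over several frames instead of popping.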
Stereo 3D rendering in DX11
  • Nvidia’s 3D Vision drivers are a good and almost automatic stereo 3D rendering method
      • But only for forward rendering, doesn’t work with deferred shading
      • Or on AMD or Intel GPUs
      • Transparent surfaces do not get proper 3D depth
  • We instead use explicit 3D stereo rendering
      • Render unique frame for each eye
      • Works with deferred shading & includes all surfaces
      • Higher performance requirements, 2x draw calls
  • Works with Nvidia’s 3D Vision and AMD’s HD3D
      • Similar to OpenGL quad buffer support
      • Ask your friendly IHV contact how
PERFORMANCE
Instancing
  • Draw calls can still be major performance bottleneck
      • Lots of materials / lots of variation
      • Complex shadowmaps
      • High detail / long view distances
      • Full 3D stereo rendering
  • Battlefield has lots of use cases for heavy instancing
      • Props, foliage, debris, destruction, mesh particles
  • Batching submissions is still important, just as before!
      • Richard Huddy: "Batch batch batch!"
Instancing in DirectX
  • DX9-style stream instancing is good, but restrictive
      • Extra vertex attributes, GPU overhead
      • Can’t be (efficiently) combined with skinning
      • Used primarily for tiny meshes (particles, foliage)
  • DX10/DX11 brings support for shader Buffer objects
      • Vertex shaders have access to SV_InstanceID
      • Can do completely arbitrary loads, not limited to fixed elements
      • Can support per-instance arrays and other data structures!
  • Let’s rethink how instancing can be implemented..
Instancing data
  • Multiple object types
      • Rigid / skinned / composite meshes
  • Multiple object lighting types
      • Small/dynamic: light probes
      • Large/static: light maps
  • Different types of instancing data we have
      • Transform: float4x3
      • Skinning transforms: float4x3 array
      • SH light probe: float4 x 4
      • Lightmap UV scale/offset: float4
  • Let’s pack all instancing data into single big buffer!
Instancing example: transform + SH

    Buffer<float4> instanceVectorBuffer : register(t0);

    cbuffer a
    {
        float g_startVector;
        float g_vectorsPerInstance;
    }

    VsOutput main(
        // ....
        uint instanceId : SV_InstanceId)
    {
        // each instance owns g_vectorsPerInstance consecutive float4s:
        // 3 rows of the world transform, then 4 SH light probe vectors
        uint worldMatrixVectorOffset = g_startVector + instanceId * g_vectorsPerInstance + 0;
        uint probeVectorOffset       = g_startVector + instanceId * g_vectorsPerInstance + 3;

        float4 r0 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 0);
        float4 r1 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 1);
        float4 r2 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 2);

        float4 lightProbeShR = instanceVectorBuffer.Load(probeVectorOffset + 0);
        float4 lightProbeShG = instanceVectorBuffer.Load(probeVectorOffset + 1);
        float4 lightProbeShB = instanceVectorBuffer.Load(probeVectorOffset + 2);
        float4 lightProbeShO = instanceVectorBuffer.Load(probeVectorOffset + 3);
        // ....
    }
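The offset arithmetic in the shader can be mirrored on the CPU side when filling the buffer. The C++ sketch below assumes an instance stores exactly the 3 transform rows and 4 SH vectors shown, so 7 float4s per instance. The talk keeps `g_vectorsPerInstance` as a variable, so that fixed count and the helper names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Layout taken from the shader: 3 float4 rows of the 4x3 transform,
// followed by 4 float4 SH light probe vectors.
constexpr uint32_t kTransformVectors = 3;
constexpr uint32_t kShProbeVectors = 4;
// Assumption: nothing else is stored per instance in this example.
constexpr uint32_t kVectorsPerInstance = kTransformVectors + kShProbeVectors;

// Index of the first float4 of an instance's world transform.
inline uint32_t worldMatrixVectorOffset(uint32_t startVector, uint32_t instanceId) {
    return startVector + instanceId * kVectorsPerInstance;
}

// Index of the first float4 of an instance's SH light probe.
inline uint32_t probeVectorOffset(uint32_t startVector, uint32_t instanceId) {
    return worldMatrixVectorOffset(startVector, instanceId) + kTransformVectors;
}
```

Because the start offset and stride are plain constants, batches of different object types can share one large buffer and differ only in those two values.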
Instancing example: skinning

    half4 weights = input.boneWeights;
    int4 indices = (int4)input.boneIndices;

    // blend the position transformed by up to four bone matrices
    float3 skinnedPos = mul(float4(pos,1), getSkinningMatrix(indices[0])).xyz * weights[0];
    skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[1])).xyz * weights[1];
    skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[2])).xyz * weights[2];
    skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[3])).xyz * weights[3];

    // ...

    float4x3 getSkinningMatrix(uint boneIndex)
    {
        // each bone occupies 3 float4 rows inside the instance's block
        uint vectorOffset = g_startVector + instanceId * g_vectorsPerInstance;
        vectorOffset += boneIndex*3;

        float4 r0 = instanceVectorBuffer.Load(vectorOffset + 0);
        float4 r1 = instanceVectorBuffer.Load(vectorOffset + 1);
        float4 r2 = instanceVectorBuffer.Load(vectorOffset + 2);

        return createMat4x3(r0, r1, r2);
    }
Instancing benefits
  • Single draw call per object type instead of per instance
      • Minor GPU hit for big CPU gain
  • Instancing does not break when skinning parts
      • More deterministic & better overall performance
  • End result is typically 1500-2000 draw calls
      • Regardless of how many object instances the artists place!
      • Instead of 3000-7000 draw calls in some heavy cases
Parallel Dispatch in Theory
  • Great key DX11 feature!
      • Improve performance by scaling dispatching to D3D to more cores
      • Reduce frame latency
  • How we use it:
      • DX11 deferred context per HW thread
      • Renderer builds list of all draw calls we want to do for each rendering "layer" of the frame
      • Split draw calls for each layer into chunks of ~256
      • Dispatch chunks in parallel to the deferred contexts to generate command lists
      • Render to immediate context & execute command lists
      • Profit! *
  * but theory != practice
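The chunking step can be sketched in plain C++ without any D3D calls. `splitIntoChunks` is a hypothetical helper; in a real renderer each returned range would presumably be recorded on a deferred context and closed with `ID3D11DeviceContext::FinishCommandList`, then replayed on the immediate context with `ExecuteCommandList`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open [begin, end) ranges of draw call indices.
using ChunkRange = std::pair<std::size_t, std::size_t>;

// Split drawCallCount draw calls into consecutive chunks of at most
// chunkSize entries; the last chunk may be shorter.
inline std::vector<ChunkRange> splitIntoChunks(std::size_t drawCallCount,
                                               std::size_t chunkSize = 256) {
    std::vector<ChunkRange> chunks;
    if (chunkSize == 0)
        return chunks;  // nothing sensible to do with an empty chunk size
    for (std::size_t begin = 0; begin < drawCallCount; begin += chunkSize)
        chunks.emplace_back(begin, std::min(begin + chunkSize, drawCallCount));
    return chunks;
}
```

Keeping the chunks in submission order means the resulting command lists can be executed in the same order as a single-threaded renderer would have issued the draws.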
Parallel Dispatch in Practice
  • Still no performant drivers available for our use case
      • Have waited for 2 years and still are
      • Big driver codebases take time to refactor
      • IHVs vs Microsoft quagmire
      • Heavy driver threads collide with game threads
  • How it should work (a utopia?)
      • Driver does not create any processing threads of its own
      • Game submits workload in parallel to multiple deferred contexts
      • Driver makes sure almost all processing required happens on the draw call on the deferred context
      • Game dispatches command list on immediate context, driver does absolute minimal work with it
  • Still good to design engine for + instancing is great!
  quagmire [ˈkwægˌmaɪə ˈkwɒg-] n
      1. (Earth Sciences / Physical Geography) a soft wet area of land that gives way under the feet; bog
      2. an awkward, complex, or embarrassing situation
Resource streaming
  • Even with modern GPUs with lots of memory, resource streaming is often required
      • Can’t require 1+ GB graphics cards
      • BF3 levels have much more than 1 GB of textures & meshes
      • Reduced load times
  • But creating & destroying DX resources in-frame has never been a good thing
      • Can cause non-deterministic & large driver / OS stalls
      • Has been a problem for a very long time in DX
      • About time to fix it
DX11 Resource Streaming
  • Have worked with Microsoft, Nvidia & AMD to make sure we can do stall free async resource streaming of GPU resources in DX11
      • Want neither CPU nor GPU perf hit
      • Key foundation: DX11 concurrent creates
  • Resource creation flow:
      • Streaming system determines resources to load (texture mipmaps or mesh LODs)
      • Add DX resource creation to a queue on our own separate low-priority thread
      • Thread creates resources using initial data, signals streaming system
      • Resource created, game starts using it
  • Enables async stall-free DMA in drivers!
  • Resource destruction flow:
      • Streaming system deletes D3D resource
      • Driver keeps it internally alive until GPU frames using it are done. NO STALL!

    D3D11_FEATURE_DATA_THREADING threadingCaps;

    FB_SAFE_DX(m_device->CheckFeatureSupport(
       D3D11_FEATURE_THREADING,
       &threadingCaps, sizeof(threadingCaps)));

    if (threadingCaps.DriverConcurrentCreates)
Multi-GPU
  • Efficiently supporting Crossfire and SLI is important for us
      • High-end consumers expect it
      • IHVs expect it (and can help!)
      • Allows targeting higher-end HW than currently available during dev
  • AFR is easy: Do not reuse GPU resources from previous frame!
      • UpdateSubresource is easy & robust to use for dynamic resources, but not ideal
  • All of our playtests run with exe named AFR-FriendlyD3D.exe
      • Disables all driver AFR synchronization workarounds
      • Rather find corruption during dev than have bad perf
      • ForceSingleGPU.exe is also useful to track down issues
quality
Antialiasing
  • Reducing aliasing is one of our key visual priorities
      • Creates a more smooth gameplay experience
      • Extra challenging goal due to deferred shading
  • We use multiple methods:
      • MSAA – Multisample Antialiasing
      • FXAA – Fast Approximate Antialiasing
      • SRAA – Sub-pixel Reconstruction Antialiasing
      • TSAA – Transparency Supersampling Antialiasing
  [Figure: Aliasing]
MSAA
  • Our solution:
      • Deferred geometry pass renders with MSAA (2x, 4x or 8x)
      • Light shaders evaluate per-sample (when needed), average the samples and write out per-pixel
      • Transparent surfaces rendered on top without MSAA
  • 1080p gbuffer+z with 4x MSAA is 158 MB
      • Lots of memory and lots of bandwidth
      • Could be tiled to reduce memory usage
  • Very nice quality though
      • Our (overall) highest quality option
      • But not fast enough for more GPUs
      • Need additional solution(s)..
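The 158 MB figure can be reproduced with simple arithmetic if each sample is assumed to occupy 20 bytes, for example four 32-bit gbuffer targets plus a 32-bit depth value. The slide does not give the layout, so the 20 bytes and the helper name `msaaSurfaceBytes` are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes for a multisampled surface set: every pixel stores
// `samples` samples, each taking `bytesPerSample` bytes across all targets.
inline uint64_t msaaSurfaceBytes(uint32_t width, uint32_t height,
                                 uint32_t samples, uint32_t bytesPerSample) {
    return uint64_t(width) * height * samples * bytesPerSample;
}
```

With 1920 x 1080 pixels, 4 samples and the assumed 20 bytes per sample this gives 165,888,000 bytes, which is about 158.2 MiB and matches the number on the slide.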
FXAA
  • "Fast Approximate Antialiasing"
      • GPU-based MLAA implementation by Timothy Lottes (Nvidia)
      • Multiple quality options
      • ~1.3 ms/f for 1080p on Geforce 580
  • Pros & cons:
      • Superb antialiased long edges!
      • Smooth overall picture
      • Reasonably fast
      • Moving pictures do not benefit as much
      • "Blurry aliasing"
  • Will be released here at GDC’11
      • Part of Nvidia’s example SDK
SRAA
  • "Sub-pixel Reconstruction Antialiasing"
      • Presented at I3D’11 2 weeks ago [Chajdas11]
      • Use 4x MSAA buffers to improve reconstruction
  • Multiple variants:
      • MSAA depth buffer
      • MSAA depth buffer + normal buffer
      • MSAA Primitive ID / spatial hashing buffer
  • Pros:
      • Better at capturing small scale detail
      • Less "pixel snapping" than MLAA variants
  • Cons:
      • Extra MSAA z/normal/id pass can be prohibitive
      • Integration not as easy due to extra pass
No antialiasing
4x MSAA
4x SRAA, depth-only
4x SRAA, depth+normal
FXAA
No antialiasing
MSAA Sample Coverage
  • None of the AA solutions can solve all aliasing
      • Foliage & other alpha-tested surfaces are extra difficult cases
      • Undersampled geometry, requires sub-samples
  • DX 10.1 added SV_COVERAGE as a pixel shader output
  • DX 11 added SV_COVERAGE as a pixel shader input
  • What does this mean?
      • We get full programmable control over the coverage mask
      • No need to waste the alpha channel output (great for deferred)
      • We can do partial supersampling on alpha-tested surfaces!
Transparency Supersampling
  • Shade per-pixel but evaluate alpha test per-sample
      • Write out coverage bitmask
      • MSAA offsets are defined in DX 10.1
      • Requires shader permutation for each MSAA level
  • Gradients still quite limited
      • But much better than standard MSAA!
      • Can combine with screen-space dither
  • See also:
      • DirectX SDK ’TransparencyAA10.1’
      • GDC’09 STALKER talk [Lobanchikov09]

    static const float2 msaaOffsets[4] =
    {
        float2(-0.125, -0.375),    float2(0.375, -0.125),
        float2(-0.375,  0.125),    float2(0.125,  0.375)
    };

    void psMain(
       out float4 color : SV_Target,
       out uint coverage : SV_Coverage)
    {
        float2 texCoord_ddx = ddx(texCoord);
        float2 texCoord_ddy = ddy(texCoord);

        coverage = 0;

        [unroll]
        for (int i = 0; i < 4; ++i)
        {
            // move the texture coordinate to this sample's sub-pixel position
            float2 texelOffset = msaaOffsets[i].x * texCoord_ddx;
            texelOffset +=       msaaOffsets[i].y * texCoord_ddy;

            float4 temp = tex.SampleLevel(sampler, texCoord + texelOffset);

            // alpha test per sample: set this sample's bit if it passes
            if (temp.a >= 0.5)
                coverage |= 1<<i;
        }
    }
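The coverage bitmask logic can be checked on the CPU in isolation. The C++ sketch below takes four already-sampled alpha values and builds the same mask as the loop in the shader; `coverageFromAlpha` is a hypothetical name and the texture fetch itself is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build a 4x MSAA coverage mask: bit i is set when sample i passes
// the alpha test (alpha >= threshold), as in the shader loop above.
inline uint32_t coverageFromAlpha(const std::array<float, 4>& alpha,
                                  float threshold = 0.5f) {
    uint32_t coverage = 0;
    for (uint32_t i = 0; i < alpha.size(); ++i) {
        if (alpha[i] >= threshold)
            coverage |= 1u << i;
    }
    return coverage;
}
```

A fully opaque texel yields 0xF and a fully transparent one yields 0, so only pixels that straddle an alpha edge end up with partial coverage.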
Alpha testing
4x MSAA + Transparency Supersampling
Conclusions
  • DX11 is here – in force
      • 2011 is a great year to focus on DX11
      • 2012 will be a great year for more to drop DX9
  • We’ve found lots & lots of quality & performance enhancing features using DX11
      • And so will you for your game!
      • Still have only started, lots of potential
  • Take advantage of the PC strengths, don’t hold it back
      • Big end-user value
      • Good preparation for Gen4
Thanks
  • Christina Coffin (@ChristinaCoffin)
  • Mattias Widmark
  • Kenny Magnusson
  • Colin Barré-Brisebois (@ZigguratVertigo)
  • Timothy Lottes (@TimothyLottes)
  • Matthäus G. Chajdas (@NIV_Anteru)
  • Miguel Sainz
  • Nicolas Thibieroz
  • Battlefield team
  • Frostbite team
  • Microsoft
  • Nvidia
  • AMD
  • Intel
Questions?
  Email: repi@dice.se
  Blog: http://repi.se
  Twitter: @repi
  Battlefield 3 & Frostbite 2 talks at GDC’11:
  For more DICE talks: http://publications.dice.se
References
  [Lobanchikov09] Igor A. Lobanchikov, "GSC Game World’s STALKER: Clear Sky – a showcase for Direct3D 10.0/1", GDC’09. http://developer.amd.com/gpu_assets/01gdc09ad3ddstalkerclearsky210309.ppt
  [Lauritzen10] Andrew Lauritzen, "Deferred Rendering for Current and Future Rendering Pipelines", SIGGRAPH’10. http://bps10.idav.ucdavis.edu/
  [Chajdas11] Matthäus G. Chajdas et al., "Subpixel Reconstruction Antialiasing for Deferred Shading", I3D’11.

DirectX 11 Rendering in Battlefield 3

  • 1. v4DirectX 11 Rendering in Battlefield 3Johan Andersson, Rendering Architect, DICE
  • 2. AgendaOverviewFeature:Deferred ShadingCompute Shader Tile-Based LightingTerrain Displacement MappingDirect Stereo 3D renderingQuality:Antialiasing: MSAAAntialiasing: FXAAAntialiasing: SRAATransparency SupersamplingPerformance:InstancingParallel dispatchMulti-GPUResource StreamingConclusionsQ & A
  • 4. Battlefield 3FPSFall 2011DX11, Xbox 360 and PS3Frostbite 2 engine
  • 5. Frostbite 2Developed for Battlefield 3 and future DICE/EA gamesMassive focus on creating simple to use & powerful workflowsMajor pushes in animation, destruction, streaming, rendering, lightingand landscapes
  • 6. DX11DX11 API onlyRequires a DX10 or DX11 GPUsRequires Vista SP1 or Windows 7No Windows XP!Why?CPU performance winGPU performance & quality winEase ofdevelopment - no legacyFuture proofBF3 is a big title - will drive OS & HW adoptionWhich is good for your game as well! 
  • 7. Options for renderingSwitched to Deferred Shading in FB2Rich mix of Outdoor + Indoor + Urban environments in BF3Wanted lots more light sourcesWhy not Forward Rendering?Light culling / shader permutations not efficient for usExpensive & more difficultdecaling / destruction maskingWhy not Light Pre-pass?2x geometry pass too expensive on both CPU & GPU for usWas able to generalize our BRDF enough to just a few variations Saw major potential in full tile-based deferred shadingSee also:Nicolas Thibieroz’s talk ”Deferred Shading Optimizations”
  • 8. Deferred ShadingWeaknesses with traditional deferred lighting/shading:Massive overdraw & ROP cost when having lots of big light sourcesExpensive to have multiple per-pixel materials in light shadersMSAA lighting can be slow (non-coherent, extra BW)
  • 10. Tile-based Deferred Shading1. Divide screen into tiles and determine which lights affects which tiles2. Only apply the visible light sources on pixels Custom shader with multiple lightsReduced bandwidth & setup costHow can we do this best in DX11?
  • 11. Lighting with Compute ShaderTile-based Deferred Shading using Compute ShadersPrimarily for analytical light sourcesPoint lights, cone lights, line lightsNo shadowsRequires Compute Shader 5.0Hybrid Graphics/Compute shading pipeline:Graphics pipeline rasterizes gbuffers for opaque surfacesCompute pipeline uses gbuffers, culls lights, computes lighting & combines with shadingGraphics pipeline renders transparent surfaces on top
  • 12. CS requirements & setup
      • 1 thread per pixel, 16x16 thread groups (aka tile)
      • Input: gbuffers, depth buffer & list of lights
      • Output: fully composited & lit HDR texture
      • Gbuffer channels pictured: Normal, Roughness, Diffuse Albedo, Specular Albedo
      • Code:
          // Inputs: gbuffers and depth; output: the lit HDR target
          Texture2D<float4> gbufferTexture0 : register(t0);
          Texture2D<float4> gbufferTexture1 : register(t1);
          Texture2D<float4> gbufferTexture2 : register(t2);
          Texture2D<float4> depthTexture : register(t3);
          RWTexture2D<float4> outputTexture : register(u0);

          // One thread group covers one 16x16 pixel tile
          #define BLOCK_SIZE 16
          [numthreads(BLOCK_SIZE,BLOCK_SIZE,1)]
          void csMain(
              uint3 groupId : SV_GroupID,
              uint3 groupThreadId : SV_GroupThreadID,
              uint groupIndex : SV_GroupIndex,
              uint3 dispatchThreadId : SV_DispatchThreadID)
          {
              ...
          }
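With one thread per pixel and 16x16 threads per group, the number of thread groups to dispatch follows from the render target size. The sketch below is a CPU-side illustration only; `GroupCount` and `threadGroupsFor` are hypothetical names that do not appear in the talk, and the round-up policy is an assumption about how edge tiles would be handled.

```cpp
#include <cstdint>

// Hypothetical helper type: thread group counts along x and y.
struct GroupCount {
    std::uint32_t x;
    std::uint32_t y;
};

// Round up so that partially covered tiles at the right and bottom edges
// still get a thread group (those groups need a bounds check in the shader).
inline GroupCount threadGroupsFor(std::uint32_t width,
                                  std::uint32_t height,
                                  std::uint32_t blockSize = 16) {
    return { (width + blockSize - 1) / blockSize,
             (height + blockSize - 1) / blockSize };
}
```

For a 1920x1080 target this gives 120 x 68 groups; the last row of tiles is only half covered because 1080 is not a multiple of 16.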
  • 13. CS steps 1-2
      • 1. Load gbuffers & depth
      • 2. Calculate min & max z in threadgroup / tile
          • Using InterlockedMin/Max on groupshared variable
          • Atomics only work on ints
          • Can cast float to int (z is always +)
      • Code:
          groupshared uint minDepthInt;
          groupshared uint maxDepthInt;

          // --- globals above, function below -------
          float depth = depthTexture.Load(uint3(texCoord, 0)).r;
          uint depthInt = asuint(depth);

          // Reset the shared bounds, then wait for the whole group
          minDepthInt = 0xFFFFFFFF;
          maxDepthInt = 0;
          GroupMemoryBarrierWithGroupSync();

          // Atomic min/max reduction across the tile
          InterlockedMin(minDepthInt, depthInt);
          InterlockedMax(maxDepthInt, depthInt);
          GroupMemoryBarrierWithGroupSync();

          float minGroupDepth = asfloat(minDepthInt);
          float maxGroupDepth = asfloat(maxDepthInt);
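The float-to-int trick works because non-negative IEEE 754 floats keep their ordering when their bit patterns are compared as unsigned integers. A minimal CPU-side sketch of the same reduction, assuming non-negative, non-NaN depths; `floatBits`, `bitsToFloat` and `tileDepthBounds` are hypothetical names standing in for HLSL's `asuint`/`asfloat` and the atomic min/max:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU analogue of HLSL asuint(): reinterpret the float's bits.
inline std::uint32_t floatBits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// CPU analogue of HLSL asfloat().
inline float bitsToFloat(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

struct DepthBounds {
    float minDepth;
    float maxDepth;
};

// Min/max over one tile's depths, done entirely on the integer bit patterns.
// Only valid for non-negative, non-NaN depths (the slide's "z is always +").
// An empty input leaves the initial sentinels in place.
inline DepthBounds tileDepthBounds(const std::vector<float>& depths) {
    std::uint32_t minBits = 0xFFFFFFFFu;
    std::uint32_t maxBits = 0u;
    for (float d : depths) {
        const std::uint32_t bits = floatBits(d);
        minBits = std::min(minBits, bits);  // stands in for InterlockedMin
        maxBits = std::max(maxBits, bits);  // stands in for InterlockedMax
    }
    return { bitsToFloat(minBits), bitsToFloat(maxBits) };
}
```

Negative values would break the ordering because the sign bit makes them compare as larger unsigned integers, which is why the slide notes that z is always positive.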
  • 14. CS step 3 – Culling
      • Determine visible light sources for each tile
          • Cull all light sources against tile frustum
      • Input (global):
          • Light list, frustum & SW occlusion culled
      • Output (per tile):
          • # of visible light sources
          • Index list of visible light sources
      • Image: per-tile visible light count (black = 0 lights, white = 40)
  • 15. CS step 3 – Impl
      • 3a. Each thread switches to process lights instead of pixels
          • Wow, parallelism switcharoo!
          • 256 light sources in parallel
          • Multiple iterations for >256 lights
      • 3b. Intersect light and tile
          • Multiple variants – accuracy vs perf
          • Tile min & max z is used as a ”depth bounds” test
      • 3c. Append visible light indices to list
          • Atomic add to threadgroup shared memory
          • ”inlined stream compaction”
      • 3d. Switch back to processing pixels
          • Synchronize the thread group
          • We now know which light sources affect the tile
      • Code:
          struct Light {
              float3 pos; float sqrRadius;
              float3 color; float invSqrRadius;
          };
          int lightCount;
          StructuredBuffer<Light> lights;
          // Slide shorthand: the counter has to start at zero for every tile
          groupshared uint visibleLightCount = 0;
          groupshared uint visibleLightIndices[1024];

          // --- globals above, cont. function below ---
          uint threadCount = BLOCK_SIZE*BLOCK_SIZE;
          uint passCount = (lightCount+threadCount-1) / threadCount;
          for (uint passIt = 0; passIt < passCount; ++passIt) {
              // Each thread tests one light per pass
              uint lightIndex = passIt*threadCount + groupIndex;
              // prevent overrun by clamping to a last ”null” light
              lightIndex = min(lightIndex, lightCount);
              if (intersects(lights[lightIndex], tile)) {
                  // Reserve a slot atomically, then store the light index
                  uint offset;
                  InterlockedAdd(visibleLightCount, 1, offset);
                  visibleLightIndices[offset] = lightIndex;
              }
          }
          GroupMemoryBarrierWithGroupSync();
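The slide does not show `intersects`, and notes that several variants exist. The following CPU-side sketch assumes one simple variant: the tile is approximated by a view-space axis-aligned box whose z range comes from the tile's min/max depth, and each light is a sphere. `TileBounds` and `cullLights` are hypothetical names, and the box approximation is an assumption rather than the method used in the talk.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// View-space light: position and squared radius, as in the slide's Light struct.
struct Light {
    float pos[3];
    float sqrRadius;
};

// Hypothetical tile bounds: a view-space box; z range = tile min/max depth.
struct TileBounds {
    float min[3];
    float max[3];
};

// Sphere vs axis-aligned box: compare the squared distance from the sphere
// centre to the closest point of the box against the squared radius.
inline bool intersects(const Light& light, const TileBounds& tile) {
    float sqrDist = 0.0f;
    for (int axis = 0; axis < 3; ++axis) {
        const float c = light.pos[axis];
        const float closest = std::max(tile.min[axis], std::min(c, tile.max[axis]));
        sqrDist += (c - closest) * (c - closest);
    }
    return sqrDist <= light.sqrRadius;
}

// Serial stand-in for the per-tile stream compaction: append the index of
// every light that touches the tile.
inline std::vector<std::uint32_t> cullLights(const std::vector<Light>& lights,
                                             const TileBounds& tile) {
    std::vector<std::uint32_t> visible;
    for (std::size_t i = 0; i < lights.size(); ++i) {
        if (intersects(lights[i], tile)) {
            visible.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return visible;
}
```

On the GPU the `push_back` becomes the `InterlockedAdd` on the shared counter, so the order of indices within a tile is not deterministic there.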
  • 16. CS deferred shading final steps
      • 4. For each pixel, accumulate lighting from visible lights
          • Read from tile visible light index list in groupshared memory
      • 5. Combine lighting & shading albedos
          • Output is non-MSAA HDR texture
          • Render transparent surfaces on top
      • Image: computed lighting
      • Code:
          float3 color = 0;
          // Loop over the lights that survived culling for this tile
          for (uint lightIt = 0; lightIt < visibleLightCount; ++lightIt)
          {
              uint lightIndex = visibleLightIndices[lightIt];
              Light light = lights[lightIndex];
              color += diffuseAlbedo * evaluateLightDiffuse(light, gbuffer);
              color += specularAlbedo * evaluateLightSpecular(light, gbuffer);
          }
  • 17. Massive lighting test scene - 1000 large point lights
      • GDC 2011 test content
  • 18. MSAA Compute Shader Lighting
      • Only edge pixels need full per-sample lighting
          • But edges have bad screen-space coherency! Inefficient
      • Compute Shader can build efficient coherent pixel list
          • Evaluate lighting for each pixel (sample 0)
          • Determine if pixel requires per-sample lighting
          • If so, add to atomic list in shared memory
          • When all pixels are done, synchronize
          • Go through and light sample 1-3 for pixels in list
      • Major performance improvement!
          • Described in detail in [Lauritzen10]
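The list-building step can be illustrated on the CPU. The slide does not say how a pixel is judged to need per-sample lighting, so this sketch assumes a simple depth-discontinuity test against sample 0; `needsPerSampleLighting`, `buildEdgePixelList` and the threshold are all hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// The four depth samples of one pixel under 4x MSAA.
using PixelSamples = std::array<float, 4>;

// Assumed edge criterion: any sample whose depth differs from sample 0 by
// more than a threshold marks the pixel as needing per-sample lighting.
inline bool needsPerSampleLighting(const PixelSamples& depths, float threshold) {
    for (std::size_t s = 1; s < depths.size(); ++s) {
        if (std::fabs(depths[s] - depths[0]) > threshold) {
            return true;
        }
    }
    return false;
}

// Pass 1 stand-in: every pixel is lit at sample 0 (omitted here) and edge
// pixels are appended to a compact list. Pass 2 would then walk that list
// and light samples 1-3, keeping neighbouring threads busy with similar work.
inline std::vector<std::uint32_t> buildEdgePixelList(const std::vector<PixelSamples>& tile,
                                                     float threshold) {
    std::vector<std::uint32_t> edgePixels;
    for (std::size_t i = 0; i < tile.size(); ++i) {
        if (needsPerSampleLighting(tile[i], threshold)) {
            edgePixels.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return edgePixels;
}
```

The point of the list is coherency: instead of a few scattered threads doing four times the work while their neighbours idle, the extra samples are packed together and spread evenly across the group.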
  • 20. Terrain rendering
      • Battlefield 3 terrains
          • Huge area & massive view distances
          • Dynamic destructible heightfields
          • Procedural virtual texturing
          • Streamed heightfields, colormaps, masks
          • Full details at a later conference
      • We stream in source heightfield data at close to pixel ratio
          • Derive high-quality per-pixel normals in shader
      • How can we increase detail even further on DX11?
          • Create better silhouettes and improved depth
          • Keep small-scale detail (dynamic craters & slopes)
  • 24. Terrain Displacement Mapping
      • Straight high-res heightfields, no procedural detail
          • Lots of data in the heightfields
          • Pragmatic & simple choice
          • No changes in physics/collisions
          • No content changes, artists see the true detail they created
      • Uses DX11 fixed edge tessellation factors
          • Stable, no swimming vertices
          • Though can be wasteful
      • Height morphing for streaming by fetching 2 heightfields in domain shader & blend based on patch CLOD factor
      • More work left to figure optimal tessellation scheme for our use case
  • 25. Stereo 3D rendering in DX11
      • Nvidia’s 3D Vision drivers are a good and almost automatic stereo 3D rendering method
          • But only for forward rendering, doesn’t work with deferred shading
          • Or on AMD or Intel GPUs
          • Transparent surfaces do not get proper 3D depth
      • We instead use explicit 3D stereo rendering
          • Render unique frame for each eye
          • Works with deferred shading & includes all surfaces
          • Higher performance requirements, 2x draw calls
      • Works with Nvidia’s 3D Vision and AMD’s HD3D
          • Similar to OpenGL quad buffer support
          • Ask your friendly IHV contact how
  • 27. Instancing
      • Draw calls can still be major performance bottleneck
          • Lots of materials / lots of variation
          • Complex shadowmaps
          • High detail / long view distances
          • Full 3D stereo rendering
      • Battlefield has lots of use cases for heavy instancing
          • Props, foliage, debris, destruction, mesh particles
      • Batching submissions is still important, just as before!
          • Richard Huddy: ”Batch batch batch!”
  • 28. Instancing in DirectX
      • DX9-style stream instancing is good, but restrictive
          • Extra vertex attributes, GPU overhead
          • Can’t be (efficiently) combined with skinning
          • Used primarily for tiny meshes (particles, foliage)
      • DX10/DX11 brings support for shader Buffer objects
          • Vertex shaders have access to SV_InstanceID
          • Can do completely arbitrary loads, not limited to fixed elements
          • Can support per-instance arrays and other data structures!
      • Let’s rethink how instancing can be implemented..
  • 29. Instancing data
      • Multiple object types
          • Rigid / skinned / composite meshes
      • Multiple object lighting types
          • Small/dynamic: light probes
          • Large/static: light maps
      • Different types of instancing data we have
          • Transform: float4x3
          • Skinning transforms: float4x3 array
          • SH light probe: float4 x 4
          • Lightmap UV scale/offset: float4
      • Let’s pack all instancing data into single big buffer!
  • 30. Instancing example: transform + SH
      • Code:
          Buffer<float4> instanceVectorBuffer : register(t0);

          cbuffer a
          {
              float g_startVector;
              float g_vectorsPerInstance;
          }

          VsOutput main(
              // ....
              uint instanceId : SV_InstanceId)
          {
              // Each instance owns g_vectorsPerInstance consecutive float4s:
              // 3 rows of the world matrix, then 4 SH light probe vectors
              uint worldMatrixVectorOffset = g_startVector + instanceId * g_vectorsPerInstance + 0;
              uint probeVectorOffset = g_startVector + instanceId * g_vectorsPerInstance + 3;

              float4 r0 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 0);
              float4 r1 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 1);
              float4 r2 = instanceVectorBuffer.Load(worldMatrixVectorOffset + 2);

              float4 lightProbeShR = instanceVectorBuffer.Load(probeVectorOffset + 0);
              float4 lightProbeShG = instanceVectorBuffer.Load(probeVectorOffset + 1);
              float4 lightProbeShB = instanceVectorBuffer.Load(probeVectorOffset + 2);
              float4 lightProbeShO = instanceVectorBuffer.Load(probeVectorOffset + 3);
              // ....
          }
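The offset arithmetic in the shader above can be checked on the CPU. This sketch assumes the layout the slide implies (three float4 rows for the float4x3 transform followed by four float4 SH vectors, so seven vectors per instance); `InstanceOffsets` and `instanceOffsets` are hypothetical names.

```cpp
#include <cstdint>

// Layout assumed from the slide: 3 float4 rows of the world transform,
// followed by 4 float4 SH light probe vectors.
constexpr std::uint32_t kTransformVectors = 3;
constexpr std::uint32_t kProbeVectors = 4;
constexpr std::uint32_t kVectorsPerInstance = kTransformVectors + kProbeVectors;

// Offsets, in float4 units, into the shared instance vector buffer.
struct InstanceOffsets {
    std::uint32_t worldMatrix;
    std::uint32_t lightProbe;
};

// Mirrors the shader: start + instanceId * stride, plus 3 to skip the transform.
inline InstanceOffsets instanceOffsets(std::uint32_t startVector,
                                       std::uint32_t instanceId,
                                       std::uint32_t vectorsPerInstance = kVectorsPerInstance) {
    const std::uint32_t base = startVector + instanceId * vectorsPerInstance;
    return { base, base + kTransformVectors };
}
```

Because the stride is a constant-buffer value rather than a vertex declaration, the same buffer can hold object types with different per-instance payloads, such as a skinned mesh with a bone array instead of a single transform.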
  • 31. Instancing example: skinning
      • Code:
          half4 weights = input.boneWeights;
          int4 indices = (int4)input.boneIndices;

          // Blend the position through up to four bone matrices
          float3 skinnedPos = mul(float4(pos,1), getSkinningMatrix(indices[0])).xyz * weights[0];
          skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[1])).xyz * weights[1];
          skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[2])).xyz * weights[2];
          skinnedPos += mul(float4(pos,1), getSkinningMatrix(indices[3])).xyz * weights[3];

          // ...
          float4x3 getSkinningMatrix(uint boneIndex)
          {
              // Each bone is 3 float4 rows inside this instance's block
              uint vectorOffset = g_startVector + instanceId * g_vectorsPerInstance;
              vectorOffset += boneIndex*3;

              float4 r0 = instanceVectorBuffer.Load(vectorOffset + 0);
              float4 r1 = instanceVectorBuffer.Load(vectorOffset + 1);
              float4 r2 = instanceVectorBuffer.Load(vectorOffset + 2);
              return createMat4x3(r0, r1, r2);
          }
  • 32. Instancing benefits
      • Single draw call per object type instead of per instance
          • Minor GPU hit for big CPU gain
      • Instancing does not break when skinning parts
          • More deterministic & better overall performance
      • End result is typically 1500-2000 draw calls
          • Regardless of how many object instances the artists place!
          • Instead of 3000-7000 draw calls in some heavy cases
  • 33. Parallel Dispatch in Theory
      • Great key DX11 feature!
          • Improve performance by scaling dispatching to D3D to more cores
          • Reduce frame latency
      • How we use it:
          • DX11 deferred context per HW thread
          • Renderer builds list of all draw calls we want to do for each rendering ”layer” of the frame
          • Split draw calls for each layer into chunks of ~256
          • Dispatch chunks in parallel to the deferred contexts to generate command lists
          • Render to immediate context & execute command lists
          • Profit! *
      • * but theory != practice
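The chunking step is plain range splitting. In the sketch below, `ChunkRange` and `splitIntoChunks` are hypothetical names; in a real renderer each range would be recorded on a deferred context, closed with `FinishCommandList`, and later submitted on the immediate context with `ExecuteCommandList`, none of which is shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of draw call indices within one layer.
struct ChunkRange {
    std::size_t begin;
    std::size_t end;
};

// Split a layer's draw calls into chunks of at most chunkSize entries.
// The last chunk may be shorter. A zero chunk size yields no chunks.
inline std::vector<ChunkRange> splitIntoChunks(std::size_t drawCallCount,
                                               std::size_t chunkSize = 256) {
    std::vector<ChunkRange> chunks;
    if (chunkSize == 0) {
        return chunks;
    }
    for (std::size_t begin = 0; begin < drawCallCount; begin += chunkSize) {
        chunks.push_back({ begin, std::min(begin + chunkSize, drawCallCount) });
    }
    return chunks;
}
```

A layer with 600 draw calls becomes three chunks covering [0, 256), [256, 512) and [512, 600); executing the resulting command lists in chunk order keeps the submission order identical to the serial case.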
  • 34. Parallel Dispatch in Practice
      • Still no performant drivers available for our use case
          • Have waited for 2 years and still are
          • Big driver codebases take time to refactor
          • IHVs vs Microsoft quagmire
          • Heavy driver threads collide with game threads
      • How it should work (a utopia?)
          • Driver does not create any processing threads of its own
          • Game submits workload in parallel to multiple deferred contexts
          • Driver makes sure almost all processing required happens on the draw call on the deferred context
          • Game dispatches command list on immediate context, driver does absolute minimal work with it
      • Still good to design engine for + instancing is great!
      • quagmire [ˈkwægˌmaɪə ˈkwɒg-] n. 1. (Earth Sciences / Physical Geography) a soft wet area of land that gives way under the feet; bog. 2. an awkward, complex, or embarrassing situation
  • 35. Resource streaming
      • Even with modern GPUs with lots of memory, resource streaming is often required
          • Can’t require 1+ GB graphics cards
          • BF3 levels have much more than 1 GB of textures & meshes
          • Reduced load times
      • But creating & destroying DX resources in-frame has never been a good thing
          • Can cause non-deterministic & large driver / OS stalls
          • Has been a problem for a very long time in DX
          • About time to fix it
  • 36. DX11 Resource Streaming
      • Have worked with Microsoft, Nvidia & AMD to make sure we can do stall-free async resource streaming of GPU resources in DX11
          • Want neither CPU nor GPU perf hit
          • Key foundation: DX11 concurrent creates
      • Resource creation flow:
          • Streaming system determines resources to load (texture mipmaps or mesh LODs)
          • Add up DX resource creation on to queue on our own separate low-priority thread
          • Thread creates resources using initial data, signals streaming system
          • Resource created, game starts using it
      • Enables async stall-free DMA in drivers!
      • Resource destruction flow:
          • Streaming system deletes D3D resource
          • Driver keeps it internally alive until GPU frames using it are done. NO STALL!
      • Code:
          // Query whether the driver supports concurrent resource creation
          D3D11_FEATURE_DATA_THREADING threadingCaps;
          FB_SAFE_DX(m_device->CheckFeatureSupport(
              D3D11_FEATURE_THREADING,
              &threadingCaps, sizeof(threadingCaps)));
          if (threadingCaps.DriverConcurrentCreates)
  • 37. Multi-GPU
      • Efficiently supporting Crossfire and SLI is important for us
          • High-end consumers expect it
          • IHVs expect it (and can help!)
          • Allows targeting higher-end HW than currently available during dev
      • AFR is easy: Do not reuse GPU resources from previous frame!
          • UpdateSubresource is easy & robust to use for dynamic resources, but not ideal
      • All of our playtests run with exe named AFR-FriendlyD3D.exe
          • Disables all driver AFR synchronization workarounds
          • Rather find corruption during dev than have bad perf
          • ForceSingleGPU.exe is also useful to track down issues
  • 39. Antialiasing
      • Reducing aliasing is one of our key visual priorities
          • Creates a smoother gameplay experience
          • Extra challenging goal due to deferred shading
      • We use multiple methods:
          • MSAA – Multisample Antialiasing
          • FXAA – Fast Approximate Antialiasing
          • SRAA – Sub-pixel Reconstruction Antialiasing
          • TSAA – Transparency Supersampling Antialiasing
      • Image: aliasing
  • 40. MSAA
      • Our solution:
          • Deferred geometry pass renders with MSAA (2x, 4x or 8x)
          • Light shaders evaluate per-sample (when needed), averages the samples and writes out per-pixel
          • Transparent surfaces rendered on top without MSAA
      • 1080p gbuffer+z with 4x MSAA is 158 MB
          • Lots of memory and lots of bandwidth
          • Could be tiled to reduce memory usage
          • Very nice quality though
      • Our (overall) highest quality option
          • But not fast enough for more GPUs
          • Need additional solution(s)..
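The 158 MB figure can be reproduced with simple arithmetic under one assumed layout: four 32-bit gbuffer targets plus a 32-bit depth buffer, all stored at 4 samples per pixel. That layout is an assumption chosen because it matches the slide's number (slide 12 only shows three gbuffer textures); `gbufferBytes` is a hypothetical helper.

```cpp
#include <cstdint>

// Total bytes for a multisampled gbuffer plus depth:
// width * height * samples * (colorTargets * bytesPerColor + bytesPerDepth)
constexpr std::uint64_t gbufferBytes(std::uint64_t width,
                                     std::uint64_t height,
                                     std::uint64_t samples,
                                     std::uint64_t colorTargets,
                                     std::uint64_t bytesPerColorTexel,
                                     std::uint64_t bytesPerDepthTexel) {
    return width * height * samples *
           (colorTargets * bytesPerColorTexel + bytesPerDepthTexel);
}

// Assumed layout: 1080p, 4x MSAA, four RGBA8-sized targets, 32-bit depth.
constexpr std::uint64_t kBytes1080p4x = gbufferBytes(1920, 1080, 4, 4, 4, 4);
constexpr std::uint64_t kMiB1080p4x = kBytes1080p4x / (1024 * 1024);
```

Under these assumptions the total is 165,888,000 bytes, which is about 158 MiB. With only three colour targets the same formula gives roughly 127 MiB, so the slide's number suggests four targets or an equivalent amount of data.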
  • 41. FXAA
      • ”Fast Approximate Antialiasing”
          • GPU-based MLAA implementation by Timothy Lottes (Nvidia)
          • Multiple quality options
          • ~1.3 ms/f for 1080p on Geforce 580
      • Pros & cons:
          • Superb antialiased long edges!
          • Smooth overall picture
          • Reasonably fast
          • Moving pictures do not benefit as much
          • ”Blurry aliasing”
      • Will be released here at GDC’11
          • Part of Nvidia’s example SDK
  • 42. SRAA
      • ”Sub-pixel Reconstruction Antialiasing”
          • Presented at I3D’11 2 weeks ago [Chajdas11]
      • Use 4x MSAA buffers to improve reconstruction
      • Multiple variants:
          • MSAA depth buffer
          • MSAA depth buffer + normal buffer
          • MSAA Primitive ID / spatial hashing buffer
      • Pros:
          • Better at capturing small scale detail
          • Less ”pixel snapping” than MLAA variants
      • Cons:
          • Extra MSAA z/normal/id pass can be prohibitive
          • Integration not as easy due to extra pass
  • 48. FXAA
  • 50. MSAA Sample Coverage
      • None of the AA solutions can solve all aliasing
          • Foliage & other alpha-tested surfaces are extra difficult cases
          • Undersampled geometry, requires sub-samples
      • DX 10.1 added SV_COVERAGE as a pixel shader output
      • DX 11 added SV_COVERAGE as a pixel shader input
      • What does this mean?
          • We get full programmable control over the coverage mask
          • No need to waste the alpha channel output (great for deferred)
          • We can do partial supersampling on alpha-tested surfaces!
  • 51. Transparency Supersampling
      • Shade per-pixel but evaluate alpha test per-sample
          • Write out coverage bitmask
          • MSAA offsets are defined in DX 10.1
          • Requires shader permutation for each MSAA level
      • Gradients still quite limited
          • But much better than standard MSAA!
          • Can combine with screen-space dither
      • See also:
          • DirectX SDK ’TransparencyAA10.1’
          • GDC’09 STALKER talk [Lobanchikov09]
      • Code:
          // Standard 4x MSAA sample positions, relative to the pixel centre
          static const float2 msaaOffsets[4] =
          {
              float2(-0.125, -0.375), float2(0.375, -0.125),
              float2(-0.375, 0.125), float2(0.125, 0.375)
          };

          void psMain(
              out float4 color : SV_Target,
              out uint coverage : SV_Coverage)
          {
              float2 texCoord_ddx = ddx(texCoord);
              float2 texCoord_ddy = ddy(texCoord);
              coverage = 0;

              [unroll]
              for (int i = 0; i < 4; ++i)
              {
                  // Shift the texture coordinate to sample i's position
                  float2 texelOffset = msaaOffsets[i].x * texCoord_ddx;
                  texelOffset += msaaOffsets[i].y * texCoord_ddy;

                  // As on the slide; the mip level argument is not shown
                  float4 temp = tex.SampleLevel(sampler, texCoord + texelOffset);
                  // Alpha test per sample, one coverage bit each
                  if (temp.a >= 0.5)
                      coverage |= 1<<i;
              }
          }
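The coverage computation can be mirrored on the CPU to see which bits end up set. In this sketch `Float2`, `alphaTestCoverage` and the `sampleAlpha` callback are hypothetical stand-ins for the shader's texture fetch; the sample offsets are the ones listed on the slide.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

struct Float2 {
    float x;
    float y;
};

// 4x MSAA sample offsets from the slide, in pixels relative to the pixel centre.
constexpr std::array<Float2, 4> kMsaaOffsets = {{
    { -0.125f, -0.375f }, { 0.375f, -0.125f },
    { -0.375f,  0.125f }, { 0.125f,  0.375f }
}};

// Evaluate the alpha test once per MSAA sample and set one coverage bit per
// passing sample. ddx/ddy are the texture coordinate derivatives per pixel.
inline std::uint32_t alphaTestCoverage(Float2 texCoord, Float2 ddx, Float2 ddy,
                                       const std::function<float(Float2)>& sampleAlpha,
                                       float threshold = 0.5f) {
    std::uint32_t coverage = 0;
    for (std::size_t i = 0; i < kMsaaOffsets.size(); ++i) {
        const Float2 uv = {
            texCoord.x + kMsaaOffsets[i].x * ddx.x + kMsaaOffsets[i].y * ddy.x,
            texCoord.y + kMsaaOffsets[i].x * ddx.y + kMsaaOffsets[i].y * ddy.y
        };
        if (sampleAlpha(uv) >= threshold) {
            coverage |= 1u << i;
        }
    }
    return coverage;
}
```

For a pixel that straddles a vertical alpha edge, only the samples on the opaque side set their bit, so the hardware resolve blends the edge instead of producing a hard on/off pixel.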
  • 52. Image comparison: Alpha testing / 4x MSAA + Transparency Supersampling
  • 53. Conclusions
      • DX11 is here – in force
          • 2011 is a great year to focus on DX11
          • 2012 will be a great year for more to drop DX9
      • We’ve found lots & lots of quality & performance enhancing features using DX11
          • And so will you for your game!
          • Still have only started, lots of potential
      • Take advantage of the PC strengths, don’t hold it back
          • Big end-user value
          • Good preparation for Gen4
  • 54. Thanks
      • Christina Coffin (@ChristinaCoffin)
      • Mattias Widmark
      • Kenny Magnusson
      • Colin Barré-Brisebois (@ZigguratVertigo)
      • Timothy Lottes (@TimothyLottes)
      • Matthäus G. Chajdas (@NIV_Anteru)
      • Miguel Sainz
      • Nicolas Thibieroz
      • Battlefield team
      • Frostbite team
      • Microsoft
      • Nvidia
      • AMD
      • Intel
  • 55. Questions?
      • Email: [email protected]: http://repi.se
      • Twitter: @repi
      • Battlefield 3 & Frostbite 2 talks at GDC’11:
      • For more DICE talks: http://publications.dice.se
  • 56. References
      • [Lobanchikov09] Igor A. Lobanchikov, ”GSC Game World’s STALKER: Clear Sky – a showcase for Direct3D 10.0/1”, GDC’09. http://developer.amd.com/gpu_assets/01gdc09ad3ddstalkerclearsky210309.ppt
      • [Lauritzen10] Andrew Lauritzen, ”Deferred Rendering for Current and Future Rendering Pipelines”, SIGGRAPH’10. http://bps10.idav.ucdavis.edu/
      • [Chajdas11] Matthäus G. Chajdas et al., ”Subpixel Reconstruction Antialiasing for Deferred Shading”, I3D’11.