Optimize Your Game Performance For Consoles and PCs in Unity: Unity 6 Edition
Contents

Introduction
Profiling
    Deep Profiling
    Project Auditor
Memory
Assets
    Compress textures
    The AssetPostprocessor
    UnityDataTools
    Use Addressables
    Use ScriptableObjects
Project configuration
    Switch to IL2CPP
    Forward
    Forward+
    Deferred shading
    Spatial-Temporal Post-Processing
    Bake lightmaps
    Disable shadows
    GPU Lightmapper
GPU optimization
    Culling
    Dynamic resolution
User interface
Audio
Physics
    2D Physics
Animation
This guide brings together all the best and latest PC and console performance optimization
tips available in Unity 6. It is one of two optimization guides available for Unity 6, the other
being Optimize your game performance for mobile, XR, and Unity Web in Unity 6.
Performance optimization is a vast topic that requires careful attention. To find an
efficient solution that satisfies your design requirements, you'll need to master Unity's
classes and components, algorithms and data structures, and your platform's profiling tools.
Optimizing your console or PC application is an essential process that underpins the entire
game development cycle. While your audience may take it for granted that your game runs at
silky-smooth 60+ frames per second (fps), achieving your performance goals across multiple
platforms is not always easy. Optimizing effectively requires making both your code
architecture and art assets more efficient, as well as an understanding of how your target
hardware operates within its limitations.
This guide assembles knowledge and advice from Unity’s expert software engineers who have
tested these best practices with our industry partners in real-world scenarios.
Follow these steps to get the best performance from your PC and console game.
It’s essential that you can measure how your game runs on its target platform and use this
information to track down the cause of a performance problem. By watching the profiling tool as
you make changes, you can gauge whether the changes actually fix the performance problem.
The Unity Profiler provides performance information about your application, but it can’t help
you if you don’t use it.
Profiling is the process of measuring aspects of your game's performance at runtime in
order to track down the causes of performance problems. By monitoring the profiling tools as
you make changes, you can gauge whether those changes actually fix the problem.
Profile your project early and throughout the development cycle, not just when you are close
to shipping. Investigate glitches or spikes as soon as they appear and make sure to benchmark
performance before and after major changes in your project. As you develop a “performance
signature” for your project, you’ll be able to spot new issues more easily.
While profiling in the Editor can give you an idea of the relative performance of different
systems in your game, profiling on each device gives you the opportunity to gain more
accurate insights. Profile a development build on target devices whenever possible. Remember
to profile and optimize for both the highest- and lowest-spec devices that you plan to support.
Unity offers a suite of profiling tools to help identify bottlenecks, including the Unity
Profiler, the Memory Profiler, and the Profile Analyzer.
Of course, not every optimization described here will apply to your application. Something
that works well in one project may not translate to yours. Identify genuine bottlenecks and
concentrate your efforts on what benefits your work.
To learn more about how to plan your profiling workflows see the Ultimate guide to profiling
Unity games.
The Profiler is instrumentation-based. It profiles timings of game and engine code that are
automatically marked up (such as MonoBehaviour’s Start or Update methods, or specific API
calls), or explicitly wrapped with the help of ProfilerMarker API.
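For code that Unity doesn't mark up automatically, you can add your own instrumentation with the ProfilerMarker API. A minimal sketch, with illustrative class and marker names:

using Unity.Profiling;
using UnityEngine;

public class MySystem : MonoBehaviour
{
    // Create the marker once; its samples appear in the Profiler
    // under the name passed to the constructor.
    static readonly ProfilerMarker s_ExpensiveWork =
        new ProfilerMarker("MySystem.ExpensiveWork");

    void Update()
    {
        using (s_ExpensiveWork.Auto()) // Begin() on entry, End() on exit
        {
            // ... the work you want to measure ...
        }
    }
}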
Begin by enabling the CPU and Memory tracks as your default. You can monitor
supplementary Profiler Modules like Renderer, Audio, and Physics as needed for your game
(for example, profiling as needed for physics-heavy or music-based gameplay). However, only
enable what you need so you don’t impact performance and skew your results.
To capture profiling data from an actual device on your chosen platform, check the
Development Build option before you click Build and Run. Then, connect the Profiler to your
application manually once it's running.
You can optionally check Autoconnect Profiler under Platform Settings in the Build Profiles
window. Sometimes this is useful if you specifically want to capture the first few frames of the
application. Be warned that this option can add 5–10 seconds of startup time, so only use it
when necessary.
Choose the platform target to profile. The Record button tracks several seconds of your
application’s playback (300 frames by default). Go to Unity > Preferences > Analysis >
Profiler > Frame Count to increase this up to 2000 if you need longer captures. While this
costs more CPU and memory resources, it can be useful depending on your specific scenario.
When using the Deep Profiling setting, Unity can profile the beginning and end of every
function call in your script code, telling you exactly which part of your application is being
executed and potentially causing a delay. However, deep profiling adds overhead to every
method call and may skew the performance analysis.
Click in the window to analyze a specific frame. Next, use either the Timeline or Hierarchy
view for the following:
— Timeline shows the visual breakdown of timing for a specific frame. This allows you to
visualize how the activities relate to one another and across different threads. Use this
option to determine if you are CPU- or GPU-bound.
— Hierarchy shows the hierarchy of ProfileMarkers, grouped together. This allows you to
sort the samples based on time cost in milliseconds (Time ms and Self ms). You can also
count the number of Calls to a function and the managed heap memory (GC Alloc) on
the frame.
If you’re new to profiling, start with this series of short video tutorials:
— Profiler walkthrough
For a complete overview of the Unity Profiler, download the profiling e-book here.
Before optimizing anything in your project, save the Profiler .data file. Implement your
changes, then compare the saved .data files from before and after the modification. Rely on
this cycle to improve performance: profile, optimize, and compare.
Deep Profiling
You can also enable Deep Profiling Support in the Build Settings. When the built Player
starts, the Deep Profiler profiles every part of your code, not just code timings explicitly
wrapped in ProfilerMarkers.
When Deep Profiling is enabled, Unity can profile the beginning and end of every function
call in your script code. This can help you identify exactly which part of your application is
causing a slowdown.
However, Deep Profiling is resource intensive and uses a lot of memory. Each
ProfilerMarker adds a tiny bit of overhead (about 10ns, depending on the platform), so
each additional point of measurement slows your application more. Also, be warned that
if you have a lot of function calls, deep profiling amplifies their overhead as well.
If you want to see more details on samples with markers such as GC.Alloc or
JobHandle.Complete, navigate to the Profiler window toolbar and enable the Call Stacks setting.
This provides the sample’s full call stack, which gives you the information you need
without incurring the overhead of Deep Profiling.
In general, only use Deep Profiling when it’s necessary, since your application runs
significantly slower when it’s in use.
Take an even deeper dive into frames and marker data with the Profile Analyzer, which complements the existing Profiler.
Watch this Profile Analyzer tutorial to learn more about its features.
Each frame will have a time budget based on your target frames per second (fps). For an
application to run at 30 fps, its frame budget can’t exceed 33.33 ms per frame (1000 ms / 30
fps). Likewise, a target of 60 fps leaves 16.66 ms per frame.
We recommend that you measure frame time in milliseconds rather than fps. To understand
why, compare fps and frame time directly:
If your application is running at 900 fps, this translates into a frame time of 1.111
milliseconds per frame. At 450 fps, this is 2.222 milliseconds per frame. This represents
a difference of only 1.111 milliseconds per frame, even though the frame rate appears to
drop by one half.
If you look at the differences between 60 fps and 56.25 fps, that translates into 16.666
milliseconds per frame and 17.777 milliseconds per frame, respectively. This also
represents 1.111 milliseconds extra per frame, but here, the drop in frame rate feels far
less dramatic percentage-wise.
This is why developers use the average frame time to benchmark game speed rather than
fps.
Don’t worry about fps unless you drop below your target frame rate. Focus on frame time
to measure how fast your game is running, then stay within your frame budget.
Read the original article, “Robert Dunlop’s FPS versus Frame Time,” for more information.
When the CPU takes too long to process and prepare the frame for the GPU, the overall
frame rate is limited by the CPU's performance; the application is CPU-bound.
Similarly, when the GPU takes too long to render the frame after the CPU has prepared it,
the application is GPU-bound.
The Profiler can tell you if your CPU is taking longer than your allotted frame budget, or if the
culprit is your GPU. It does this by emitting markers prefixed with Gfx as follows:
— If you see the Gfx.WaitForCommands marker, it means that the render thread is ready,
but you might be waiting for a bottleneck on the main thread.
— If the render thread spends time in Camera.Render, your application is CPU-bound and
might be spending too much time sending draw calls or textures to the GPU.
Refer to the Common Profiler markers documentation for a complete list of markers. Also,
check out our blog post on Fixing Time.deltaTime in Unity 2020.2 for smoother gameplay for
more information about the frame pipeline.
Windows
— Intel VTune: Quickly find and fix performance bottlenecks with this suite of
performance analysis tools for Intel processors.
— Intel GPA suite: This suite of graphics-focused tools can help you improve your game’s
performance by quickly identifying problem areas.
— PIX: PIX is a performance tuning and debugging tool for Windows and Xbox game
developers using DirectX 12. It includes tools for understanding and analyzing CPU and
GPU performance, as well as monitoring various real-time performance counters. For
Windows developers, start here. For more details about PIX for Xbox, you need to be a
registered Xbox developer: Start here.
PC / Universal
— AMD μProf: AMD μProf is a performance analysis tool for understanding and profiling
the performance of applications running on AMD hardware.
— NVIDIA Nsight Developer Tools: This tooling enables developers to build, debug, and
profile class-leading, cutting-edge software using the latest accelerated computing
hardware from NVIDIA.
PlayStation®
— CPU profiler tools are available for the PlayStation development environment. For more
details, you need to be a registered PlayStation developer: Start here.
Web
— Firefox Profiler: Dig into the call stacks and view flame graphs for Unity Web builds
(among other things) with the Firefox Profiler. It also features a comparison tool to look
at profiling captures side by side.
— Chrome DevTools Performance: This web browser tool can be used to profile Unity Web
builds.
Some GPU profiling and debugging tools are platform-specific and offer closer platform
integration. Here are some of the tools available for various platforms:
— Apple Frame Capture Debugging Tools: GPU debugging for Apple platforms
— Visual Studio Graphics Diagnostics: Choose this and/or PIX for DirectX-based platforms
such as Windows or Xbox
— NVIDIA Nsight Graphics: Graphics profiler and debugger for NVIDIA GPUs.
— AMD Radeon Developer Tool Suite: GPU profiler for AMD GPUs
Project Auditor
The Project Auditor is an experimental tool capable of performing static analysis of a project’s
scripts and settings. It offers a great way to track down the causes of managed memory
allocations, inefficient project configurations, and possible performance bottlenecks.
The Project Auditor is a free, unofficial package for use with the Editor. For more
information, please refer to the Project Auditor documentation.
Unity employs automatic memory management for user-generated code and scripts. Small
pieces of data, like value-typed local variables, are allocated to the stack. Larger pieces of
data and long-term storage are allocated to the managed or native heaps.
The garbage collector periodically identifies and deallocates unused managed heap
memory. The Asset garbage collection runs on demand or when you load a new scene, and
it deallocates native objects and resources. While this runs automatically, the process of
examining all the objects in the heap can cause the game to stutter or run slowly.
Optimizing your memory usage means being conscious of when you allocate and deallocate
managed heap memory, and how you minimize the effect of garbage collection.
In the Memory Profiler, use the Unity Objects tab to identify areas where you can eliminate
duplicate memory entries, or to find which objects use the most memory. The All Of Memory
tab displays a breakdown of all the memory in the snapshot that Unity tracks.
Learn how to leverage the Memory Profiler in Unity for improved memory usage.
Common sources of unnecessary managed allocations include:

— Strings: In C#, strings are reference types, not value types. This means that every
new string is allocated on the managed heap, even if it's only used temporarily.
Reduce unnecessary string creation or manipulation if you use strings at large
scale. Avoid parsing string-based data files such as JSON and XML at runtime; store
data in ScriptableObjects or formats like MessagePack or Protobuf instead. Use the
StringBuilder class if you need to build strings at runtime.
— Unity function calls: Some Unity API functions create heap allocations, particularly
ones which return an array of managed objects. Cache references to arrays rather
than allocating them in the middle of a loop. Also, take advantage of certain functions
that avoid generating garbage. For example, use GameObject.CompareTag instead of
manually comparing a string with GameObject.tag (as returning a new string creates
garbage).
— Coroutines: Though yield does not produce garbage, creating a new WaitForSeconds
object does. Cache and reuse the WaitForSeconds object rather than creating it in the
yield line (see the sketch after this list).
— LINQ and Regular Expressions: Both of these generate garbage from behind-the-
scenes boxing. Avoid LINQ and Regular Expressions if performance is an issue. Write for
loops and use lists as an alternative to creating new arrays.
— Generic Collections and other managed types: Don’t declare and populate a List or
collection every frame in Update (for example, a list of enemies within a certain radius
of the player). Instead make the List a member of the MonoBehaviour and initialize it in
Start. Simply empty the collection with Clear every frame before using it.
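As a minimal sketch of two of these patterns (class and field names are illustrative), the coroutine below reuses one cached WaitForSeconds, and Update empties a preallocated List instead of creating a new one:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class AllocationFreePatterns : MonoBehaviour
{
    // Allocated once, reused by every iteration of the coroutine.
    static readonly WaitForSeconds k_HalfSecond = new WaitForSeconds(0.5f);

    // Allocated once; emptied with Clear each frame before reuse.
    readonly List<Transform> m_NearbyEnemies = new List<Transform>(64);

    IEnumerator PeriodicTick()
    {
        while (true)
        {
            // ... periodic work ...
            yield return k_HalfSecond; // no new WaitForSeconds allocation
        }
    }

    void Update()
    {
        m_NearbyEnemies.Clear(); // reuses the existing backing array
        // ... populate m_NearbyEnemies without allocating ...
    }
}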
For more information, see the manual page on Garbage Collection Best Practices.
See Understanding Automatic Memory Management for examples of how to use this to your
advantage.1
1 Note that using the GC can add read-write barriers to some C# calls, which come with a small overhead
that can add up to ~1 ms per frame of scripting-call overhead. For optimal performance, it's ideal to have
no GC allocations in the main gameplay loops and to hide GC.Collect where the user won't notice it.
Note: Incremental GC can temporarily help mitigate garbage collection issues, but the best
long-term course of action is to locate and stop frequent allocations that trigger garbage
collection.
The asset pipeline can dramatically impact your application’s performance. An experienced
technical artist can help your team define and enforce asset formats, specifications, and
import settings for smooth processes.
Don’t rely on default settings. Use the platform-specific override tab to optimize assets such
as textures and mesh geometry. Incorrect settings might yield larger build sizes, longer build
times, poor GPU performance, and poor memory usage. Consider using the Presets feature to
help customize baseline settings that will enhance a specific project.
See this guide to best practices for importing art assets. For a mobile-specific guide (with
many general tips as well), check out the Unity Learn course on 3D art optimization for mobile
applications. And watch the GDC 2023 session “Technical tips for every stage of game
creation” to learn more about how to leverage Presets.
Compress textures
Consider these two examples using the same model and texture. The settings on the top
consume more than five times the memory compared to those on the bottom, without much
benefit in visual quality.
Texture compression offers significant performance benefits when you apply it correctly.
This can result in faster load times, a smaller memory footprint, and dramatically increased
rendering performance. Compressed textures only use a fraction of the memory bandwidth
needed for uncompressed 32-bit RGBA textures.
Refer to the recommended list of texture compression formats for your target platform.
Beyond choosing a compression format, keep these import settings in mind:
— Lower the Max Size: Use the minimum settings that produce visually acceptable results.
This is non-destructive and can quickly reduce your texture memory.
— Use powers of two (POT): Unity requires POT texture dimensions for texture
compression formats.
— Toggle off the Read/Write Enabled option: When enabled, this option creates a copy in
both CPU- and GPU-addressable memory, doubling the texture's memory footprint. In
most cases, keep this disabled (only enable it if you generate a texture at runtime and
need to overwrite it). You can also enforce this option via Texture2D.Apply, passing in
makeNoLongerReadable set to true, as shown below.
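For example, a texture generated at runtime might be finalized like this (size and format are illustrative):

// Upload the pixel data to the GPU, then release the CPU-side copy
// by passing makeNoLongerReadable: true.
var tex = new Texture2D(256, 256, TextureFormat.RGBA32, false);
// ... fill pixels with tex.SetPixels32(...) ...
tex.Apply(updateMipmaps: false, makeNoLongerReadable: true);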
Proper texture import settings will help optimize your build size.
For 2D projects, you can use a Sprite Atlas (Asset > Create > 2D > Sprite Atlas) rather than
rendering individual Sprites and Textures.
For 3D projects, you can use your digital content creation (DCC) package of choice. Several
third-party tools like MA_TextureAtlasser or TexturePacker also can build texture atlases.
Combine textures and remap UVs for any 3D geometry that doesn’t require high-resolution
maps. A visual editor gives you the ability to set and prioritize the sizes and positions in the
texture atlas or sprite sheet.
The texture packer consolidates the individual maps into one large texture. Unity can then
issue a single draw call to access the packed Textures with a smaller performance overhead.
Keep the geometric complexity of GameObjects in your scenes to a minimum, otherwise Unity
has to push a lot of vertex data to the graphics card.
Consider cutting down models in your DCC package of choice. Delete unseen polygons from
the camera’s point of view. For example, if you never see the back of a cupboard resting
against a wall, the model should not have any faces there.
Be aware that the bottleneck is not usually polygon count on modern GPUs, but rather polygon
density. We recommend performing an art pass across all assets to reduce the polygon count
of distant objects. Microtriangles can be a significant cause of poor GPU performance.
Depending on the target platform, investigate adding details via high-resolution textures to
compensate for low-poly geometry. Use textures and normal maps instead of increasing the
density of the mesh.
Reduce pixel complexity by baking as much detail into the textures as possible. For example,
capture the specular highlights into the texture to avoid having to compute the highlight in the
fragment shader.
Be mindful and remember to profile regularly, as these techniques can impact performance,
and may not be suitable for your target platform.
— Use mesh compression: Aggressive mesh compression can reduce disk space (memory
at runtime, however, is unaffected). Note that mesh quantization can result in inaccuracy,
so experiment with compression levels to see what works for your models.
— Disable Read/Write: Enabling this option duplicates the mesh in memory, which keeps
one copy of the mesh in system memory and another in GPU memory. In most cases,
you should disable it (in Unity 2019.2 and earlier, this option is checked by default).
— Disable rigs and BlendShapes: If your mesh does not need skeletal or blendshape
animation, disable these options wherever possible.
— Disable normals and tangents: If you are absolutely certain the mesh’s material will not
need normals or tangents, uncheck these options for extra savings.
— Vertex Compression sets vertex compression per channel. For example, you can enable
compression for everything except positions and lightmap UVs. This can reduce runtime
memory usage from your meshes.
Note that the Mesh Compression in each mesh’s Import Settings overrides the vertex
compression setting. In that event, the runtime copy of the mesh is uncompressed and
may use more memory.
— Optimize Mesh Data removes any data from meshes that is not required by the material
applied to them (such as tangents, normals, colors, and UVs).
The AssetPostprocessor
The AssetPostprocessor allows you to hook into the import pipeline and run scripts before
or after importing assets. This lets you customize settings when importing models, textures,
audio, and so on, in a way that's similar to Presets but through code.
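A minimal sketch of the idea, assuming an Editor-folder script and an illustrative folder filter:

using UnityEditor;

public class TextureImportRules : AssetPostprocessor
{
    // Runs before each texture import; enforces baseline settings.
    void OnPreprocessTexture()
    {
        if (!assetPath.Contains("/Environment/"))
            return;

        var importer = (TextureImporter)assetImporter;
        importer.maxTextureSize = 1024;
        importer.textureCompression = TextureImporterCompression.Compressed;
        importer.isReadable = false; // keep Read/Write disabled
    }
}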
Learn more about the process in the GDC 2023 talk “Technical tips for every stage of game
creation.”
UnityDataTools
UnityDataTools is a collection of open source tools provided by Unity that aim to enhance
the data management and serialization capabilities in Unity projects. It includes features for
analyzing and optimizing project data, such as identifying unused assets, detecting asset
dependencies, and reducing build size.
Learn more about the tools here and read more about Asset Auditing in the Understanding
Optimization in Unity section of the best practice guide.
If either the upload rate is too slow or the main thread stalls while loading several Textures
at once, adjust the Texture buffers. Usually you can set the value (in MB) to the size of the
largest texture you need to load in the Scene.
Note: Be aware that changing the default values can lead to high memory pressure. Also,
you cannot return ring buffer memory to the system after Unity allocates it. If GPU memory
overloads, the GPU unloads the least-recently used Texture and forces the CPU to re-
upload it the next time it enters the camera frustum.
Read more about memory restrictions in Texture buffers when using time-slice awake in the
Memory Management in Unity guide.
The mipmap streaming system reduces the total amount of memory needed for textures,
because it only loads the mipmap levels necessary to render the current camera position.
Otherwise, Unity loads all of a texture's mip levels by default. Texture streaming trades a
small amount of CPU resources to save a potentially large amount of GPU memory.
You can use the Mipmap Streaming API for additional control. Texture streaming automatically
reduces mipmap levels to stay within the user-defined memory budget.
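For example, you can adjust the streaming budget from code. The value below is illustrative; mipmap streaming itself is enabled under Project Settings > Quality, and individual textures must also have Streaming Mipmaps enabled in their import settings:

// Cap the total memory the mipmap streaming system may use, in MB.
QualitySettings.streamingMipmapsMemoryBudget = 512f;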
Use Addressables
The Addressable Asset System simplifies how you manage the assets that make up your
game. Any asset, including scenes, prefabs, text assets, and so on, can be marked as
“addressable” and given a unique name. You can then call this alias from anywhere.
Titles being developed for platforms that include standalone and consoles should consider
reviewing the following default settings on Addressable Asset Groups:
— AssetBundle Compression: LZ4 compression is unnecessary on consoles (the filesystem
has its own, better compression), and it also prevents efficient patching of small changes
within a large bundle. Consider setting this to Uncompressed on console platforms and to
LZ4 on Windows.
— Bundle Naming Mode: Leaving this set to Append Hash to Filename works against
consoles' patching systems, which do not recognize a rename of "the same" file. On
console platforms, set this to Filename.
Adding this extra level of abstraction between the game and its assets can streamline
certain tasks, such as creating a separate downloadable content pack. Addressables makes
referencing those asset packs easier as well, whether they’re local or remote.
Install the Addressables package from the Package Manager. Any asset or prefab in the
project can then be made "addressable." Checking the option under an asset's name in the
Inspector assigns it a default unique address.
Once marked, the corresponding assets appear in the Window > Asset Management >
Addressables > Groups window.
In the Addressables Groups, you can see each asset’s custom address paired with its location.
Whether the asset is hosted elsewhere or stored locally, the system will locate it using the
Addressable Name string. An addressable prefab does not load into memory until needed and
automatically unloads its associated assets when no longer in use.
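A minimal loading sketch, assuming a prefab with the illustrative address "EnemyPrefab" assigned in the Groups window:

using UnityEngine;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

public class EnemySpawner : MonoBehaviour
{
    AsyncOperationHandle<GameObject> m_Handle;

    void Start()
    {
        // Loads asynchronously by address, wherever the asset lives.
        m_Handle = Addressables.LoadAssetAsync<GameObject>("EnemyPrefab");
        m_Handle.Completed += handle => Instantiate(handle.Result);
    }

    void OnDestroy()
    {
        // Releasing the handle lets Addressables unload the asset
        // once nothing references it anymore.
        Addressables.Release(m_Handle);
    }
}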
For more guidance, see these resources:

— Tales from the optimization trenches: Saving memory with Addressables, which
demonstrates how to organize your Addressable Groups in order to be more efficient
with memory.
— Addressables: Planning and best practices, which provides tips for organizing, building,
loading, and unloading addressable assets.
The Unity PlayerLoop contains functions for interacting with the core of the game engine. This
structure includes a number of systems that handle initialization and per-frame updates. All of
your scripts will rely on this PlayerLoop to create gameplay.
When profiling, you’ll see your project’s user code under the PlayerLoop (with Editor
components under the EditorLoop).
The Profiler will show your custom scripts, settings, and graphics in the context of the entire engine’s execution.
If you do need to use Update, consider running the code every n frames. This is one way
to apply time slicing, a common technique of distributing a heavy workload across multiple
frames. In this example, we run the ExampleExpensiveFunction once every three frames:
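private int interval = 3;

void Update()
{
    // Illustrative reconstruction: the expensive call runs only on
    // frames where the frame count is a multiple of the interval.
    if (Time.frameCount % interval == 0)
    {
        ExampleExpensiveFunction();
    }
}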
The trick is to interleave this with other work that runs on the other frames. In this example,
you could “schedule” other expensive functions when Time.frameCount % interval == 1 or
Time.frameCount % interval == 2.
Alternatively, use a custom UpdateManager class (below) and update subscribed objects
every n frames.
Unity calls several event functions when a scene loads, before the application renders its
first frame:

— Awake
— OnEnable/OnDisable
— Start
Avoid expensive logic in these functions until your application renders its first frame.
Otherwise, you might encounter longer loading times than necessary.
Refer to the order of execution for event functions for more details on the first scene load.
Frequently calling methods like GetComponent can contribute to CPU spikes. Wherever
possible, run expensive lookups in the initialization phase (i.e., MonoBehaviour.Awake and
MonoBehaviour.Start), cache the needed references, and reuse them later. The following
example demonstrates an inefficient GetComponent call made every frame:
void Update()
{
    // Inefficient: fetches the Renderer component every frame
    Renderer myRenderer = GetComponent<Renderer>();
    ExampleFunction(myRenderer);
}
Instead, invoke GetComponent only once in Start and cache the result. The cached reference
can be reused in Update without any further calls to GetComponent:
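private Renderer myRenderer;

void Start()
{
    // Fetch the component once and cache the reference.
    myRenderer = GetComponent<Renderer>();
}

void Update()
{
    ExampleFunction(myRenderer); // no per-frame GetComponent call
}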
Use preprocessor directives if you are employing these methods for testing:
#if UNITY_EDITOR
void Update()
{
    // Editor-only test logic goes here; this entire method is
    // compiled out of player builds.
}
#endif
Here, you can freely use the Update in-Editor for testing without unnecessary overhead
slipping into your build.
Every time Unity calls a message method like Update or LateUpdate, it makes an interop call
from the native C/C++ side to the managed C# side. For a small number of objects, this is not
an issue, but when you have thousands of objects, this overhead becomes significant.
Consider creating a custom UpdateManager if you have a large project using Update or
LateUpdate in this fashion (e.g., an open-world game). Have active objects subscribe to this
UpdateManager when they want callbacks, and unsubscribe when they don't. This pattern
can eliminate many of the interop calls to your MonoBehaviour objects; a sketch follows below.
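A minimal sketch of such an UpdateManager (the interface and method names are illustrative):

using System.Collections.Generic;
using UnityEngine;

public interface IUpdatable
{
    void Tick(float deltaTime);
}

public class UpdateManager : MonoBehaviour
{
    static readonly List<IUpdatable> s_Subscribers = new List<IUpdatable>();

    public static void Subscribe(IUpdatable u) => s_Subscribers.Add(u);
    public static void Unsubscribe(IUpdatable u) => s_Subscribers.Remove(u);

    void Update()
    {
        // One managed Update call fans out to every subscriber,
        // replacing one interop call per object.
        float dt = Time.deltaTime;
        for (int i = 0; i < s_Subscribers.Count; i++)
        {
            s_Subscribers[i].Tick(dt);
        }
    }
}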
Debug.Log statements execute even in release builds. To strip them more easily, consider
using the Conditional attribute together with a preprocessing directive. For example, create a
custom wrapper class like this (the class name is illustrative):
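using System.Diagnostics;
using UnityEngine;
using Debug = UnityEngine.Debug;

public static class Logging
{
    // Calls to this method compile out entirely unless ENABLE_LOG is
    // defined in Player Settings > Scripting Define Symbols.
    [Conditional("ENABLE_LOG")]
    public static void Log(string message)
    {
        Debug.Log(message);
    }
}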
Generate your log message with your custom class. If you disable the ENABLE_LOG
preprocessor in the Player Settings > Scripting Define Symbols, all of your Log statements
disappear in one fell swoop.
The same thing applies for other use cases of the Debug Class, such as Debug.DrawLine
and Debug.DrawRay. These are also only intended for use during development and can
significantly impact performance.
Handling strings and text is a common source of performance problems in Unity projects.
Removing Log statements – and their expensive string formatting – can be a huge win.
If your application is logging errors or warning messages in your release build (e.g., to
generate crash reports in the wild), disable stack traces to improve performance.
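For example, you might keep stack traces for errors and exceptions but strip them from routine logs at startup (a hedged sketch):

// Strip stack trace capture from info and warning messages.
Application.SetStackTraceLogType(LogType.Log, StackTraceLogType.None);
Application.SetStackTraceLogType(LogType.Warning, StackTraceLogType.None);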
When using a Set or Get method on an animator, material, or shader, use the integer-valued
methods instead of the string-valued methods. The string-valued methods simply perform
string hashing and then forward the hashed ID to the integer-valued methods.
Use Animator.StringToHash for Animator property names and Shader.PropertyToID for material
and shader property names. Get these hashes during initialization and cache them in variables
for when they’re needed to pass to a Get or Set method.
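A minimal sketch (the parameter and property names are illustrative and must match your animator parameters and shader properties):

// Hash once, e.g., in static initializers, and reuse every frame.
static readonly int k_SpeedParam = Animator.StringToHash("Speed");
static readonly int k_BaseColor = Shader.PropertyToID("_BaseColor");

void UpdateVisuals(Animator animator, Material material, float speed, Color tint)
{
    animator.SetFloat(k_SpeedParam, speed);
    material.SetColor(k_BaseColor, tint);
}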
Note that adding components at runtime with AddComponent comes with a cost. Instantiating
a prefab with the desired components already set up is generally more performant.
Object pooling is a design pattern that can provide performance optimization by reducing the
processing power required of the CPU to run repetitive create and destroy calls. Instead, with
object pooling, existing GameObjects can be reused over and over.
The key function of object pooling is to create objects in advance and store them in a pool,
rather than have them created and destroyed on demand. When an object is needed, it’s taken
from the pool and used. When it’s no longer needed, it’s returned to the pool rather than being
destroyed.
Rather than regularly instantiating and destroying GameObjects (e.g., shooting bullets from a
gun), use pools of preallocated objects that can be reused and recycled.
This reduces the number of managed allocations in your project and can prevent garbage
collection problems.
Unity includes a built-in object pooling feature via the UnityEngine.Pool namespace. Available
in Unity 2021 LTS and later, this namespace facilitates the management of object pools,
automating aspects like object lifecycle and pool size control.
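A minimal sketch using UnityEngine.Pool (the prefab field and capacity are illustrative):

using UnityEngine;
using UnityEngine.Pool;

public class BulletPool : MonoBehaviour
{
    [SerializeField] GameObject bulletPrefab;
    ObjectPool<GameObject> m_Pool;

    void Awake()
    {
        m_Pool = new ObjectPool<GameObject>(
            createFunc: () => Instantiate(bulletPrefab),
            actionOnGet: b => b.SetActive(true),      // taken from the pool
            actionOnRelease: b => b.SetActive(false), // returned to the pool
            actionOnDestroy: Destroy,
            defaultCapacity: 32);
    }

    public GameObject GetBullet() => m_Pool.Get();
    public void ReturnBullet(GameObject bullet) => m_Pool.Release(bullet);
}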
Learn how to create a simple object pooling system in Unity here. You can also see the
object pooling pattern, and many others, implemented in a Unity scene in this sample project
available on the Unity Asset Store.
When you do instantiate, pass the parent and transform data directly into the Instantiate call
rather than setting them afterward:

GameObject.Instantiate(prefab, parent);
GameObject.Instantiate(prefab, position, rotation, parent);
Use ScriptableObjects
Store static unchanging values or settings in a ScriptableObject instead of a MonoBehaviour.
The ScriptableObject is an asset that lives inside of the project that you only need to set up
once.
MonoBehaviours carry extra overhead since they require a GameObject – and by default a
Transform – to act as a host. That means that you need to create a lot of unused data before
storing a single value. The ScriptableObject slims down this memory footprint by dropping the
GameObject and Transform. It also stores the data at the project level, which is helpful if you
need to access the same data from multiple scenes.
A common use case is having many GameObjects that rely on the same duplicate data, which
does not need to change at runtime. Rather than having this duplicate local data on each
GameObject, you can funnel it into a ScriptableObject. Then, each of the objects stores a
reference to the shared data asset, rather than copying the data itself. This is a benefit that
can provide significant performance improvements in projects with thousands of objects.
Create fields in the ScriptableObject to store your values or settings, then reference the
ScriptableObject in your MonoBehaviours.
In this example, a ScriptableObject called Inventory holds settings for various GameObjects.
Using fields from the ScriptableObject can prevent unnecessary duplication of data every time
you instantiate an object with that MonoBehaviour.
In software design, this is an optimization known as the flyweight pattern. Restructuring your
code in this way using ScriptableObjects avoids copying a lot of values and reduces your
memory footprint. Learn more about the flyweight pattern and many others, as well as design
principles in the e-book Level up your code with design patterns and SOLID.
Watch this Introduction to ScriptableObjects tutorial to see how ScriptableObjects can benefit
your project. Reference Unity documentation here as well as the technical guide Create
modular game architecture in Unity with ScriptableObjects.
Lambda expressions that capture local variables create closures, which allocate on the
managed heap. Refactor any frequently called methods containing closures created by
lambda expressions. See an example of how to do that here.
Often in multithreaded programming, one CPU thread of execution, the main thread, creates
other threads to handle tasks. These additional worker threads then synchronize with the
main thread once their work completes.
In traditional multithreaded programming, threads are created and destroyed. In the C# job system, small jobs run on a pool of threads.
If you have a few tasks that run for a long time, this approach to multithreading works well.
However, it’s less efficient for a game application, which must typically process many short
tasks at 30–60 frames per second.
That’s why Unity uses a slightly different approach to multithreading called the C# job system.
Rather than generate many threads with a short lifetime, it breaks your work into smaller units
called jobs.
Timeline view in the Profiler shows jobs running on the worker threads.
These jobs go into a queue, which schedules them to run on a shared pool of worker threads.
JobHandles help you create dependencies, ensuring that the jobs run in the correct order.
One potential issue with multithreading is a race condition, which occurs when two threads
access a shared variable at the same time. To prevent this, Unity multithreading uses a safety
system to isolate the data a job needs to execute. The C# job system launches each job with
a copy of the job structure, eliminating race conditions.
To convert your code to use the job system:

— Change classes to structs. A job is any struct that implements the IJob interface.
If you're performing the same task on a large number of objects, you can also use
IJobParallelFor to run across multiple cores.
— Data passed into a job must be blittable. Remove reference types and pass only the
blittable data into the job as a copy.
— Because the work within each job remains isolated for safety, you send the results back
to the main thread using a NativeContainer. A NativeContainer from the Unity Collections
package provides a C# wrapper for native memory. Its subtypes (e.g., NativeArray,
NativeList, NativeHashMap, NativeQueue, etc.) work like their equivalent C# data
structures.
Refer to the documentation to see how you can optimize CPU performance in your own project
using the C# job system.
Burst allows Unity developers to continue using a subset of C# for convenience while
improving performance.
— Remove static variables. If you need to write to a list, consider using a NativeArray
decorated with the NativeDisableContainerSafetyRestriction attribute. This allows
parallel jobs to write to the NativeArray.
Here is an example Burst-compiled job that runs over an array of float3 values and
normalizes each vector. It uses the Unity Mathematics package:

[BurstCompile]
public struct MyFirstJob : IJob
{
    public NativeArray<float3> ToNormalize;

    public void Execute()
    {
        for (int i = 0; i < ToNormalize.Length; i++)
        {
            ToNormalize[i] = math.normalize(ToNormalize[i]);
        }
    }
}
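Scheduling and completing the job from the main thread might look like this (a sketch, assuming the Unity.Jobs, Unity.Collections, and Unity.Mathematics namespaces are imported):

var data = new NativeArray<float3>(1024, Allocator.TempJob);
var job = new MyFirstJob { ToNormalize = data };

// Schedule() queues the job for a worker thread; Complete() blocks
// until it finishes so the results can be read safely.
JobHandle handle = job.Schedule();
handle.Complete();

// ... read the normalized vectors from data ...
data.Dispose();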
Both the C# job system and the Burst compiler form part of Unity's Data-Oriented Tech Stack
(DOTS). However, you can use them with either "classic" Unity GameObjects or the Entity
Component System.
Refer to the latest documentation and download the e-book Introduction to the Data-Oriented
Technology Stack for advanced Unity users to see how Burst can accelerate your workflow
when combined with the C# job system.
There are a few project settings that can affect your performance.
Switch to IL2CPP
We recommend switching the Scripting Backend from Mono to IL2CPP (Intermediate Language
to C++). Doing so will provide overall better runtime performance.
Be aware this does increase build times. Some developers prefer to use Mono locally for faster
iteration, then switch to IL2CPP for build machines and/or release candidates. Refer to the
Optimizing IL2CPP build times documentation to reduce your build times.
On PlayStation platforms where IL2CPP is the only option, locate the Player Settings >
Other Settings > IL2CPP optimization level settings. Use the less optimal options during
development to speed up build times. For profiling or final release, select Optimized
Compile, Remove Unused Code, Optimized link.
Using this option, Unity converts IL code from scripts and assemblies to C++ before creating a
native binary file (e.g., .exe, .apk, .xap) for your target platform.
Please refer to the documentation, which provides information on how to optimize build times.
Because lighting and effects are quite complex, we recommend that you review the render
pipeline documentation before attempting to optimize.
Before you begin lighting your scenes, you will need to choose one of the available render
pipelines. A render pipeline performs a series of operations that take the contents of a scene
and display them onscreen.

Unity provides three prebuilt render pipelines with different capabilities and performance
characteristics, or you can create your own.
1. The Universal Render Pipeline (URP) is a prebuilt Scriptable Render Pipeline (SRP).
URP provides artist-friendly workflows to create optimized graphics across a range
of platforms, from mobile to high-end consoles and PCs. URP is the successor to the
Built-In Render Pipeline, providing graphics and rendering features unavailable with
the older pipeline. In order to maintain performance, it makes tradeoffs to reduce the
computational cost of lighting and shading. Choose URP if you want to reach the most
target platforms, including mobile and VR.
Get a complete overview of the capabilities in URP in the e-book Introduction to the
Universal Render Pipeline for advanced Unity creators.
2. The High Definition Render Pipeline (HDRP) is another prebuilt SRP designed for
cutting-edge, high-fidelity graphics. HDRP targets high-end hardware such as PC,
Xbox, and PlayStation.2 It's the recommended render pipeline for achieving the highest
level of photorealism in your game, with advanced lighting, reflections, and shadows.
HDRP uses physically based lighting and materials and supports improved debugging
tools.
Get a complete overview of the capabilities in HDRP in the e-book Lighting and
environments in the High Definition Render Pipeline.
3. The Built-in Render Pipeline is Unity’s older, general-purpose render pipeline with limited
customization.
Proportion of games released using the different pipelines available from Unity

2 HDRP is not currently supported on mobile platforms. See the Requirements and compatibility page for
more details.
However, while most Unity projects are now being built on URP or HDRP, the Built-In Render
Pipeline (see third option in the list) will remain an available option in Unity 6.
URP and HDRP work on top of the Scriptable Render Pipeline (SRP). This is a thin API layer that
lets you schedule and configure rendering commands using C# scripts. This flexibility allows
you to customize virtually every part of the pipeline. You can also create your own custom
render pipeline based on SRP.
See Render Pipelines in Unity for a more detailed comparison of the available pipelines.
To use URP or HDRP on consoles, install the platform-specific render pipeline package for
your target platform:

— PlayStation 4: com.unity.render-pipelines.ps4
— PlayStation 5: com.unity.render-pipelines.ps5
— Xbox: com.unity.render-pipelines.gamecore
Materials, such as skin or foliage, can benefit from the advanced lighting and shading features preconfigured with the HDRP.
Forward
In forward rendering, the graphics card projects the geometry and splits it into vertices. Those
vertices are further broken down into fragments, or pixels, which render to screen to create
the final image.
The pipeline passes each object, one at a time, to the graphics API. Forward rendering comes
with a cost for each light. The more lights in your scene, the longer rendering will take.
The Built-in Pipeline’s forward renderer draws each light in a separate pass per object. If you
have multiple lights hitting the same GameObject, this can create significant overdraw, where
overlapping areas need to draw the same pixel more than once. Minimize the number of
realtime lights to reduce overdraw.
Rather than rendering one pass per light, the URP culls the lights per-object. This allows for
the lighting to be computed in one single pass, resulting in fewer draw calls compared to the
Built-In Render Pipeline’s forward renderer.
Forward+

Forward+ rendering improves upon standard forward rendering by culling lights spatially
rather than per object. This increases the overall number of lights that can be used in
rendering a frame.
Deferred shading
In deferred shading, lighting is not calculated per object. Instead, deferred shading
postpones lighting calculations to a later stage and uses two passes. In URP, deferred
rendering supports the Native RenderPass API, allowing the G-buffer and lighting passes to
be combined into a single render pass.
In the first pass, or the G-Buffer geometry pass, Unity renders the GameObjects. This pass
retrieves several types of geometric properties and stores them in a set of textures. G-buffer
textures can include:
— diffuse and specular color
— surface smoothness
— occlusion
— world-space normals
— emission
In the second pass, or lighting pass, Unity renders the scene’s lighting based on the G-buffer.
Imagine iterating over each pixel and calculating the lighting information based on the buffer
instead of the individual objects. Thus, adding more non-shadow casting lights in deferred
shading does not incur the same performance hit as with forward rendering.
Though choosing a rendering path is not an optimization per se, it can affect how you optimize
your project. The other techniques and workflows in this section may vary depending on what
render pipeline and which rendering path you’ve chosen.
Begin each Shader Graph with a compatible master node, which determines the graph’s
output. Add nodes and operators with the visual interface, and construct the shader logic.
This Shader Graph then passes into the render pipeline’s backend. The final result is a
ShaderLab shader, functionally similar to one written in HLSL or Cg.
Optimizing a Shader Graph follows many of the same rules that apply to traditional HLSL/
Cg Shaders. The more processing your Shader Graph does, the more it will impact the
performance of your application.
If you are CPU-bound, optimizing your shaders won’t improve frame rate, but may still improve
your battery life for mobile platforms.
If you are GPU-bound, follow these guidelines for improving performance with Shader Graphs:
— Decimate your nodes: Remove unused nodes. Don’t change any defaults or connect
nodes unless those changes are necessary. Shader Graph compiles out any unused
features automatically.
When possible, bake values into textures. For example, instead of using a node to
brighten a texture, apply the extra brightness into the texture asset itself.
— Use a smaller data format: Switch to a smaller data structure when possible. Consider
using Vector2 instead of Vector3 if it does not impact your project. You can also reduce
precision if the situation allows (e.g. half instead of float).
— Reduce math operations: Shader operations run many times per second, so optimize
any math operators where possible. Try to blend results instead of creating a logical
branch. Use constants, and combine scalar values before applying vectors. Finally,
convert any properties that do not need to appear in the Inspector into in-line nodes. All
of these incremental speedups can help your frame budget.
— Branch a preview: As your graph gets larger, it may become slower to compile. Simplify
your workflow with a separate, smaller branch just containing the operations you want
to preview at the moment, then iterate more quickly on this smaller branch until you
achieve the desired results.
If the branch is not connected to the master node, you can safely leave the preview
branch in your graph. Unity removes nodes that do not affect the final output during
compilation.
— Manually optimize: Even if you’re an experienced graphics programmer, you can still use
a Shader Graph to lay down some boilerplate code for a script-based shader. Select the
Shader Graph asset, then select Copy Shader from the context menu.
Create a new HLSL/Cg Shader and then paste in the copied Shader Graph. This is a one-
way operation, but it lets you squeeze additional performance with manual optimizations.
You can use the Shader compilation pragma directives to compile the shader differently for
target platforms. Then, use a shader keyword (or Shader Graph Keyword node) to create
shader variants with certain features enabled or disabled.
You can prevent shader variants from being included in your build if you know that they are
not required.
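For example, a keyword declared in a shader with #pragma shader_feature can be toggled per material from C# (the material reference and keyword name are illustrative):

// Variants for shader_feature keywords that no material enables
// can be stripped from the build.
material.EnableKeyword("_DETAIL_MAP_ON");
material.DisableKeyword("_DETAIL_MAP_ON");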
Parse the Editor.log for shader timing and size. Locate the lines that begin with "Compiled
shader" and "Compressed shader." In one example capture, these lines show that:
— The shader expands into 482 variants due to #pragma multi_compile and shader_feature.
— Unity compresses the shader included in the game data to roughly the sum of the
compressed sizes: 0.14+0.12+0.20+0.15 = 0.61MB.
— At runtime, Unity keeps the compressed data in memory (0.61MB), while the data for
your currently used graphics API is uncompressed. For example, if your current API was
Metal, that would account for 2.56MB.
After a build, Project Auditor can parse the Editor.log to display a list of all shaders, shader
keywords, and shader variants compiled into a project. It can also analyze the Player.log after
the game is run. This shows you what variants the application actually compiled and used at
runtime.
Employ this information to build a scriptable shader stripping system and reduce the number
of variants. This can improve build times, build sizes, and runtime memory usage.
Read the Stripping scriptable shader variants blog post to see this process in detail.
Unity offers two solutions for particle effects:

— The Built-In Particle System can simulate thousands of particles on the CPU. You can
use C# scripts to define a system and its individual particles. Particle Systems can
interact with Unity's underlying physics system and any colliders in your scene. Particle
Systems offer maximum compatibility and work with any of Unity's supported build
platforms.
— The VFX Graph is the newer system that provides enhanced capabilities for creating
sophisticated visual effects in Unity, particularly for projects targeting high-end graphics
and performance. It moves calculations on the GPU using compute shaders and can
simulate millions of particles in large-scale visual effects. The workflow includes a highly
customizable graph view. Particles can also interact with the color and depth buffer.
Though it does not have access to the underlying physics system, a VFX Graph can
interact with complex assets, such as Point Caches, Vector Fields, and Signed Distance
Fields. VFX Graph works with both HDRP and URP and on platforms that support
compute shaders.
When selecting one of the two systems, keep device compatibility in mind. Most PCs and
consoles support compute shaders, but many mobile devices do not. If your target platform
does support compute shaders, Unity allows you to use both types of particle simulation in
your project.
Learn more about creating high-end visual effects with the e-book The definitive guide to
creating advanced visual effects in Unity.
If you are using Forward rendering with the Built-in Render Pipeline, Multisample Anti-aliasing
(MSAA) is available in the Quality settings. MSAA produces high-quality anti-aliasing, but it can
be expensive. The MSAA Sample Count from the drop-down menu (None, 2X, 4X, 8X) defines
how many samples the renderer uses to evaluate the effect.
If you are using Forward rendering with the URP or HDRP, you can enable MSAA on the Render
Pipeline Asset.
Alternatively, you can add anti-aliasing as a post-processing effect. This appears on the
Camera component under Anti-aliasing:
— Fast approximate anti-aliasing (FXAA) smooths edges on a per-pixel level. This is the
least resource-intensive anti-aliasing option, and it slightly blurs the final image.
— Subpixel morphological anti-aliasing (SMAA) detects borders in the image and blends
the pixels around them, producing sharper results than FXAA at a moderately higher cost.

In HDRP, you can also use FXAA and SMAA with the Post Anti-aliasing setting on the Camera.
URP and HDRP also offer an additional option:
— Temporal anti-aliasing (TAA) smooths edges using frames from the history buffer. This
works more effectively than FXAA but requires motion vectors in order to work. TAA can
also improve ambient occlusion and volumetrics. It is generally higher quality than FXAA,
but costs more resources and can produce occasional ghosting artifacts.
Spatial-Temporal Post-Processing
Spatial-Temporal Post-Processing (STP) is designed to enhance visual quality across a wide
range of platforms like mobile, consoles, and PCs. STP is a spatio-temporal anti-aliasing
upscaler that works with both HDRP and URP render pipelines, offering high-quality content
scaling without the need for changes to existing content. This solution is particularly optimized
for GPU performance, ensuring faster rendering times and making it easier to achieve high
performance while maintaining visual quality.
To enable STP, select your render pipeline asset and, in the Inspector, navigate to Quality >
Upscaling Filter and select Spatial-Temporal Post-Processing.
Bake lightmaps
The fastest option to create lighting is one that doesn’t need to be computed per-frame.
To do this, use Lightmapping to “bake” static lighting just once, instead of calculating it in
real-time.
Add dramatic lighting to your static geometry using Global Illumination (GI). Mark objects
with Contribute GI so you can store high-quality lighting in the form of lightmaps.
The process of generating a lightmapped environment takes longer than just placing a
light in the scene in Unity, but this:
— Looks better: GI can calculate realistic-looking direct and indirect lighting, and the
lightmapper smooths and denoises the resulting map.
— Runs faster: Baked shadows and lighting render without the same performance hit of
real-time lighting and shadows.
Complex scenes may require long bake times. If your hardware supports the Progressive
GPU Lightmapper, this option can dramatically speed up your lightmap generation, up to
tenfold in some cases.
Adjust the lightmapping settings (Window > Rendering > Lighting) and lightmap size to limit memory usage.
If a Reflection Probe must use Type: Realtime in URP, avoid Every Frame if possible. Adjust
the Refresh Mode and Time Slicing settings to reduce the update rate. You can also control
the refresh with the Via Scripting option and render the probe from a custom script.

If a Reflection Probe must use Type: Realtime in HDRP, use On Demand mode. You can also
modify the Frame Settings in Project Settings > HDRP Default Settings. Reduce the quality
and features under Realtime Reflection for improved performance.
Adaptive Probe Volumes (APVs) offer a range of features to enhance global illumination,
particularly in dynamic and large scenes. URP now supports per-vertex sampling for improved
performance on lower-end devices, while VFX particles benefit from indirect lighting baked
into probe volumes.
Having multiple APVs is helpful for having more control over the probe density in the level.
APV data can be streamed from disk to CPU and GPU, optimizing lighting information for
large environments. Developers can bake and blend multiple lighting scenarios, allowing
real-time transitions like day/night cycles. The system also supports sky occlusion,
integrates with the Ray Intersector API for more efficient probe calculations, and offers
control over Light Probe sample density to reduce light leaking and speed up iterations.
The new C# baking API also enables independent baking of APV from lightmaps or
reflection probes.
To get started using APVs, check out the talk Efficient and impactful lighting with
Adaptive Probe Volumes from GDC 2023.
Disable shadows
Shadow casting can be disabled per MeshRenderer and light. Disable shadows whenever
possible to reduce draw calls.
You can also create fake shadows using a blurred texture applied to a simple mesh or quad
underneath your characters. Otherwise, you can create blob shadows with custom shaders.
In particular, avoid enabling shadows for point lights. Each point light with shadows
requires six shadow map passes per light – compare that with a single shadow map pass
for a spotlight. Consider replacing point lights with spotlights where dynamic shadows
are absolutely necessary. If you can avoid dynamic shadows, use a cubemap as a Light
cookie with your point lights instead.
GPU Lightmapper
The GPU Lightmapper is production ready in Unity 6. It dramatically accelerates lighting
data generation by leveraging the GPU, offering significantly faster bake times compared
to traditional CPU lightmapping. It introduces a new light baking backend that simplifies
the codebase and delivers more predictable results. Additionally, the minimum GPU
requirement has been lowered to 2GB, and also includes a new API for moving light probe
positions at runtime, which is particularly useful for procedurally generated content,
alongside various quality-of-life improvements.
To optimize your graphics rendering, you’ll need to understand the limitations of your
target hardware and how to profile the GPU. Profiling helps you check and verify that the
optimizations you’re making are effective.
Use these best practices for reducing the rendering workload on the GPU.
See GFXBench for a great list of different industry-standard benchmarks for GPUs and
graphics cards. The website provides a good overview of the current GPUs available and how
they stack up against each other.
The Game view Stats overlay displays real-time rendering information, including:

— CPU Main: Total time to process one frame (and update the Editor for all windows)
— CPU Render: Total time to render one frame of the Game view
— SetPass calls: The number of times Unity must switch shader passes to render the
GameObjects on-screen; each pass can introduce extra CPU overhead.
Note: In-Editor fps does not necessarily translate to build performance. We recommend
that you profile your build for the most accurate results. Frame time in milliseconds is a
more accurate metric than frames per second when benchmarking, as outlined in the “FPS:
A deceptive metric” section.
PC and console hardware can push a lot of draw calls, but the overhead of each call is still
high enough to warrant trying to reduce them. On mobile devices, draw call optimization is
vital. You can achieve this with draw call batching.
Draw call batching minimizes these state changes and reduces the CPU cost of rendering
objects. Unity can combine multiple objects into fewer batches using several techniques:
— SRP Batching: If you are using HDRP or URP, enable the SRP Batcher in your Pipeline
Asset under Advanced. When using compatible shaders, the SRP Batcher reduces the
GPU setup between draw calls and makes material data persistent in GPU Memory. This
can speed up your CPU rendering times significantly. Use fewer Shader Variants with a
minimal amount of Keywords to improve SRP batching. Consult this SRP documentation
to see how your project can take advantage of this rendering workflow.
— GPU instancing: If you have a large number of identical objects (e.g., buildings, trees,
grass, and so on with the same mesh and material), use GPU instancing. This technique
batches them using graphics hardware. To enable GPU Instancing, select your material
in the Project window, and, in the Inspector, check Enable Instancing (see the script
sketch after this list).
— Static batching: For non-moving geometry, Unity can reduce draw calls for any meshes
sharing the same material. It is more efficient than dynamic batching, but it uses more
memory.
Mark all meshes that never move as Batching Static in the Inspector. Unity combines
all static meshes into one large mesh at build time. The StaticBatchingUtility also allows
you to create these static batches yourself at runtime (for example, after generating a
procedural level of non-moving parts).
— Dynamic Batching: For small meshes, Unity can group and transform vertices on the
CPU, then draw them all in one go. Note: Do not use this unless you have enough
low-poly meshes (no more than 300 vertices each and 900 total vertex attributes).
Otherwise, enabling it will waste CPU time looking for small meshes to batch.
— Use as few textures in a scene as possible. Fewer textures require fewer unique materials,
making them easier to batch. Additionally, use texture atlases wherever possible.
— Always bake lightmaps at the largest atlas size possible. Fewer lightmaps require fewer
material state changes, but keep an eye on the memory footprint.
— Keep an eye on the number of static and dynamic batch counts versus the total draw call
count by using the Profiler or the rendering stats during optimizations.
Please refer to the draw call batching documentation for more information.
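As a quick illustration of these batching options, here is a minimal sketch of toggling them from script; the field names and the staticLevelRoot object are hypothetical, and normally you would set these in the Editor instead:

using UnityEngine;
using UnityEngine.Rendering;

// A minimal sketch of enabling batching features from script.
public class BatchingSetup : MonoBehaviour
{
    [SerializeField] Material instancedMaterial;   // shared by many identical meshes
    [SerializeField] GameObject staticLevelRoot;   // parent of non-moving, procedurally placed meshes

    void Start()
    {
        // Equivalent to enabling the SRP Batcher in the URP/HDRP Pipeline Asset.
        GraphicsSettings.useScriptableRenderPipelineBatching = true;

        // Equivalent to checking Enable Instancing on the material in the Inspector.
        instancedMaterial.enableInstancing = true;

        // Build a static batch at runtime, e.g., after generating a procedural level.
        StaticBatchingUtility.Combine(staticLevelRoot);
    }
}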
Note: The Frame Debugger does not show timing information for individual draw calls or
state changes. Only native GPU profilers give you detailed draw call and timing
information. However, the Frame Debugger can still be very helpful in debugging pipeline
problems or batching issues.
One advantage of the Unity Frame Debugger is that you can relate a draw call to a specific
GameObject in the scene. This makes it easier to investigate certain issues that may not be
possible in external frame debuggers.
For more information, read the Frame Debugger documentation, and see the section “Use
native profiling and debugging tools” for a list of platform-specific debugging tools.
If your game is limited by fill rate, this means that it’s trying to draw more pixels per frame than
the GPU can handle.
Drawing on top of the same pixel multiple times is called overdraw. Overdraw decreases fill
rate and costs extra memory bandwidth. The most common causes of overdraw are:
— Unoptimized particles
— Overlapping UI elements
While you want to minimize overdraw, there is no one-size-fits-all approach to solving it.
Begin by experimenting with the factors above to reduce their impact.
The Built-in Render Pipeline sorts GameObjects according to their Rendering Mode and
renderQueue. Each object’s shader places it in a render queue, which often determines its
draw order.
Each render queue may follow different rules for sorting before Unity actually draws the
objects to screen. For example, Unity sorts the Opaque Geometry queue front-to-back,
but the Transparent queue sorts back-to-front.
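As a minimal sketch, you can inspect or override the queue a shader assigns to a material from script (the standard queue indices are Background = 1000, Geometry = 2000, Transparent = 3000):

using UnityEngine;
using UnityEngine.Rendering;

// A minimal sketch: read and override a material's render queue.
public class QueueInspector : MonoBehaviour
{
    void Start()
    {
        Material material = GetComponent<Renderer>().material;
        Debug.Log($"Queue assigned by the shader: {material.renderQueue}");

        // Force the material into the transparent queue (3000),
        // which Unity sorts back-to-front.
        material.renderQueue = (int)RenderQueue.Transparent;
    }
}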
Objects rendering on top of one another create overdraw. If you are using the Built-in
Render Pipeline, you can visualize overdraw in the Scene view control bar. Switch the
draw mode to Overdraw.
Brighter pixels indicate objects drawing on top of one another; dark pixels mean less
overdraw.
The same scene in Overdraw view – overlapping geometry is often a source of overdraw
HDRP controls the render queue slightly differently. To calculate draw order, HDRP sorts
GameObjects first by their material’s Priority, then by their individual Mesh Renderer’s
Priority. This page on Renderer and Material Priority illustrates this in more detail.
To visualize transparency overdraw with HDRP, open the Render Pipeline Debug window
(Window > Render Pipeline > Render Pipeline Debug), select Rendering, and then choose
TransparencyOverdraw from the Fullscreen Debug Mode dropdown.
By using the transparency overdraw visualization, developers can make informed decisions
on how to optimize their scenes for better performance.
The HDRP Render Pipeline Debug window can visualize overdraw from transparent materials.
When correcting overdraw, these diagnostic tools can offer a visual barometer of your
optimizations.
To begin, locate a frame with a high GPU load. Microsoft and Sony provide excellent tools
for analyzing your project’s performance on both the CPU and on the GPU. Use your
respective native profiler to break down the frame cost into its specific parts. This will be
your starting point to improve graphics performance.
The view was GPU-bound on a PlayStation®4 Pro at roughly 45 milliseconds per frame.
— Use Occlusion culling to remove objects hidden behind foreground objects and
reduce overdraw. Be aware this requires additional CPU processing, so use the
Profiler to ensure moving work from the GPU to CPU is beneficial.
— GPU instancing can also reduce your batches if you have many objects that share
the same mesh and material. Limiting the number of models in your scene can
improve performance. If it’s done artfully, you can build a complex scene without
making it look repetitive.
— The SRP Batcher can reduce the GPU setup between DrawCalls by batching Bind
and Draw GPU commands. To benefit from this SRP batching, use as many materials
as needed, but restrict them to a small number of compatible shaders (e.g., Lit and
Unlit shaders in URP and HDRP).
Enable Graphics Jobs in Player Settings > Other Settings to take advantage of the
PlayStation’s or Xbox’s multi-core processors. Graphics Jobs allow Unity to spread the
rendering work across multiple CPU cores, removing pressure from the render thread. See
the Multithreaded Rendering and Graphics Jobs tutorial for details.
If you want to generate or modify vertex data on the GPU, a compute shader is often a
better choice than a geometry shader. Doing the work in a compute shader means that the
vertex shader that actually renders the geometry can be comparatively fast and simple.
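A minimal sketch of the dispatch side of this approach, assuming a hypothetical DeformVertices.compute asset with a CSDeform kernel that writes deformed positions into a structured buffer read by the rendering vertex shader:

using UnityEngine;

// A minimal sketch: generate vertex data in a compute shader so the
// rendering vertex shader stays simple. Shader and kernel names are hypothetical.
public class VertexDeformer : MonoBehaviour
{
    [SerializeField] ComputeShader deformShader; // e.g., DeformVertices.compute
    const int VertexCount = 65536;
    GraphicsBuffer vertexBuffer;
    int kernel;

    void Start()
    {
        kernel = deformShader.FindKernel("CSDeform");
        vertexBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, VertexCount, sizeof(float) * 3);
        deformShader.SetBuffer(kernel, "_Vertices", vertexBuffer);
    }

    void Update()
    {
        // 64 threads per group matches typical GPU wavefront sizes.
        deformShader.Dispatch(kernel, VertexCount / 64, 1, 1);
        // The material's vertex shader then reads _Vertices directly.
    }

    void OnDestroy()
    {
        vertexBuffer.Release();
    }
}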
Each SIMD has a maximum number of wavefronts that can be running at one time.
Wavefront occupancy refers to how many wavefronts are currently in use relative to the
maximum. This measures how well you are using the GPU’s potential. Console specific
performance analysis tools show wavefront occupancy in great detail.
In the above example, vertex shader wavefronts appear in green. Pixel shader wavefronts
appear in blue. On the bottom graph, many vertex shader wavefronts appear without
much pixel shader activity. This shows an underutilization of the GPU’s potential.
If you’re doing a lot of vertex shader work that doesn’t result in pixels, that may indicate
an inefficiency. While low wavefront occupancy is not necessarily bad, it’s a metric
to start optimizing your shaders and checking for other bottlenecks. For example, if
you have a stall due to memory or compute operations, increasing occupancy may
help performance. On the other hand, too many in-flight wavefronts can cause cache
thrashing and decrease performance.
For optimizing the behavior of transparent materials, refer to this page on Renderer and
Material Priority.
For example, during shadow map generation, the GPU performs depth-only rendering.
Very little pixel shader work happens at this point, and many wavefronts remain
unoccupied.
Async Compute can move compute shader work in parallel to the graphics queue.
If you can synchronize some compute shader work with the depth-only rendering, this
makes for a better overall use of the GPU. The unused wavefronts could help with Screen
Space Ambient Occlusion or any task that complements the current work.
In the above example, several optimizations shaved several milliseconds off the
shadow mapping, lighting pass, and atmospherics. The resulting frame cost allowed the
application to run at 30 fps on a PlayStation®4 Pro.
Culling
Occlusion culling disables GameObjects that are fully hidden (occluded) by other
GameObjects. This prevents the CPU and GPU from using time to render objects that will
never be seen by the Camera.
Culling happens per camera. It can have a large impact on performance, especially when
multiple cameras are enabled concurrently. Unity uses two types of culling, frustum culling and
occlusion culling.
Frustum culling is performed automatically on every Camera. It prevents GameObjects that are
outside of the View Frustum from being rendered, helping to optimize performance.
You can set per-layer culling distances manually via Camera.layerCullDistances. This allows
you to cull small GameObjects at a distance shorter than the default farClipPlane.
Organize GameObjects into Layers. Use the layerCullDistances array to assign each of the 32
layers a value less than the farClipPlane (or use 0 to default to the farClipPlane).
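A minimal sketch, assuming a hypothetical “SmallProps” layer that should disappear beyond 50 meters:

using UnityEngine;

// A minimal sketch: cull small objects earlier than the far clip plane.
[RequireComponent(typeof(Camera))]
public class LayerCullSetup : MonoBehaviour
{
    void Start()
    {
        var cam = GetComponent<Camera>();
        float[] distances = new float[32];  // one entry per layer; 0 = use farClipPlane
        distances[LayerMask.NameToLayer("SmallProps")] = 50f; // assumes this layer exists
        cam.layerCullDistances = distances;
    }
}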
Unity culls by layer first, keeping GameObjects only on layers the Camera uses. Afterwards,
frustum culling removes any GameObjects outside the camera frustum. Frustum culling is
performed as a series of jobs to take advantage of available worker threads.
Each layer culling test is very fast (essentially just a bit mask operation). However, this cost
could still add up with a very large number of GameObjects. If this becomes a problem for
your project, you may need to implement some system to divide your world into “sectors” and
disable sectors that are outside the Camera frustum in order to relieve some of the pressure
on Unity’s layer/frustum culling system.
Occlusion culling removes any GameObjects from the Game view if the Camera cannot see
them. Use this feature to prevent rendering of objects hidden behind other objects since these
can still render and cost resources. For example, rendering another room is unnecessary if a
door is closed and the Camera cannot see into the room.
Enabling occlusion culling can significantly increase performance but can also require more
disk space, CPU time, and RAM. Unity bakes the occlusion data during the build and then
needs to load it from disk to RAM while loading a scene.
While frustum culling outside the camera view is automatic, occlusion culling is a baked
process. Simply mark your objects as Occluder Static or Occludee Static, then bake through
the Window > Rendering > Occlusion Culling dialog.
Check out the Working with Occlusion Culling tutorial for more information.
Dynamic resolution
Allow Dynamic Resolution is a Camera setting that allows you to dynamically scale individual
render targets to reduce workload on the GPU. In cases where the application’s frame rate
reduces, you can gradually scale down the resolution to maintain a consistent frame rate.
Unity triggers this scaling if performance data suggests that the frame rate is about to
decrease as a result of being GPU-bound. You can also preemptively trigger this scaling
manually with script. This is useful if you are approaching a GPU-intensive section of the
application. If scaled gradually, dynamic resolution can be almost unnoticeable.
Refer to the dynamic resolution manual page for additional information and a list of supported
platforms.
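A minimal sketch of triggering the scaling from script, assuming cameras with Allow Dynamic Resolution enabled and a platform that supports dynamic resolution:

using UnityEngine;

// A minimal sketch: scale render targets down before a GPU-heavy section.
public class ResolutionScaler : MonoBehaviour
{
    public void EnterHeavySection()
    {
        // 0.8 x 0.8 renders roughly 64% of the pixels.
        ScalableBufferManager.ResizeBuffers(0.8f, 0.8f);
    }

    public void ExitHeavySection()
    {
        ScalableBufferManager.ResizeBuffers(1.0f, 1.0f);
    }
}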
Camera Stacking in URP: the gun and background render with different camera settings
You could use Camera Stacking in URP to render more than one camera view. However, there
is still significant culling and rendering done for each camera. Each camera incurs some
overhead, whether it’s doing meaningful work or not. Only use Camera components required
for rendering. On mobile platforms, each active camera can use up to 1 ms of CPU time, even
when rendering nothing.
The Unity CPU Profiler shows the main thread in the timeline view and indicates that there are multiple Cameras. Unity performs culling for
each Camera.
A Render Objects Renderer Feature in URP combines multiple Layers into one rendered view.
In HDRP, Custom Pass Volumes can help you avoid using extra Cameras and the additional
overhead associated with them. You can associate a custom pass with an Event to inject it
at a specific point in the render loop, and custom passes have extra flexibility in how they
interact with shaders. You can also extend the Custom Pass class with C#.
See the Working with LODs course on Unity Learn for more detail.
Post-processing tends not to fluctuate much at runtime. Once you’ve determined your Volume
Overrides, allot your post effects a static portion of your total frame budget.
The garden environment from the URP 3D Sample running with the GPU Resident Drawer
The GPU Resident Drawer can be enabled within the HDRP or URP Render Pipeline Asset.
Select Instanced Drawing to enable it. You can also choose whether to enable it in Play
mode only or in Edit mode as well.
When you enable the GPU Resident Drawer, games that are CPU-bound due to a high number
of draw calls can improve in performance as the number of draw calls is reduced. The
improvements you will see depend on the scale of your scenes and the amount of
instancing you utilize: the more instanceable objects you render, the larger the benefit.
Select ‘Instanced Drawing’ from the GPU Resident Drawer drop down
Upon selecting the Instanced Drawing option you may get a message in the UI warning you
that the “BatchRenderGroup Variants setting must be ‘Keep All’”. Adjusting this option in the
graphics settings completes the setup for the GPU Resident Drawer.
To activate GPU occlusion culling, locate the render pipeline asset and toggle the GPU
Occlusion check box.
To enable the GPU Occlusion Culling in Debug Mode in Unity 6, go to Window > Rendering
> Occlusion Culling. Here, you can visualize how objects are being culled by toggling various
visualization options.
Split Graphics Jobs offers a threading mode that’s supported on multiple desktop and
console platforms. The primary improvement comes from reducing unnecessary
synchronization between the main thread (responsible for general game logic and
orchestration) and native graphics job threads (responsible for rendering tasks).
The performance improvements from this new threading mode scale with the number of draw
calls submitted each frame. Frames with more draw calls benefit more from these
optimizations, so complex scenes with many objects and textures can see significant
performance enhancements.
Split Graphics Jobs is available when using DX12 on Windows and is also supported by
Vulkan players.
To enable it, go to Project Settings > Player > Other Settings > Graphics Jobs Mode > Split.
We recommend that you always use Split Graphics Jobs instead of the legacy and native
modes that were available previously.
User interface
Unity offers two UI systems: The older Unity UI and the new UI Toolkit. UI Toolkit is intended to
become the recommended UI system. It’s tailored for maximum performance and reusability,
with workflows and authoring tools inspired by standard web technologies, meaning UI
designers and artists will find it familiar if they already have experience designing web pages.
However, as of Unity 6, UI Toolkit does not have some features that Unity UI and Immediate
Mode GUI (IMGUI) support. Unity UI and IMGUI are more appropriate for certain use cases,
and are required to support legacy projects. See the Comparison of UI systems in Unity for
more information.
Take advantage of UGUI’s ability to support multiple Canvases. Divide UI elements based on
how frequently they need to be refreshed. Keep static UI elements on a separate Canvas, and
dynamic elements that update at the same time on smaller sub-canvases.
Ensure that all UI elements within each Canvas have the same Z value, materials, and textures.
If you only need to turn off the Canvas’s visibility, disable the Canvas component rather
than the whole GameObject. This can prevent your game from having to rebuild meshes and
vertices when you reenable it.
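A minimal sketch (the menuCanvas reference is a hypothetical field assigned in the Inspector):

using UnityEngine;

// A minimal sketch: hide UI by disabling the Canvas component, keeping its
// cached geometry, instead of deactivating the whole GameObject.
public class MenuVisibility : MonoBehaviour
{
    [SerializeField] Canvas menuCanvas;

    public void SetVisible(bool visible)
    {
        menuCanvas.enabled = visible;
    }
}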
While the Graphic Raycaster is not really a raycaster (despite the name), there is some
expense for each intersection check. Minimize the number of Graphic Raycasters by not
adding them to non-interactive UI Canvases.
In addition, disable Raycast Target on all UI text and images that don’t need it. If the UI is
complex with many elements, all of these small changes can reduce unnecessary computation.
If you do need to use Layout Groups (Horizontal, Vertical, Grid) for your dynamic elements,
avoid nesting them to improve performance.
Consider lowering the Application.targetFrameRate during a fullscreen UI, since you should not
need to update at 60 fps.
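A minimal sketch, assuming your own menu open/close events call into it:

using UnityEngine;

// A minimal sketch: lower the frame rate cap while a fullscreen menu is open.
public class MenuFrameRate : MonoBehaviour
{
    public void OnMenuOpened() => Application.targetFrameRate = 30;
    public void OnMenuClosed() => Application.targetFrameRate = -1; // platform default
}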
Consider using Screen Space – Overlay for your Canvas RenderMode if possible, as that does
not require a camera.
When using World Space Render Mode, make sure to fill in the Event Camera.
UI Toolkit offers improved performance over Unity UI, is tailored for maximum performance
and reusability, and provides workflows and authoring tools inspired by standard web
technologies. One of its key benefits is that it uses a highly optimized rendering pipeline that is
specifically designed for UI elements.
Here are some general recommendations for optimizing performance of your UI with UI Toolkit:
Audio
Although audio is not normally a performance bottleneck, you can still optimize to save
memory, disk space, or CPU usage.
If you use any compressed format (such as MP3 or Vorbis), then Unity will decompress it and
recompress it during build time. This results in two lossy passes, degrading the final quality.
Import settings on Audio Clips can save runtime memory and CPU performance (a scripted example follows this list):
— Enable the Force To Mono option on stereo audio files if they do not require stereo
sound; this saves runtime memory and disk space.
Spatial Audio Sources should use AudioClips which are either authored in mono or
have Force To Mono enabled in their import settings. If you use stereo sounds in spatial
Audio Sources, the audio data will take up twice the disk space and memory; Unity must
convert the sound to mono during the audio mixing process, which also requires extra
CPU processing time.
— Preload Audio Data ensures that Unity will load any referenced AudioClips before
initializing the scene. However, this may increase scene loading times.
— If your sound clip is not needed immediately, load it asynchronously. Check Load in
Background. This loads the sound at a delayed time on a separate thread, without
blocking the main thread.
— Set the Sample Rate Setting to Optimize Sample Rate or Override Sample Rate.
For mobile platforms, 22050 Hz should be sufficient. Use 44100Hz (i.e. CD-quality)
sparingly. 48000Hz is excessive.
For mobile platforms, use Vorbis for most sounds (or MP3 for sounds not intended to
loop). Use ADPCM for short, frequently used sounds (e.g., footsteps, gunshots).
For PC and Xbox®, use the Microsoft XMA codec instead of Vorbis or MP3. Microsoft
recommends a compression ratio between 8:1 and 15:1.
For PlayStation, use the ATRAC9 format. This has less CPU overhead than Vorbis or MP3.
— The SFX Reverb Effect is one of the most expensive audio effects in the AudioMixer. Adding
a mixer group with SFX Reverb (and a mixer group sending to it) increases CPU cost.
This happens even if there is no AudioSource actually sending a signal to the groups.
Unity’s audio system doesn’t distinguish whether it’s receiving null signals or not.
Adding a Reverb group and a group sending to it is expensive, even if no AudioSource writes to it.
— Avoid parents with a single child group. Whenever possible, combine the two mixer
groups into one.
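Here is the scripted example referenced above: a minimal Editor-only sketch that enforces some of these import settings for clips under a hypothetical Assets/Audio/SFX folder (place the script in an Editor folder):

using UnityEditor;
using UnityEngine;

// A minimal sketch: apply SFX-friendly import settings automatically.
public class SfxAudioPostprocessor : AssetPostprocessor
{
    void OnPreprocessAudio()
    {
        if (!assetPath.StartsWith("Assets/Audio/SFX")) return; // hypothetical folder

        var importer = (AudioImporter)assetImporter;
        importer.forceToMono = true;      // spatialized SFX don't need stereo
        importer.loadInBackground = true; // avoid blocking the main thread

        AudioImporterSampleSettings settings = importer.defaultSampleSettings;
        settings.compressionFormat = AudioCompressionFormat.ADPCM; // short, frequent sounds
        settings.sampleRateSetting = AudioSampleRateSetting.OptimizeSampleRate;
        importer.defaultSampleSettings = settings;
    }
}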
Physics
Physics can create intricate gameplay, but this comes with a performance cost. When you
know these costs, you can tweak the simulation to manage them appropriately. These tips can
help you stay within your target frame rate and create smooth playback with Unity’s built-in 3D
physics, which is an integration of the NVIDIA PhysX engine.
Simplify colliders
Mesh colliders can be expensive. Substitute more complex mesh colliders with primitive or
simplified mesh colliders to approximate the original shape.
Make sure that you edit your Physics settings (Project Settings > Physics) as well. Simplify
your Layer Collision Matrix wherever possible.
The Fixed Timestep field defines the time delta used by each physics step. For example, the
default value of 0.02 seconds (20 ms) is equivalent to 50 fps, or 50 Hz.
The default Fixed Timestep in the Project Settings is 0.02 seconds (50 frames per second).
Because each frame in Unity takes a variable amount of time, it is not perfectly synced
with the physics simulation. The engine counts up to the next physics time step. If a frame
runs slightly slower or faster, Unity uses the elapsed time to know when to run the physics
simulation at the proper time step.
In the event that a frame takes a long time to prepare, this can lead to performance issues. For
example, if your game experiences a spike (e.g., instantiating many GameObjects or loading
a file from disk), the frame could take 40 ms or more to run. With the default 20 ms Fixed
Timestep, this would cause two physics simulations to run on the following frame in order to
“catch up” with the variable time step.
Extra physics simulations, in turn, add more time to process the frame. On lower-end
platforms, this potentially leads to a downward spiral of performance.
A subsequent frame taking longer to prepare makes the backlog of physics simulations longer
as well. This leads to even slower frames and even more simulations to run per frame. The
result is worse and worse performance.
Eventually the time between physics updates could exceed the Maximum Allowed Timestep.
After this cutoff, Unity starts dropping physics updates, and the game stutters.
— Reduce the simulation frequency. For lower-end platforms, increase the Fixed Timestep
to slightly more than your target frame time. For example, use 0.035 seconds for 30 fps
on mobile. This could help prevent that downward performance spiral.
— Decrease the Maximum Allowed Timestep. Using a smaller value (like 0.1 s) sacrifices
some physics simulation accuracy, but also limits how many physics updates can happen
in one frame. Experiment with values to find something that works for your project’s
requirements.
— Simulate the physics step manually if necessary by setting the Simulation Mode to Script
and stepping the simulation during the Update phase of the frame. This allows you to
control when the physics step runs. Pass Time.deltaTime to Physics.Simulate in order to
keep the physics in sync with the simulation time (see the sketch after this list).
This approach can cause instabilities in the physics simulation in scenes with complex
physics or highly variable frame times, so use it with caution.
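A minimal sketch of the manual approach; setting the mode from code here is equivalent to choosing Script in the Physics settings:

using UnityEngine;

// A minimal sketch: step physics manually, in sync with rendered frames.
public class ManualPhysicsStep : MonoBehaviour
{
    void Awake()
    {
        Physics.simulationMode = SimulationMode.Script;
    }

    void Update()
    {
        // Advance the simulation by exactly the frame's delta time.
        Physics.Simulate(Time.deltaTime);
    }
}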
A MeshCollider has several CookingOptions to help you validate the mesh for physics. If you
are certain that your mesh does not need these checks, you can disable them to speed up
your cook time.
Also, if you are targeting PC, make sure you keep Use Fast Midphase enabled. This switches
to a faster algorithm from PhysX 4.1 during the mid-phase of the simulation (which helps
narrow down a small set of potentially intersecting triangles for physics queries).
Use Physics.BakeMesh
If you are generating meshes procedurally during gameplay, you can create a Mesh Collider
at runtime. Adding a MeshCollider component directly to the mesh, however, cooks/bakes the
physics on the main thread. This can consume significant CPU time.
Use Physics.BakeMesh to prepare a mesh for use with a MeshCollider and save the
baked data with the mesh itself. A new MeshCollider referencing this mesh will reuse this
prebaked data (rather than baking the mesh again). This can help reduce Scene load time or
instantiation time later.
To optimize performance, you can offload mesh cooking to another thread with the C# job
system. Refer to this example for details on how to bake meshes across multiple threads.
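A simplified sketch of that idea, assuming a mesh generated at runtime (Physics.BakeMesh is thread-safe, so it can run inside a job):

using Unity.Jobs;
using UnityEngine;

// A minimal sketch: cook mesh collision data off the main thread.
struct BakeJob : IJob
{
    public int meshId; // captured from Mesh.GetInstanceID() on the main thread

    public void Execute()
    {
        Physics.BakeMesh(meshId, false); // false = non-convex
    }
}

public class RuntimeColliderBaker : MonoBehaviour
{
    public void AddBakedCollider(Mesh proceduralMesh)
    {
        var handle = new BakeJob { meshId = proceduralMesh.GetInstanceID() }.Schedule();
        handle.Complete(); // in practice, complete later in the frame to overlap work

        // The MeshCollider reuses the prebaked data instead of cooking again.
        var meshCollider = gameObject.AddComponent<MeshCollider>();
        meshCollider.sharedMesh = proceduralMesh;
    }
}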
Physics collision detection runs in two phases:
— The broad phase, which collects potential collisions using a sweep and prune algorithm
— The narrow phase, where the engine actually computes the collisions
The broad phase default setting of Sweep and Prune BroadPhase (Edit > Project Settings >
Physics > BroadPhase Type) can generate false positives for worlds that are generally flat
and have many colliders.
If your scene is large and mostly flat, avoid this issue and switch to Automatic Box Pruning or
Multibox Pruning Broadphase. These options divide the world into a grid, where each grid cell
performs sweep-and-prune.
Multibox Pruning Broadphase allows you to specify the world boundaries and the number of
grid cells manually, while Automatic Box Pruning calculates that for you.
Rigidbody.solverIterations overrides Physics.defaultSolverIterations, which can also be
found in Edit > Project Settings > Physics > Default Solver Iterations.
To optimize your physics simulations, set a relatively low value in the project’s
defaultSolveIterations. Then apply higher custom Rigidbody.solverIterations values to the
individual instances that need more detail.
— Use manual syncing: For better performance, manually synchronize Transforms with
Physics.SyncTransforms() before calls that require the latest Transform data. This
approach is more efficient than enabling autoSyncTransforms globally.
The general recommendation is to always enable Reuse Collision Callbacks for performance
benefits. You should only disable this feature for legacy projects where the code relies on
individual Collision class instances, making it impractical to store individual fields.
In the Unity Console, there is a single collision instance on Collision Entered and Collision Stay.
Note that you can move a static collider, contrary to the term “static.” To do so, simply modify
the position of the physics body. Accumulate the positional changes and sync before the
physics update. You don’t need to add a Rigidbody component to the static collider just to
move it.
However, if you want the static collider to interact with other physics bodies in a more complex
way, give it a kinematic Rigidbody. Use Rigidbody.position and Rigidbody.rotation to move it
instead of accessing the Transform component. This guarantees more predictable behavior
from the physics engine.
Note that Physics queries that return multiple colliders as an array, like OverlapSphere or
OverlapBox, need to allocate those objects on the managed heap. This means that the
garbage collector eventually needs to collect the allocated objects, which can decrease
performance if it happens at the wrong time.
To reduce this overhead, use the NonAlloc versions of those queries. For example,
if you are using OverlapSphere to collect all potential colliders around a point, use
OverlapSphereNonAlloc instead.
This allows you to pass in an array of colliders (the results parameter) to act as a buffer.
The NonAlloc method works without generating garbage. Otherwise, it functions like the
corresponding allocating method.
Note that you need to define a results buffer of sufficient size when using a NonAlloc method.
The buffer does not grow if it runs out of space.
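A minimal sketch, reusing a fixed-size buffer (hits beyond its capacity are simply not returned):

using UnityEngine;

// A minimal sketch: garbage-free overlap query with a reused buffer.
public class ProximityCheck : MonoBehaviour
{
    readonly Collider[] results = new Collider[32];

    void FixedUpdate()
    {
        int count = Physics.OverlapSphereNonAlloc(transform.position, 5f, results);
        for (int i = 0; i < count; i++)
        {
            // Process results[i] here.
        }
    }
}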
2D Physics
Note that the above advice does not apply to 2D physics queries, because in Unity’s 2D
physics system, methods do not have a “NonAlloc” suffix. Instead, all 2D physics methods,
including those that return multiple results, provide overloads that accept arrays or lists. For
instance, while the 3D physics system has methods like RaycastNonAlloc, the 2D equivalent
simply uses an overloaded version of Raycast that can take an array or List<T> as a
parameter, such as:
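The following is a minimal sketch of one such overload; the ContactFilter2D shown accepts all hits, so configure layer filtering as needed:

using System.Collections.Generic;
using UnityEngine;

// A minimal sketch: a non-allocating 2D raycast writing into a reused list.
public class Raycaster2D : MonoBehaviour
{
    readonly List<RaycastHit2D> hits = new List<RaycastHit2D>(16);
    ContactFilter2D filter;

    void Awake()
    {
        filter = new ContactFilter2D();
        filter.NoFilter(); // accept everything; set layers/depth as needed
    }

    void FixedUpdate()
    {
        int count = Physics2D.Raycast(transform.position, Vector2.right, filter, hits, 100f);
        for (int i = 0; i < count; i++)
        {
            // Process hits[i] here.
        }
    }
}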
By using overloads, you can perform non-allocating queries in the 2D physics system without
needing specialized NonAlloc methods.
Use RaycastCommand to batch the query using the C# Job System. This offloads the work
from the main thread so that the raycasts can happen asynchronously and in parallel.
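A minimal sketch, assuming the caller supplies the ray origins and using Unity 6’s QueryParameters-based command constructor:

using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// A minimal sketch: batch many raycasts through the C# Job System.
public class BatchedRaycasts : MonoBehaviour
{
    public void RunBatch(Vector3[] origins)
    {
        var commands = new NativeArray<RaycastCommand>(origins.Length, Allocator.TempJob);
        var results = new NativeArray<RaycastHit>(origins.Length, Allocator.TempJob);

        for (int i = 0; i < origins.Length; i++)
            commands[i] = new RaycastCommand(origins[i], Vector3.down, QueryParameters.Default, 100f);

        JobHandle handle = RaycastCommand.ScheduleBatch(commands, results, minCommandsPerJob: 32);
        handle.Complete(); // or defer completion to overlap with other main-thread work

        for (int i = 0; i < results.Length; i++)
        {
            if (results[i].collider != null) { /* hit */ }
        }

        commands.Dispose();
        results.Dispose();
    }
}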
The Physics Debugger helps you visualize how your physics objects can interact with one another.
Animation
The following tips will help you when working with animation in Unity. For a comprehensive
guide through the animation system, download the free e-book The definitive guide to
animation in Unity.
The current animation system is optimized for animation blending and more complex setups. It
has temporary buffers used for blending, and there is additional copying of the sampled curve
and other data.
For static animations, especially the ones that have a lot of duplicates, such as crowds, you
can try implementing baked animations. There are two common techniques: Vertex texture
animations (including normal and tangent buffers) and a technique that uses baked bone
matrices for manual skinning in a vertex shader.
Learn more about baking animations in this Unity Learn tutorial and in this tutorial by Llam
Academy.
Also, if possible, consider not using the animation system at all. Create easing functions or
use a third-party tweening library where possible (e.g., DOTween). These can achieve very
natural-looking interpolation with mathematical expressions.
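A minimal sketch of a hand-rolled ease-out used in place of an Animator-driven move (cubic easing here is just one of many choices):

using System.Collections;
using UnityEngine;

// A minimal sketch: tween a Transform with a cubic ease-out curve.
public class EaseMove : MonoBehaviour
{
    public IEnumerator MoveTo(Vector3 target, float duration)
    {
        Vector3 start = transform.position;
        for (float t = 0f; t < 1f; t += Time.deltaTime / duration)
        {
            float eased = 1f - Mathf.Pow(1f - t, 3f); // cubic ease-out
            transform.position = Vector3.Lerp(start, target, eased);
            yield return null;
        }
        transform.position = target; // snap to the exact target at the end
    }
}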
Note: The cost of sampling animation curves does not apply to constant curves (curves
that have the same value for the length of the animation clip). Constant curves are
optimized, and they are less expensive than normal curves.
Optimize workflow
Other optimizations are possible at the scene level:
— Implement a small AI Layer to control the Animator. You can make it provide simple
callbacks for OnStateChange, OnTransitionBegin, and other events.
— Use State Tags to easily match your AI state machine to the Unity state machine.
— Use additional curves to mark up your animations, for example in conjunction with target
matching.
Building an application in Unity is a huge endeavor that often involves many developers. Make
sure that your project is set up optimally for your team.
Project files are stored in a shared database called a repository, or “repo.” You back up your
project at regular intervals to the repo, and if something goes wrong, you can revert to
an earlier version of the project.
With a VCS, you can make multiple individual changes and commit them as a single group for
versioning. This commit sits as a point on the timeline of your project, so that if you need to
revert to a previous version, everything from that commit is undone and restored to the
state it was in at the time. You can review and modify each change grouped within a commit
or undo the commit entirely.
With access to the project’s entire history, it’s easier to identify which changes introduced
bugs, restore previously removed features, and easily document changes between your game
or product releases.
What’s more, because version control is typically stored in the cloud or on a distributed
server, it supports your development team’s collaboration from wherever they’re working – an
increasingly important benefit as remote work becomes commonplace.
To help with version control merges, make sure your Editor settings have Asset Serialization
Mode set to Force Text. This is less space efficient but makes Unity store scene files in a text-
based format.
If you’re using an external version control system (such as Git) in the Version Control settings,
verify that the Mode is set to Visible Meta Files.
Unity also has a built-in YAML (a human-readable, data-serialization language) tool specifically
for merging scenes and Prefabs. For more information, see Smart merge in the Unity
documentation.
Unity Version Control (UVCS) is a flexible version control system with unique interfaces to
support programmers and artists alike. It excels at handling large repos and binary files,
and as both a file-based and changeset-based solution, it gives you the capability to
download only the specific files you’re working on, rather than the entire project build.
There are three ways to access UVCS: via multiple applications and repositories through the
UVCS desktop client, by adding it to your projects through the Unity Hub, or by accessing
the repository on Unity Cloud via your web browser.
— Work knowing that your art assets are securely backed up.
Additionally, UVCS helps you centralize your development with excellent visualization tools.
Artists especially will appreciate the user-friendly workflows that encourage tighter
integration between development and art teams via the Gluon application, which makes it
easier to see and manage just the files they need without dealing with the complexity of
the entire project repository. Besides offering a simplified workflow, Gluon provides tooling
that makes it easy to see visual differences between asset versions and to contribute to a
unified version control environment.
To get started with Unity Version Control, check out the e-book Version control and project
organization best practices.
Note that at runtime, your project can load scenes additively using
SceneManager.LoadSceneAsync, passing the LoadSceneMode.Additive parameter.
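A minimal sketch, where “Level_Section2” is a hypothetical scene added to Build Settings:

using UnityEngine;
using UnityEngine.SceneManagement;

// A minimal sketch: stream in another scene without unloading the current one.
public class SectionLoader : MonoBehaviour
{
    public void LoadNextSection()
    {
        SceneManager.LoadSceneAsync("Level_Section2", LoadSceneMode.Additive);
    }
}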
You can download many more e-books for advanced Unity developers and creators from the
Unity best practices hub. Choose from over 30 guides created by industry experts, Unity
engineers, and technical artists that provide best practices for game development and will
help you develop efficiently with Unity’s toolsets and systems.
You’ll also find tips, best practices, and news on the Unity Blog and Unity community forums,
as well as through Unity Learn and the #unitytips hashtag.