0% found this document useful (0 votes)

3 views19 pages

Tao 等 - 2025 - GS-Cache a GS-Cache Inference Framework for Large

The GS-Cache framework addresses the challenges of real-time rendering for large-scale 3D Gaussian Splatting (3DGS) models, particularly in virtual reality applications. By implementing a cache-centric pipeline and an elastic multi-GPU scheduler, GS-Cache significantly improves rendering performance, achieving up to 5.35x speed enhancement and reducing GPU memory usage by 42%. This framework enables high-quality, immersive VR experiences with stable frame rates, meeting the demanding requirements for binocular 2K rendering.

Uploaded by

ximenyuejun

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views19 pages

Tao 等 - 2025 - GS-Cache a GS-Cache Inference Framework for Large

Uploaded by

ximenyuejun

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting

Models

Miao Tao1* , Yuanzhen Zhou1* , Haoran Xu1* , Zeyu He1 , Zhenyu Yang1 , Yuchang Zhang1 , Zhongling Su1 ,
Linning Xu1 , Zhenxiang Ma1 , Rong Fu1 , Hengjie Li1 , Xingcheng Zhang1 , and Jidong Zhai2
1 Shanghai Artificial Intelligence Laboratory, Shanghai, China
arXiv:2502.14938v1 [cs.CV] 20 Feb 2025

2 Tsinghua University, Beijing, China

Abstract computation and memory intensity of real-time rendering in

Rendering large-scale 3D Gaussian Splatting(3DGS) model large-scale scenes increases significantly with the expansion
faces significant challenges in achieving real-time, high- of scene scale, and general solutions usually involve a scaling
fidelity performance on consumer-grade devices. Fully re- up [155] of computing resources. The structured Gaussian
alizing the potential of 3DGS in applications such as virtual derivation method achieves high-quality reconstruction and
reality (VR) requires addressing critical system-level chal- reduces the difficulty of real-time rendering in large-scale
lenges to support real-time, immersive experiences. We pro- scenes to a certain extent from the model structure, which
pose GS-Cache, an end-to-end framework that seamlessly makes it one of the most promising methods for achieving
integrates 3DGS’s advanced representation with a highly opti- photo-realistic rendering of large-scale scenes for VR. Be-
mized rendering system. GS-Cache introduces a cache-centric cause the demands for quality and performance in VR render-
pipeline to eliminate redundant computations, an efficiency- ing are significantly higher than in other applications.
aware scheduler for elastic multi-GPU rendering, and opti- The current 3DGS rendering pipeline, which renders im-
mized CUDA kernels to overcome computational bottlenecks. ages for two eyes alternatively, is insufficient to support the
This synergy between 3DGS and system design enables GS- frame rate required by the immersive VR experience (The
Cache to achieve up to 5.35x performance improvement, 35% minimum requirement of 72 FPS per eye means at least 144
latency reduction, and 42% lower GPU memory usage, sup- FPS in total. In the following text, FPS refers to the frame
porting 2K binocular rendering at over 120 FPS with high rate for both eyes). We conducted experiments on a structured
visual quality. By bridging the gap between 3DGS’s repre- 3DGS model OctreeGS [118], tracing the FPS and analyz-
sentation power and the demands of VR systems, GS-Cache ing the inference time overhead, as shown in Figure 1. FPS
establishes a scalable and efficient framework for real-time drops as the perspective is elevated. A significant portion
neural rendering in immersive environments. of the inference time is spent in the derivation stage, which
represents a major overhead. Just as large language models
use a KV cache [156] to accelerate inference, we considered
1 Introduction using a cache to speed up 3D model inference. Rendering is
usually a continuous process with a lot of overlap between
Real-time rendering of high-quality, large-scale 3D scenes is consecutive frames. The overlapping areas reuse data from
a resource-intensive task. The rendering of 3D scenes plays a previous frames, resulting in little color differences. A recent
crucial role in numerous fields, including virtual reality (VR), paper [139] mentioned that slight color variations are difficult
augmented reality (AR), and the metaverse. As novel methods to discern for the human eye. We design an elastic rendering
for 3D reconstruction and rendering, neural radiance fields framework with a dynamic cache that stores previous data to
(NeRF) [102] and 3DGS [66] are capable of rendering incred- accelerate rendering.
ibly photo-realistic and detailed images, supporting a variety
In this paper, we propose the GS-Cache framework, a com-
of applications with high visual perception requirements. As
putation framework for large-scale Gaussian splatting ren-
the demands for finer scene quality and larger scene scale
dering, where the scene contains an area that reaches the
increase, new methods such as structured Gaussian derivation
city-level scale of several square kilometers, to achieve the
methods [97, 118] continue to emerge. 3DGS methods have
real-time rendering frame rate requirements for immersive
enabled the reconstruction of large-scale scenes, achieving
VR experience in binocular 2K resolution head-mounted dis-
the city-level 3DGS model training [68, 94, 96]. However, the
plays (HMD), as shown in Figure 5. The middle part of this
* denotes equal contribution. architecture diagram represents the main structure of the en-

1
makes it difficult to support the real-time rendering frame rate
requirements for immersive VR experiences in large-scale
scenes. GS-Cache framework provides a new solution from
the perspective of computing systems compatible with various
rendering pipelines based on Gaussian derivation strategies.
The optimized computing pipeline eliminates the computing
redundancy, performs effective computing reuse for immer-
sive VR experience, and flexibly schedules GPU computing
resources during the rendering process to ensure stable and
high rendering frame rates and optimize the energy efficiency
of consumer-grade GPU resources. It accelerates main com-
Figure 1: FPS of the basic rending pipeline. As the viewpoint puting bottlenecks in the pipeline through dedicated CUDA
shifts from the ground to a higher altitude and farther away, kernels, further improving the performance of VR rendering.
the height increases, the scene becomes larger, and the FPS Our main contributions include:
decreases accordingly.
• A cache-centric computation de-redundancy rendering
pipeline that effectively eliminates redundancy in stereo
tire framework, the right side shows its elastic parallel sched- continuous rendering, enabling dynamic cache depth that
uler structure, and the left side illustrates the cache-centric balances performance and quality.
rendering pipeline structure. The elastic parallel scheduler
schedules GPU resources dynamically, which can steady the • A multi-GPU elastic parallel rendering scheduler that
FPS and avoid resource waste. For structured 3DGS mod- dynamically allocates consumer-grade GPU resources,
els, we transform the original pipeline into the cache-centric ensuring stable and high rendering frame rates while
pipeline, which aims to improve rendering speed based on the enhancing energy efficiency.
principles of de-redundancy and reuse. Additionally, aiming
at the bottleneck stages in general computing patterns of the • An end-to-end rendering framework designed for immer-
structured Gaussian derivation rendering pipeline, we intro- sive VR experiences, the first holistic system that meets
duce some dedicated CUDA [46] kernels for further accelera- binocular 2K photo-realistic quality rendering require-
tion, which enhance the frame rate performance of real-time ments of 72 FPS for aerial views and 120 FPS for street
rendering during long-time rendering. views in city-level scenes with dedicated efficient CUDA
implementation.
The process of rendering a 3D reconstruction scene in-
volves inference and transformation of the learned 3D spatial
features, which makes the conventional computing frame- 2 Related Work
works that focus on one-dimensional or two-dimensional fea-
tures such as text and images weak in related tasks, such Our work focuses primarily on the real-time and photo-
as Pytorch [111], Tensorflow [3], JAX [21], etc. These deep realistic rendering of large-scale Gaussian splatting scenes,
learning frameworks are versatile and scalable, enough to encompassing city-level scenes with several square kilome-
implement the basic computing pipeline of neural rendering ters. Although novel view synthesis based on neural rendering
methods such as NeRF and 3DGS. Still, it is challenging has made significant achievements in various applications in
to achieve ease of use for further development in rendering recent years, there remains a gap in meeting the demands of
applications, and there is a lack of dedicated operators to sup- rendering performance, quality fidelity, and computational
port sparse computing in high-dimensional space, resulting efficiency required for VR rendering. We provide a brief
in the computing speed of the rendering pipeline being un- overview of the most relevant works, focusing on real-time
able to achieve real-time rendering. A series of dedicated photo-realistic rendering, large-scale novel view synthesis,
frameworks for neural rendering, such as NeRFStudio [132] and rendering framework optimizations.
and Kaolin-Wisp [130], have improved the ease of use for Real-Time Photo-realistic Rendering VR rendering is
model structure experimental research through modulariza- computationally expensive, requiring high-speed and high-
tion; and dedicated operator libraries for sparse computing, quality real-time rendering, which may be hindered by quality
such as Nerfacc [92], have improved the overall rendering degradation and latency overhead in the general rendering
speed by accelerating some stages in the NeRF computing pipeline [126]. To achieve high-fidelity rendering with mini-
pipeline. These works have built a strong community influ- mal latency under relatively low computation resources, var-
ence and quickly promoted related work on neural render- ious optimization methods have been proposed. Foveated
ing such as NeRF and 3DGS, and expanded the applications rendering is a rendering accelerated method, the pioneer
based on neural rendering. However, the rendering speed still work [48] providing a foundational theory and approach,

2
the subsequent works [85, 100, 137] etc., exploring differ- GS [68] designed a hierarchical structure for multi-resolution
ent enhancements and applications. Leveraging eye-tracking representation to improve rendering speed. Other large-scale
technology, foveated rendering tends to allocate more compu- scene reconstruction and rendering works [96, 97, 118, 149]
tational resources in rendering the focus area of the images have also adopted LoD to accelerate the rendering pipeline.
while less the periphery area [141]. To speed up the neural ren- Rendering Framework Optimization In large-scale novel
dering like NeRF, in order to fulfill requirements for real-time view synthesis and city-level scene rendering, the stability of
rendering, including VR rendering, some works have shifted high-speed rendering frame rates remains an intractable prob-
from pure implicit neural representation towards hybrid or lem due to variations such as viewpoint and field of view
explicit primitive-based neural representations and hardware- (FOV), as well as the limitations of computational resources.
based acceleration [25, 56, 133]. VR-NeRF [149] achieves However, little research has focused on optimizations for
high-quality VR rendering using multiple GPUs for parallel large-scale VR rendering from the perspective of a compu-
computation, RT-NeRF [90] realize real-time VR rendering tation system, and most existing methods concentrate pri-
both on cloud and edge devices through efficient pipeline and marily on mesh-based rendering rather than neural render-
dedicated hardware accelerator. Re-ReND [119] presents a ing pipelines. MeshReduce [65] optimizes communication
low resource consumption real-time NeRF rendering method strategy and efficiently converts the scene geometry into the
available on resource-constrained devices. [117, 151, 152] dis- meshes without restraints from computation and memory, yet
till a pretrained NeRF into a sparse structure, enhancing the the stability of rendering frame rates is still difficult to main-
real-time rendering performance. Different from the afore- tain. RT-NeRF [90] employs a hybrid sparse encoding method
mentioned methods, to speed up neural rendering like 3DGS, and proposes a NeRF-based storage optimization besides
another strategy for rendering acceleration involves model its dedicated hardware system. Post0-VR [143] leverages
pruning and structuring for redundancy removal and effective data similarities to accelerate rendering by eliminating redun-
spatial representation. Methods include [?, 88, 95] pruning dant computations and merging common visual effects into
Gaussians and reducing model parameters after reconstruc- the standard rendering pipeline systematically. [99] utilizes
tion to accelerate the rendering pipeline. Scaffold-GS [97] shared memory and data reuse to enhance the performance of
organizes Gaussians using a structured sparse voxel grid and foveated rendering.
attaches learnable features to each voxel center as an anchor, Our work introduces a novel end-to-end rendering frame-
Octree-GS [118] further employs a structured octree grid for work for large 3DGS models. Optimizations are applied by an
anchors placement. innovative GPU scheduling method, a cache-centric rendering
Large 3D Model Inference Neural reconstruction and pipeline specifically tailored for Gaussian-based rendering,
rendering are also attributed to Novel View Synthesis. In and dedicated CUDA kernels to stabilize high-speed render-
large-scale scenes, it has been a long-standing problem in ing across immersive VR experiences.
research and engineering. First of all, the fidelity of large-
scale rendering is directly contingent upon the quality of
the underlying 3D representation models, particularly when
3 Rendering Pipeline and Framework Design
reconstructed from real-world scenes. Large-scale scene re-
GS-Cache is an innovative and holistic rendering framework
construction primarily utilizes a divide-and-conquer strategy
designed to support the real-time rendering of large-scale
through scene decomposition methods to expand the capa-
3D scene (3DGS) models at the city level. It enables users
bilities of the model [131, 136], while Zip-NeRF [17] and
to roam in aerial or street views in binocular 2K resolution,
Grid-NeRF [150] better refined the effectiveness and perfor-
achieving an average frame rate exceeding 72 FPS. Given
mance of representation for the large-scale scene. Except for
the challenges associated with the real-time photo-realistic
the NeRF-based methods [110] extracts semantic information
rendering of large-scale 3DGS models, particularly on VR
from street-view images and employs panoramic texture map-
applications, we have developed a scheduling framework that
ping method in large-scale scenes novel view synthesis for
supports elastic parallel rendering. Aiming at the Gaussian
realism reproduction. To ensure that novel view synthesis for
derivation rendering pipeline patterns, we also propose an ef-
VR real-time rendering maintains a stable frame rate under
ficient cache-centric rendering pipeline with a dynamic cache
large-scale scenes, an effective method is the Level of Detail
strategy that maintains rendering quality.
(LoD) strategy. Guided by heuristic rules or specific resource
allocation settings, LoD dynamically adjusts the level of de-
tail layers rendered in real-time [98]. [129] first introduced 3.1 Rendering Patterns and Pipeline Bottle-
the concept of LoD into neural radiance fields and neural necks
signed distance, Mip-NeRF [16] and Variable Bitrate Neural
Fields [128] applying it in the context of multi-scale repre- 3DGS represents the structure and color of a scene using a
sentation and geometry compressed streaming. LoD has also series of anisotropic 3D Gaussians, rendering through raster-
been employed in Gaussian-based representations, Hierarchy- ization. Structured Gaussian derivation methods use fewer

3
anchors Gaussians and generate more 3D Gaussians from
anchors to save GPU resources.
Rendering Patterns In a point cloud, the position coor-
dinates of each element serve as the mean µ, generating the
corresponding 3D Gaussian for differential rasterization ren-
dering:

(x − µ)
G(x) = exp(−(x − µ)T Σ−1 ) (1)
2 Figure 2: The basic rendering pipeline and our optimized
pipeline of structured Gaussian derivation methods
Σ = RSST RT (2)
where x represents any position in the scene space, and Σ
sian splatting method; it is still difficult to achieve high-
denotes the covariance matrix of the 3D Gaussian. Σ can be
speed rendering (above 72FPS) and ultra-high-speed ren-
decomposed into a rotation matrix R and a scaling matrix S to
dering (above 120FPS) in large-scale scenes such as city-
maintain its positive definiteness. In addition to the mentioned
scale scenes. There are two bottleneck stages in the rendering
attributes, each 3D Gaussian also includes a color value c and
pipeline of structured Gaussian derivation methods:
an opacity value α. These are used for subsequent opacity
blending operations during the rasterization process. While
• Gaussian Derivation Stage: Decode the feature parame-
rendering, the 3D Gaussians are first projected onto screen
ters of the anchors into neural Gaussian parameters
space using the EWA algorithm [159] and transformed into
2D Gaussians, which is a process commonly referred to as • Gaussian Rasterization Stage: Splat 3D neural Gaussian
Splatting. to 2D and Rasterize neural Gaussians into image
In order to make full use of the structured scene priori in
the SfM results, some related works have been proposed, such The derivation stage and rasterization stage are two com-
as Scaffold-GS and Octree-GS. Scaffold-GS does not recon- putationally intensive stages in the rendering pipeline and
struct directly based on the SfM sparse point cloud but first produce significant temporary GPU memory usage. There-
extracts a sparse voxel grid from the point cloud and con- fore, they affect and shape the main computing patterns of
structs anchors at the center of the voxel grids. The anchors the structured Gaussian derivation method and its rendering
contain feature parameters f , which are used to derive the pipeline in Gaussian splatting scenes of different scales. The
neural Gaussian: following preliminary empirical experiments are shown in the
Figure 3. As the scale of the scene increases and the number
{µ j , Σ j , c j , α j , } j∈M = MLPθ ( fi , dview )i∈N (3) of anchor feature parameters increases, it further hinders the
achievement of the rendering speed required for immersive
Where θ represents sets of learnable weights of the multi- binocular stereo.
layer perceptron (MLP), µ j , Σ j , c j , α j and s j represent the
mean, covariance matrix, color, opacity of the neural Gaus-
sian j derived from anchor i under view direction dview . The
neural Gaussians will then be used for rasterization, which
is no different from native 3D Gaussians. At the same time,
the structured placement of the anchors also allows the de-
rived neural Gaussians to be guided by the scene prior, which
reduces the redundancy of model parameters and improves
robustness in novel view synthesis. Octree-GS goes a step
further in the structuring strategy, using an octree to replace
the sparse voxel grid to retain multi-resolution and structured
scene priori. The multi-resolution grid in the octree makes
it possible to construct layers of detail (LoD) in training and Figure 3: Rendering stage time in different scenes. Although
then reduce the rendering overhead by setting different detail the end-to-end rendering time varies due to the scales of the
levels according to distance, expanding the scene scale appli- Gaussian splatting scenes, the time of the derivation stage and
cability of the structured Gaussian derivation method. The the rasterization stage always dominate.
basic rendering pipeline of Gaussian derivation methods is
shown in Figure 2. We test the proportion of various operators in the model
The structured Gaussian derivation method has higher ren- across two common scenarios, as shown in Figure 4. In the
dering efficiency than the original randomly distributed Gaus- figure, the Rasterizer is the final operator in the rendering

4
can be directly accessed from the cache.

4 Cache-centric Computation Reuse Strategy

in Rendering
Figure 4: GS model inferring time breakdown. Aside from The core of this framework lies in the cache-centric de-
the Rasterizer operator, other components, including the An- redundancy rendering pipeline, which significantly enhances
chorDecoder, consume a significant amount of time in the rendering efficiency. The entire rendering pipeline is built
rendering process. around the cache, which divides it into two main parts. The
anchor decoding part generates Gaussian parameters, in which
slight changes in perspective only result in little color differ-
process, the AnchorDecoder mainly consists of an MLP, and ences. The cache can optimize and eliminate most of the
"Other" refers to some calculations that occur before the An- computations in the anchor decoding part. The rasterizer part
chorDecoder. involves Gaussian rasterization, as even slight changes in
perspective can significantly affect visual perception due to
3.2 Overview of GS-Cache Framework Archi- changes in object occlusion. Therefore, this part cannot be
accelerated using the cache in order to maintain visual quality.
tecture
Cache Design The frame stream generated in real-time
The high fidelity of structured Gaussian derivation methods rendering is obtained through continuous rendering computa-
supports the realism of large-scale scene rendering in immer- tions. The input sequence of cameras has spatial proximity,
sive VR experience, but its basic rendering pipeline is not which is also reflected in the fact that the rendering objec-
enough to support the frame rate in large-scale scenes. There- tives between consecutive frames often overlap. The overlap
fore, we propose the GS-Cache framework, which realizes the between consecutive frames in real-time rendering will also
performance requirements of structured Gaussian derivation cause computation redundancy. Figure 6 illustrates this situ-
methods in large-scale scenes from the perspective of computation, where the middle section can be cached to accelerate
ing framework and hierarchical optimization. The GS-Cache rendering when rendering meets a cache hit. Outdated data
framework includes two primary hierarchical structures: a is chosen by the cache scheduler and will be evicted. For the
scheduling framework that supports elastic parallel rendering continuous rendering of the structured Gaussian derivation
and a de-redundancy rendering pipeline with a cache-centric method, the computation redundancy is manifested in decod-
strategy. As illustrated in Figure 5. ing the same anchor features between consecutive frames,
The scheduling framework is responsible for driving the reducing the overall rendering computation efficiency and
entire system, facilitating communication with VR applica- restraining the upper limit of the frame rate during real-time
tions, and managing the scheduling of rendering workers. It rendering.
manages worker states and performance statistics, such as We propose a method based on computation cache to share
the current displaying frame rate and the camera input queue and reuse intermediate results in the pipeline between mul-
from VR applications. The scheduling framework communi- tiple frames in the rendering process so that single-frame
cates with the VR applications through a VR adapter, which, rendering no longer needs to decode all anchor features but
alongside the control strategy, enables the initiation or termi- reuse derived Gaussian parameters from the cache and enter
nation of rendering workers while providing interfaces for the subsequent rasterization stage. The rendering pipeline
data transfer. optimized by computation cache and reuse is illustrated in
The de-redundancy rendering pipeline features a multi- Figure 7. The camera input is initially used to compute anchor
level de-redundancy process. Once input data enters the ren- indices, which are subsequently utilized to filter anchors to be
dering pipeline, it first undergoes a binocular de-redundancy decoded and locate anchors in the cache. The anchor indices
stage, followed by an anchor filtering stage. The filtered an- are translated to the cache indices by index mapping. The
chors are indexed by the model parameters and Gaussian anchor features in current rendering are no longer obtained
features required for rendering. The cache scheduler then per- by complete decoding computations. Still, the results pre-
forms the scheduling based on the filtered anchors, loading served from the previous frames are reused, and the Gaussian
the necessary model parameters for the subsequent decoding parameters not included in the cache are decoded simultane-
stage. The results of previous decoding computations are up- ously. This method removes inter-frame redundancy in the
dated in the cache, from which the rasterization stage retrieves derivation stage for real-time rendering. With an increase
data for rendering. During continuous rendering, anchors and in cache depth and a corresponding rise in the reuse rate,
Gaussian features that are insensitive to viewpoint changes both the overall rendering performance and the theoretical
will not require re-decoding. Instead, their decoding results upper limit of frames per second (FPS) will undergo further

5
Figure 5: Overview of GS-Cache framework architecture.

Figure 6: Continuous rendering cache: In continuous render-

ing, as the viewpoint changes continuously, there is a signifi-
cant overlap in the view frustum, which can be cached to save Figure 7: Cache-centric Rendering Pipeline: After the model
computational resources and accelerate rendering. performs Gaussian parameter decoding, it will be added to
the cache. When rendering subsequent frames, if the cache
hits, the parameters from the cache will be used.
enhancement. This improvement is essential for ensuring the
feasibility and stability of high frame rate real-time rendering, the anchor set X for decoding computation in the current
which is critical for delivering an immersive experience. frame rendering, Xk′ means the newly decoded anchors from
Optimizing the Quality–Speed Trade-off Since the neu- rendering the previous k frame. This ratio is the update rate:
ral Gaussian appearance derived from the anchor point fea-
|Xk′ |
tures has a certain viewpoint correlation, the inter-frame reuse depth = H( ) (4)
of the results of the derivation stage should be restrained by |X|
certain conditions to ensure that the correctness of the render- In addition, to avoid the same Gaussian appearance param-
ing results within an acceptable range. We design a dynamic eters being maintained in the cache for too long and causing
cache depth scheduling Algorithm 1 and propose a heuristic significant rendering errors (such as rendering on the trajecto-
cache reuse depth scheduling strategy that adjusts the depth ries around a centered target), the cache needs to be flushed
of reuse and cached parameters in subsequent frame render- in time, then cache re-filling and inter-frame reusing need
ings according to the intensity of decoding calculations in to be performed again. For static cache reuse depth, flush-
the derivation stage performed in the current frame rendering the cache will cause a rapid change in scene appearance.
ing. Such a scheduling strategy is not affected by the specific Therefore, another benefit of introducing dynamic cache depth
model and pipeline structure. While making full use of the scheduling is to suppress the appearance of rapid change prob-
computation cache to accelerate the derivation stage, it can lems caused by sudden cache flushing.
maintain the image rendering quality without a significant Binocular Stereo De-redundancy Unlike monocular ren-
decrease or fluctuation due to the use of cached results. As- dering, stereo rendering uses two cameras with sequential
suming the guiding function of the heuristic strategy is H, alternating method rendering. The overlap of the observation
the expected reuse depth has the following relationship with fields further leads to redundancy in stereo rendering; that

6
Algorithm 1 Dynamic Cache Depth Scheduling Algorithm
1: Initialize rendering pipeline, set cache depth and guiding
function
2: while Receiving camera input do
3: Anchors indexing and filtering
4: if Reach max cache reuse depth then
5: Invalidate those computation cache line
6: end if
7: if Anchor duplicate rate == 0% or first frame then
8: Decode total anchors into 3D Gaussians
9: else if Anchor duplicate rate == 100% then
10: Reuse total 3D Gaussians in cache
11: else
12: Decode new anchors
13: Update computation cache
14: Compose new Gaussians and cached Gaussians
15: end if
Figure 8: Binocular stereo de-redundancy through uniformed
16: Configure cache depth based on guiding function and
camera in structured Gaussian derivation method.
duplicate rate
17: Update render buffer
18: Rasterize render buffer into an image redundancy of the derivation stage in the sequential alternat-
19: end while ing method. Then, the binocular cameras can rasterize the
shared Gaussian of the derivation stage independently or in
is, two cameras must render the same objectives of the 3D batching to maintain the binocular stereo parallax. It is worth
scene as images under different perspectives. The mere posi- noting that, unlike the double-wide rendering method [147] in
tion variance and the large field of view result in significant the traditional rasterization of the mesh models, our method
redundancy in binocular stereo rendering. That means twice merges the multi-channel end-to-end pipeline into one in
anchor decoding and cache visiting. stereo rendering. This eliminates the redundancy of the se-
We propose a stereo rendering de-redundancy method suit- quential alternating method and reduces the number of calls
able for structured Gaussian derivation methods, aiming to between multiple camera renderings in the derivation stage,
eliminate computation redundancy in the derivation stage. thereby improving the performance and frame rate upper limit
The core is to utilize the overlap of the stereo cameras to of stereo rendering.
merge the computation process in the derivation stage so
that two cameras can share the Gaussian parameters decoded 5 Multi-GPU Elastic Parallel Rendering
by a set of anchor features for the subsequent rasterization
Schedule
stage. For binocular stereo rendering, assume that there is a
camera group C = {c1 , c2 }, whose positions in the world co- In city scenes, since the scenes are large and the Gaussian
ordinate system and local Z-axis directions are P = {p1 , p2 } distribution between the different parts is relatively significant,
and D = {d1 , d2 } , respectively, sharing the same camera in- the performance (FPS) will fluctuate greatly when roaming
trinsic parameters and expressed as θFOV , then the following across the scenes. For example, the FPS will fluctuate greatly
method can be used to obtain the merged parameters that from an area with dense high-rise buildings to an open square
cover the binocular camera field of view at the same time: or from a ground-level view to a high-altitude bird’s-eye view.
µD Even though we use cache to reduce the computational load
duni f ied = (5) significantly, the rendering FPS decreases as the scene size
||µD ||2
increases. Therefore, based on the cache-centric rendering
||p1 − p2 ||2
puni f ied = µP − duni f ied ∗ (6) pipeline, we further employ elastic parallelism techniques
2 ∗ tan( θFOV
2 ) to stabilize the rendering FPS above a predetermined value.
Where duni f ied and puni f ied represent the direction and posi- That requires that the computing resources be dynamically
tion of the camera, which is equivalent to the combined binoc- scheduled according to the changes in the scene. We design
ular field of view, and its intrinsic parameters are still θ f ov , an elastic parallel scheduling strategy to alleviate the drastic
as shown in Figure 8. Therefore, the binocular cameras can changes in FPS caused by view changes and achieve stable
simultaneously enter the derivation stage through the equiva- rendering.
lent camera and share the results, eliminating the computation We have designed an asynchronous pipeline for VR render-

7
ing, as detailed in Algorithm 2. Rather than directly use the
binocular cameras of the VR HMD device for rendering, we
put it into a shared queue. In a fixed sampling interval, if the
VR HMD device pose changes by more than the threshold,
we put the current camera into the shared queue and stamp
it with time. The camera data that exceeds a timeout will be
discarded by the shared queue. The rendering worker process
accesses the shared queue when it finishes rendering the pre-
vious camera, takes camera data from the head of the queue,
and executes the rendering task. A scheduler is introduced to
flexibly schedule the rendering worker process according to
the change of FPS. A scheduling strategy is used to achieve
a stable rendering. In the strategy, we present a frame rate
range [Min-FPS, Max-FPS]. When the FPS is lower than the
preset Min-FPS, the scheduler starts a new rendering worker
1
process; when the FPS is higher than (1 + Nworkers ) times of
the Max-FPS, a chosen rendering worker process is stopped.
As shown in Figure 9.

Figure 9: Worker control state transition responding to FPS

Since our elastic parallel rendering is an asynchronous Figure 10: An overview of dedicated CUDA kernels
rendering pipeline, inconsistent rendering order may occur
due to inconsistent GPU performance. To solve this problem,
we synchronize when writing to display. By recording the overhead in this stage comes from the combined operators,
timestamp of the last frame written to display and comparing using 43% of the stage duration, and the MLPs, using 25%,
the current frame’s timestamp, we decide whether the current as shown in Figure 10 (a). For the first combine operator, we
frame should be written to the GPU memory for display. A optimize element-wise multiplication within the kernel, elim-
simple principle is that frames with earlier timestamps should inating the need for hard copies of features. For the second
be displayed first, and we should display rendered frames as combine operator, we replace the merge-and-split operations
quickly as possible. So, under this principle, expired frames with a fine-grained parallel method that processes each tensor
will be discarded. individually. These optimizations reduce both memory usage
and processing time. For the MLPs, our optimization method
6 Implementation of model Components fuses two layers into a single fused Matmul. For the entire
process, precomputing mask indices in the mask computation
In structured Gaussian derivation methods, including Scaffold- step can reduce redundant calculations.
GS and Octree-GS, the main bottlenecks are the deriva- RasterizeGaussians Optimization In addition to general
tion stage (performed by AnchorDecoder) and rasterization optimizations such as merging memory accesses and precom-
stage (performed by RasterizeGaussians), as illustrated by putation, a significant amount of computational redundancies
the SCGS and OCGS bars in Figure 13. Designing dedicated in the rasterizing pipeline are eliminated through two methods,
CUDA kernels for these stages significantly improves render- as shown in Figure 10 (b). First, when defining the Gaussian
ing speed while maintaining a high standard of image output distribution, considering the opacity can scale down the size
consistency. of the ellipse, reducing the area of the Axis-Aligned Bounding
AnchorDecoder Optimization The main computational Box (AABB) and the number of key-value pairs, thereby re-

8
ducing the overall computational load. Second, optimizing the scheduling one of the GPU resources according to the change
AABB’s tile coverage determination can eliminate computa- in rendering frame rate.
tions of tiles that are completely outside the ellipse’s coverage To verify the rendering capabilities of different methods
area [43]. The redundancy reduction slightly increases the in large-scale scenes, we use the Matrixcity [93] dataset to
preprocessing duration but significantly reduces the duration build the target scene used in the experiment. When testing
of subsequent computational steps. the rendering from the aerial views, the entire 2.7 square
kilometers of the city scene is used, while when testing the
Algorithm 2 Elastic Parallel Rendering Scheduling Algo- rendering of the street views, only part of the street scene of
rithm the city is used, covering a street range of about 220 meters in
1: Initialize shared queue, set target FPS and timeout length. The sample configurations delivered by Scaffold-GS
2: while VR application is tuning do and Octree-GS on the Matrixcity dataset are trained to 40,000
3: Obtain current VR device pose information iterations and obtain scene models with high reconstruction
4: if HMD device pose change exceeds threshold then quality, as shown in Table 1.
5: Add camera and timestamp into the shared queue
6: end if Table 1: Large-scale Scenes reconstructed from the Matrixcity
7: Calculate current FPS dataset
8: if FPS is below target FPS then
Scene View Scale PSNR Images Anchors
9: Start a new rendering process
1 2.7km2
10: else if FPS is above target FPS (1 + Nworkers ) then City(Scaffold-GS) Aerial 28.84 5621 18,554k
City(Octree-GS) Aerial 2.7km2 28.08 5621 17,122k
11: Choose one rendering process and stop it
Street(Scaffold-GS) Street 0.026km2 30.01 330 1,311k
12: end if Street(Octree-GS) Street 0.026km2 31.20 330 2,848k
13: for Each rendered frame do
14: if Timestamp < Last_written_timestamp then
15: Discard the frame
16: else 7.1 Performance Evaluation
17: Write the frame to GPU Texture memory To simulate the experience of urban roaming, we set an aerial
18: Update Last_written_timestamp ascending curved trajectory in the city scene and a straight
19: end if street trajectory along the road in the street scene. We take the
20: end for consecutive keyframes in the trajectory as input for our render-
21: end while
ing framework. Each frame corresponds to the head-mounted
display’s locomotion input, which contains two poses of the
binocular stereo cameras on it.
7 Experiments Experiments are conducted on Scaffold-GS and Octree-
GS, the SOTA models for modeling cities. The max reuse
Aiming at an immersive VR experience, we choose the depth is set to 10 for caching computation results from the
Meta Quest 3 head-mounted display (HMD) as the human- previous 10 frames. A series of experiments show that setting
computer interaction interface, which supports a display ca- it to 10 balances both performance and quality. To compare
pability of up to 120FPS with a binocular 2K resolution. The their optimal performance, both rendering pipelines are tested
locomotion of the helmet is transmitted through the OpenXR under single GPU and multi-GPU resources. Besides, we
Runtime API, and the rendering results from the other de- compare our results with CityGS [96] and some VR rendering
vice are streamed to the rendering buffer of the helmet. We works.
use consumer-grade components to build our platform that We collect the average FPS for performance evaluation of
performs actual rendering, including an Intel i9-14900 CPU, the rendering pipeline under continuous computing condi-
128GB RAM, and two Nvidia RTX 4090 GPUs connected tions. For a more comprehensive result, we also collect the
and communicated through the PCIe 4.0x8 slots. The custom 99% percentile FPS. At the same time, the time consumption
CUDA renderer performs the rendering tasks in the complete of the anchor decoding stage and the Gaussian rasterization
framework. The computation results will first be placed in stage in the rendering pipeline, where computation optimiza-
the VRAM on GPUs and finally gathered and streamed to tion and dedicated CUDA kernels mainly take effect, are
the rendering buffer of the helmet through the OpenXR Run- collected. The results are shown in Table 2 and Table 3.
time API. The GS-Cache rendering framework is deployed Under a single GPU, compared to the baseline pipeline,
on the consumer-grade workstation mentioned above and im- the optimized rendering pipeline has an average frame rate
plemented with PyTorch. All pipelines can use multi-GPU performance improvement of 2x. The city scene has a higher
resources to improve the total throughput through the elastic speedup gain because the number of anchors involved in de-
parallel rendering scheduling interface and enable or disable coding and the number of Gaussians involved in rasterization

9
Table 2: Rendering performance comparison on Matrixcity city scene

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) AVG. Speedup Gain
Scaffold-GS(Origin) 27.24 13.33 28.89 10.63 -
Octree-GS(Origin) 18.04 11.50 18.49 14.10 -
Scaffold-GS(Our) 55.81 37.28 11.34 5.42 2.05
Octree-GS(Our) 44.55 25.81 6.39 4.35 2.47
Scaffold-GS(Origin w/ elastic) 50.78 29.07 14.19 5.19 1.86
Octree-GS(Origin w/ elastic) 42.38 30.98 8.01 5.92 2.35
Scaffold-GS(Our w/ elastic) 109.80 78.29 5.57 3.01 4.03
Octree-GS(Our w/ elastic) 96.46 73.27 2.82 2.29 5.35

Table 3: Rendering performance comparison on Matrixcity street scene

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) AVG. Speedup Gain
Scaffold-GS(Origin) 43.91 8.35 8.67 13.38 -
Octree-GS(Origin) 54.26 21.19 8.62 6.63 -
Scaffold-GS(Our) 80.24 14.80 1.61 9.88 1.83
Octree-GS(Our) 113.97 44.74 2.25 4.37 2.10
Scaffold-GS(Origin w/ elastic) 88.59 18.24 4.36 6.57 2.02
Octree-GS(Origin w/ elastic) 111.98 44.38 4.16 3.11 2.06
Scaffold-GS(Our w/ elastic) 148.30 33.34 0.93 5.59 3.38
Octree-GS(Our w/ elastic) 203.16 93.69 1.18 2.54 3.74

differ in the two scenes. Our framework demonstrates its ad-

vantages more clearly in larger scenes. When the number of
anchors in the scene is low, the decoding stage will become
the main bottleneck in rendering, rather than the rasteriza-
tion stage, in which the gain will be more significant. The
worst performance is relatively close in all pipelines, the real
bottleneck in processing the input sequence. However, the
stable frame rate represented by 99% FPS also has a per-
formance improvement of more than 2x, indicating that the
performance changes between the optimized pipeline and the
baseline pipeline are not instantaneous but rather a steady
increase in performance throughout the entire continuous ren- Figure 11: Rendering frame rate on the trajectory w.r.t height.
dering process. Under multi-GPU, the average frame rate of Elastic scheduling can maintain acceptable FPS
the optimized rendering pipeline can not only achieve the min-
imum 72FPS requirement for an immersive VR experience.
Still, it can even reach more than 120FPS in street scenes, ex- the minimum frame rate requirement for an immersive VR
ceeding the maximum refresh rate limit of the head-mounted experience. However, an elastic resource scheduling strategy
display. The baseline pipeline, with the support of multi-GPU can allocate only part of the GPU resources within the ac-
resources, can also achieve the minimum 72FPS requirement ceptable performance fluctuation range and allocate more
for immersive VR experience in street scenes. GPU resources when the performance drops outside, enabling
As shown in the Figure 11, to verify the scheduling capa- another GPU to participate in rendering and restoring the
bility of our multi-GPU elastic rendering method, we set an performance to the acceptable range. This can improve the
additional zoom-out trajectory on the city scene, pointing to energy efficiency of the rendering framework and maximize
the center of the scene and moving away from it, and compare the utilization of resources scheduled for computing tasks.
the frame rate changes of the Octree-GS optimized rendering Still, it is also friendly to target applications on consumer-
pipeline and the baseline pipeline under fixed single GPU grade devices. We set the performance acceptable range of
resources and elastic multi-GPU resources. When the camera the optimized pipeline to a minimum of 60 FPS, the mini-
gradually moves away from the center of the scene along the mum requirement for a smooth VR experience, but cannot
zoom-out trajectory, more anchors will be involved in the de- achieve an immersive VR experience. At the same time, the
coding stage of the rendering pipeline. Then more Gaussians performance acceptable range of the baseline pipeline is set
will be derived and involved in the rasterization stage. In the to a minimum of 20 FPS, which is the minimum requirement
case of static resource scheduling, the end-to-end throughput for real-time rendering. Frame rates lower than this will not
of the rendering pipeline will gradually degrade, even below even meet the real-time application requirements.

10
In contrast to methods such as VR-NeRF [149], RT-
NeRF [95], and VR-GS [64], which have made significant
contributions to the VR rendering of 3D neural scenes, these
methods have primarily focused on small-scale scenes. The
model sizes employed in these methods differ by an order of
magnitude from those in our experiments, and their average
rendering frame rates are significantly below 72 FPS. Our so-
lution, however, substantially outperforms existing methods in
terms of performance while maintaining rendering quality. At
the same settings, from the viewpoint of 500 meters in height,
the FPS of GS-Cache is double that of CityGaussian [96].
Figure 12: Dynamic cache depth. Due to the scheduling strat-
egy for quality, it may cause rapid shrinkage of depth when
7.2 Quality Evaluation the update rate fluctuates, but overall speedup can still be
achieved.
Performance improvements in rendering pipelines are often
accompanied by trade-offs in quality. The rendering pipeline
optimization method we proposed includes computation de- baseline pipelines. As shown in the Table 4. It is worth noting
redundancy and computation reuse to maintain the pipeline that the mean square error(MSE) and peak signal-to-noise
structure unchanged and optimize the process centered on ratio(PSNR) can only reflect the absolute difference in pixel
rendering quality. Due to the peculiarities of multi-camera values between the images but not the relative difference in
setups in binocular stereo, the impact of redundancy removal perception. Therefore, it is also necessary to refer to metrics
on rendering quality is not significant. In the meantime, reuse such as the structural similarity index(SSIM) [142] and the
requires a dynamic cache depth scheduling strategy to con- learned perceptual image patch similarity(LPIPS) [153]. It is
trol fluctuations in rendering quality. In our experiments, we generally believed that when PSNR surpasses 30, the visual
use a linear guidance function for responding to intensity difference between the two images is difficult to perceive by
changes in the decoding stage and schedule reuse depth in the human eye. SSIM and LPIPS need to be over 0.9 and no
subsequent frame renderings. The linear response is matched more than 0.1, respectively.
with the motion features for the constant speed movement of
the rendered trajectory in the experiment. Figure 12 illustrates
how the cache depth adjusts to maintain rendering quality 7.3 Ablation Evaluation
during the rendering process. The update rate refers to the
In addition to evaluating the overall performance and qual-
percentage of anchors that are decoded and updated to the
ity changes of the GS-Cache rendering framework, we also
cache and is equivalent to the cache miss rate. As the cache
conduct ablation experiments on the performance impact of
depth increases, the cache miss rate decreases, resulting in a
different optimization methods for the rendering pipeline.
corresponding reduction in the update rate. For the movement
Based on the rendering trajectories from the aerial and the
with acceleration and with staged speed changes, exponen-
street, pipelines that ablated different optimization methods
tial response and staged response are needed to match the
are tested for rendering. We compare the end-to-end frame
reuse depth and motion features. Our optimization methods
rate and stage time consumption, and record the highest mem-
do not involve modifications on the rendering pipeline, are
ory usage in the complete rendering process, as Table 5 And
transparent to the original rendering process of the Gaussian
Table 6 shown. We can also calculate the speedup of different
derivation method, and are compatible with pipelines without
methods to achieve full performance through ablation.
LOD (e.g., Scaffol-GS) and pipelines containing LOD (e.g.,
In the city scene, computation de-redundancy and reuse
Octree-GS).
significantly impact the pipeline’s overall performance, and
Table 4: Rendering quality comparison on Matrixcity city and the speedup of the average frame rate can reach 1.8x. In the
street scenes street scene, the impact of dedicated CUDA kernels on the
overall performance of the pipeline is more significant. This is
Methods MSE↓ PSNR↑ SSIM↑ LPIPS↓ because, in the city scene, the time of single-frame rendering
Scaffold-GS(City) 0.00116 38.36 0.98 0.022
is concentrated in the decoding stage. In contrast, in the street
Octree-GS(City) 0.00136 35.38 0.98 0.024 scenes, the time is concentrated in the rasterization stage, cor-
Scaffold-GS(Street) 0.00155 32.68 0.98 0.018 responding to the main effects of computation optimization
Octree-GS(Street) 0.00038 35.53 0.99 0.012 and dedicated kernels. The impact on the average frame rate
is also reflected in the 99% frame rate, which further proves
We evaluate the quality difference caused by computation that our optimization methods are essential to improve the per-
optimizations between images rendered by the optimized and formance of the rendering pipeline, which covers the best and

11
Table 5: Rendering ablation comparison on Matrixcity city scene.

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) Memory(GiB) AVG. Speedup Loss
Scaffold-GS(Our) 56.61 38.71 11.28 5.39 8.10 -
Our w/o de-redundancy 34.82 26.49 22.72 5.36 8.05 1.62x
Our w/o reuse 31.26 20.52 19.48 5.27 7.85 1.81x
Our w/o kernels 38.46 21.64 14.15 10.77 10.30 1.47x
Octree-GS(Our) 41.15 20.33 6.89 4.68 8.59 -
Our w/o de-redundancy 24.86 16.10 13.27 4.37 8.59 1.65x
Our w/o reuse 22.48 17.66 10.09 4.54 8.51 1.83x
Our w/o kernels 31.51 21.40 8.41 12.57 9.17 1.30x

Table 6: Rendering ablation comparison on Matrixcity street scene.

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) Memory(GiB) AVG. Speedup Loss
Scaffold-GS(Our) 79.43 14.59 1.68 9.93 2.12 -
Our w/o de-redundancy 72.52 14.59 3.20 9.95 2.16 1.09x
Our w/o reuse 77.88 13.60 1.84 10.01 1.96 1.02x
Our w/o kernels 58.34 11.43 2.90 13.18 3.67 1.36x
Octree-GS(Our) 115.88 47.93 2.19 4.34 2.07 -
Our w/o de-redundancy 87.85 44.87 4.16 4.25 2.12 1.31x
Our w/o reuse 93.02 41.30 2.58 4.35 2.03 1.27x
Our w/o kernels 77.76 33.45 4.20 6.36 2.94 1.49x

worst cases in rendering performance. Removing redundancy As shown in Figure 13, replacing only the dedicated CUDA
in binocular stereo improves the performance of the decoding kernel results in a speedup for both AnchorDecoder and
stage most significantly in different scenes and pipelines. In Rasterizer. Notably, in the city and street scenes, Rasterizer
contrast, dedicated kernels can most significantly improve the achieves a higher speedup in the former, while AnchorDe-
performance of the rasterization stage while reducing memory coder achieves a higher speedup in the latter. This is because
usage. Due to the introduction of the computation cache for the city scene spans a broader area and involves a larger scale,
Gaussian derivation, the reuse optimization of the rendering with denser and more numerous anchor points and Gaussians.
pipeline will cause additional memory usage, not exceeding For Rasterizer, the optimization eliminates redundant com-
3%. Ultimately, our optimization methods jointly support the putations caused by false Gaussian intersections. The opti-
rendering pipeline to meet the performance requirements of mization yields better results since the denser Gaussians in
immersive real-time VR rendering and form compensation the city scene result in more false intersections. As for An-
and balance in memory usage. chorDecoder, its optimization mainly reduces memory access
overhead. In the city scene, the increased density of anchor
points raises the computational overhead for AnchorDecoder,
making memory access optimizations less effective than in
the street scene.

8 Conclusions
We demonstrate the GS-Cache framework, a rendering frame-
work oriented to structured Gaussian derivation methods,
which can achieve real-time rendering of large-scale scenes,
including city and street Gaussian reconstruction scenes, meet-
ing the high-speed and high-fidelity requirements of immer-
sive VR experience. We make several key contributions, in-
cluding the cache-centric de-redundancy rendering pipeline, a
rendering framework that supports multi-GPU parallelism and
elastic scheduling, and dedicated CUDA kernels for the com-
putational bottleneck stage. In the experiments, we verify that
Figure 13: Single-frame rendering kernel duration. Due to the GS-Cache framework achieves significant performance
the LOD-based model structure and layer-switching strategy improvements compared to the baseline methods, and meets
applied to the entire scene, Octree-GS introduces significant the frame rate requirements of binocular 2K resolution of
time before the decoding stage in large-scale scenes. more than 72FPS and more than 120FPS under limited re-

12
sources such as consumer-grade GPUs, and does not result in [11] David A. Anisi. Optimal motion control of a ground
significant quality loss. vehicle. Master’s thesis, Royal Institute of Technology
(KTH), Stockholm, Sweden, 2003.

References [12] Sam Anzaroot and Andrew McCallum. UMass citation

field extraction dataset, 2013.
[1] SIGCOMM Comput. Commun. Rev., 13-14(5-1), 1984.
[13] Sam Anzaroot, Alexandre Passos, David Belanger, and
[2] CHI ’08: CHI ’08 extended abstracts on Human fac- Andrew McCallum. Learning soft linear constraints
tors in computing systems, New York, NY, USA, 2008. with application to citation field extraction, 2014.
ACM. General Chair-Czerwinski, Mary and General
Chair-Lund, Arnie and Program Chair-Tan, Desney. [14] J. E. Archer, Jr., R. Conway, and F. B. Schneider. User
recovery and reversal in interactive systems. ACM
[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Trans. Program. Lang. Syst., 6(1):1–19, January 1984.
Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, San-
jay Ghemawat, Geoffrey Irving, Michael Isard, Man- [15] P. Bahl, R. Chancre, and J. Dungeon. SSCH: Slotted
junath Kudlur, Josh Levenberg, Rajat Monga, Sherry seeded channel hopping for capacity improvement in
Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, IEEE 802.11 ad-hoc wireless networks. In Proceeding
Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, of the 10th International Conference on Mobile Com-
and Xiaoqiang Zheng. Tensorflow: a system for large- puting and Networking (MobiCom’04), pages 112–117,
scale machine learning. In Proceedings of the 12th New York, NY, 2004. ACM.
USENIX Conference on Operating Systems Design
and Implementation, OSDI’16, page 265–283, USA, [16] Jonathan T Barron, Ben Mildenhall, Matthew Tancik,
2016. USENIX Association. Peter Hedman, Ricardo Martin-Brualla, and Pratul P
Srinivasan. Mip-nerf: A multiscale representation for
[4] Rafal Ablamowicz and Bertfried Fauser. Clifford: a anti-aliasing neural radiance fields. In Proceedings of
maple 11 package for clifford algebra computations, the IEEE/CVF international conference on computer
version 11, 2007. vision, pages 5855–5864, 2021.

[5] Patricia S. Abril and Robert Plant. The patent holder’s [17] Jonathan T Barron, Ben Mildenhall, Dor Verbin,
dilemma: Buy, sell, or troll? Communications of the Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-
ACM, 50(1):36–44, January 2007. aliased grid-based neural radiance fields. In Proceed-
ings of the IEEE/CVF International Conference on
[6] A. Adya, P. Bahl, J. Padhye, A.Wolman, and L. Zhou. Computer Vision, pages 19697–19705, 2023.
A multi-radio unification protocol for IEEE 802.11
wireless networks. In Proceedings of the IEEE 1st In- [18] Lutz Bornmann, K. Brad Wray, and Robin Haunschild.
ternational Conference on Broadnets Networks (Broad- Citation concept analysis (CCA)—a new form of ci-
Nets’04), pages 210–217, Los Alamitos, CA, 2004. tation analysis revealing the usefulness of concepts
IEEE. for other researchers illustrated by two exemplary case
studies including classic books by Thomas S. Kuhn
[7] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury. A and Karl R. Popper, May 2019.
survey on wireless multimedia sensor networks. Com-
puter Netw., 51(4):921–960, 2007. [19] Mic Bowman, Saumya K. Debray, and Larry L. Peter-
son. Reasoning about naming systems. ACM Trans.
[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and Program. Lang. Syst., 15(5):795–825, November 1993.
E. Cayirci. Wireless sensor networks: A survey. Comm.
ACM, 38(4):393–422, 2002. [20] Johannes Braams. Babel, a multilingual style-option
system for use with latex’s standard document styles.
[9] American Mathematical Society. Using the amsthm TUGboat, 12(2):291–301, June 1991.
Package, April 2015. http://www.ctan.org/pkg/
amsthm. [21] James Bradbury, Roy Frostig, Peter Hawkins,
Matthew James Johnson, Chris Leary, Dougal Maclau-
[10] Sten Andler. Predicate path expressions. In Proceed- rin, George Necula, Adam Paszke, Jake VanderPlas,
ings of the 6th. ACM SIGACT-SIGPLAN symposium Skye Wanderman-Milne, and Qiao Zhang. JAX:
on Principles of Programming Languages, POPL ’79, composable transformations of Python+NumPy
pages 226–236, New York, NY, 1979. ACM Press. programs, 2018.

13
[22] Jonathan F. Buss, Arnold L. Rosenberg, and Judson D. [35] Nianchen Deng, Zhenyi He, Jiannan Ye, Budmonde
Knott. Vertex types in book-embeddings. Technical Duinkharjav, Praneeth Chakravarthula, Xubo Yang,
report, Amherst, MA, USA, 1987. and Qi Sun. Fov-nerf: Foveated neural radiance fields
for virtual reality. IEEE Transactions on Visualization
[23] Jonathan F. Buss, Arnold L. Rosenberg, and Judson D. and Computer Graphics, 28(11):3854–3864, 2022.
Knott. Vertex types in book-embeddings. Technical
report, Amherst, MA, USA, 1987. [36] E. Dijkstra. Go to statement considered harmful. In
Classics in software engineering (incoll), pages 27–33.
[24] Yang Cao, Tao Jiang, Xu Chen, and Junshan Zhang. Yourdon Press, Upper Saddle River, NJ, USA, 1979.
Social-aware video multicast based on device-to-
device communications. IEEE Transactions on Mobile [37] Bruce P. Douglass, David Harel, and Mark B. Trakhten-
Computing, 15(6):1528–1539, 2015. brot. Statecarts in use: structured analysis and object-
orientation. In Grzegorz Rozenberg and Frits W. Vaan-
[25] Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and drager, editors, Lectures on Embedded Systems, volume
Andrea Tagliasacchi. Mobilenerf: Exploiting the poly- 1494 of Lecture Notes in Computer Science, pages 368–
gon rasterization pipeline for efficient neural field ren- 394. Springer-Verlag, London, 1998.
dering on mobile architectures. In Proceedings of the
[38] D. D. Dunlop and V. R. Basili. Generalizing specifica-
IEEE/CVF Conference on Computer Vision and Pat-
tions for uniformly implemented loops. ACM Trans.
tern Recognition, pages 16569–16578, 2023.
Program. Lang. Syst., 7(1):137–158, January 1985.
[26] Malcolm Clark. Post congress tristesse. In TeX90 Con- [39] Ian Editor, editor. The title of book one, volume 9 of
ference Proceedings, pages 84–89. TeX Users Group, The name of the series one. University of Chicago
March 1991. Press, Chicago, 1st. edition, 2007.
[27] Kenneth L. Clarkson. Algorithms for Closest-Point [40] Ian Editor, editor. The title of book two, chapter 100.
Problems (Computational Geometry). PhD thesis, Stan- The name of the series two. University of Chicago
ford University, Palo Alto, CA, 1985. UMI Order Num- Press, Chicago, 2nd. edition, 2008.
ber: AAT 8506171.
[41] Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu,
[28] Kenneth Lee Clarkson. Algorithms for Closest-Point Dejia Xu, and Zhangyang Wang. Lightgaussian: Un-
Problems (Computational Geometry). PhD thesis, bounded 3d gaussian compression with 15x reduction
Stanford University, Stanford, CA, USA, 1985. AAT and 200+ fps. arXiv preprint arXiv:2311.17245, 2023.
8506171.
[42] Simon Fear. Publication quality tables in LATEX, April
[29] Special issue: Digital libraries, November 1996. 2005. http://www.ctan.org/pkg/booktabs.

[43] Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao,

[30] Sarah Cohen, Werner Nutt, and Yehoshua Sagic. Decid-
Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng
ing equivalances among conjunctive aggregate queries.
Zhang, and Bo Dai. Flashgs: Efficient 3d gaussian
J. ACM, 54(2), April 2007.
splatting for large-scale and high-resolution rendering.
arXiv preprint arXiv:2408.07967, 2024.
[31] Mauro Conti, Roberto Di Pietro, Luigi V. Mancini,
and Alessandro Mei. (new) distributed data source [44] Dan Geiger and Christopher Meek. Structured varia-
verification in wireless sensor networks. Inf. Fusion, tional inference procedures and their realizations (as
10(4):342–353, October 2009. incol). In Proceedings of Tenth International Workshop
on Artificial Intelligence and Statistics, The Barbados.
[32] Mauro Conti, Roberto Di Pietro, Luigi V. Mancini, The Society for Artificial Intelligence and Statistics,
and Alessandro Mei. (old) distributed data source January 2005.
verification in wireless sensor networks. Inf. Fusion,
10(4):342–353, 2009. [45] Michael Gerndt. Automatic Parallelization for
Distributed-Memory Multiprocessing Systems. PhD
[33] XBOW sensor motes specifications, 2008. thesis, University of Bonn, Bonn, Germany, December
http://www.xbow.com. 1989.

[34] D. Culler, D. Estrin, and M. Srivastava. Overview of [46] Jayshree Ghorpade. Gpgpu processing in cuda architec-
sensor networks. IEEE Comput., 37(8 (Special Issue ture. Advanced Computing: An International Journal,
on Sensor Networks)):41–49, 2004. 3(1):105–120, January 2012.

14
[47] Michel Goossens, S. P. Rahtz, Ross Moore, and [58] Maurice Herlihy. A methodology for implementing
Robert S. Sutor. The Latex Web Companion: Integrat- highly concurrent data objects. ACM Trans. Program.
ing TEX, HTML, and XML. Addison-Wesley Longman Lang. Syst., 15(5):745–770, November 1993.
Publishing Co., Inc., Boston, MA, USA, 1st edition,
1999. [59] C. A. R. Hoare. Chapter ii: Notes on data structuring.
In O. J. Dahl, E. W. Dijkstra, and C. A. R. Hoare,
[48] Brian Guenter, Mark Finch, Steven Drucker, Desney editors, Structured programming (incoll), pages 83–
Tan, and John Snyder. Foveated 3d graphics. ACM 174. Academic Press Ltd., London, UK, UK, 1972.
transactions on Graphics (tOG), 31(6):1–10, 2012.
[60] Billy S. Hollis. Visual Basic 6: Design, Specification,
[49] Matthew Van Gundy, Davide Balzarotti, and Giovanni and Objects with Other. Prentice Hall PTR, Upper
Vigna. Catch me, if you can: Evading network sig- Saddle River, NJ, USA, 1st edition, 1999.
natures with web-based polymorphic worms. In Pro-
ceedings of the first USENIX workshop on Offensive [61] Lars Hörmander. The analysis of linear partial differ-
Technologies, WOOT ’07, Berkley, CA, 2007. USENIX ential operators. III, volume 275 of Grundlehren der
Association. Mathematischen Wissenschaften [Fundamental Prin-
ciples of Mathematical Sciences]. Springer-Verlag,
[50] Matthew Van Gundy, Davide Balzarotti, and Giovanni Berlin, Germany, 1985. Pseudodifferential operators.
Vigna. Catch me, if you can: Evading network sig-
natures with web-based polymorphic worms. In Pro- [62] Lars Hörmander. The analysis of linear partial differ-
ceedings of the first USENIX workshop on Offensive ential operators. IV, volume 275 of Grundlehren der
Technologies, WOOT ’08, pages 99–100, Berkley, CA, Mathematischen Wissenschaften [Fundamental Prin-
2008. USENIX Association. ciples of Mathematical Sciences]. Springer-Verlag,
Berlin, Germany, 1985. Fourier integral operators.
[51] Matthew Van Gundy, Davide Balzarotti, and Giovanni
Vigna. Catch me, if you can: Evading network sig- [63] Ieee tcsc executive committee. In Proceedings of
natures with web-based polymorphic worms. In Pro- the IEEE International Conference on Web Services,
ceedings of the first USENIX workshop on Offensive ICWS ’04, pages 21–22, Washington, DC, USA, 2004.
Technologies, WOOT ’09, pages 90–100, Berkley, CA, IEEE Computer Society.
2009. USENIX Association.
[64] Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao
[52] Torben Hagerup, Kurt Mehlhorn, and J. Ian Munro. Feng, Huamin Wang, Minchen Li, Henry Lau, Feng
Maintaining discrete probability distributions opti- Gao, Yin Yang, and Chenfanfu Jiang. Vr-gs: A physical
mally. In Proceedings of the 20th International Col- dynamics-aware interactive gaussian splatting system
loquium on Automata, Languages and Programming, in virtual reality, 2024.
volume 700 of Lecture Notes in Computer Science,
pages 253–264, Berlin, 1993. Springer-Verlag. [65] Tao Jin, Mallesham Dasa, Connor Smith, Kittipat
Apicharttrisorn, Srinivasan Seshan, and Anthony Rowe.
[53] David Harel. Logics of programs: Axiomatics and Meshreduce: Scalable and bandwidth efficient 3d scene
descriptive power. MIT Research Lab Technical Re- capture. In 2024 IEEE Conference Virtual Reality and
port TR-200, Massachusetts Institute of Technology, 3D User Interfaces (VR), pages 20–30. IEEE, 2024.
Cambridge, MA, 1978.
[66] Bernhard Kerbl, Georgios Kopanas, Thomas
[54] David Harel. First-Order Dynamic Logic, volume 68 of Leimkuehler, and George Drettakis. 3d gaussian
Lecture Notes in Computer Science. Springer-Verlag, splatting for real-time radiance field rendering. ACM
New York, NY, 1979. Transactions on Graphics (TOG), 42:1 – 14, 2023.
[55] CodeBlue: Sensor networks for medical care, 2008.
[67] Bernhard Kerbl, Georgios Kopanas, Thomas Leimküh-
http://www.eecs.harvard.edu/mdw/ proj/codeblue/.
ler, and George Drettakis. 3d gaussian splatting for
[56] Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, real-time radiance field rendering. ACM Trans. Graph.,
Jonathan T Barron, and Paul Debevec. Baking neural 42(4):139–1, 2023.
radiance fields for real-time view synthesis. In Pro-
[68] Bernhard Kerbl, Andreas Meuleman, Georgios
ceedings of the IEEE/CVF international conference on
Kopanas, Michael Wimmer, Alexandre Lanvin,
computer vision, pages 5875–5884, 2021.
and George Drettakis. A hierarchical 3d gaussian
[57] J. Heering and P. Klint. Towards monolingual program- representation for real-time rendering of very large
ming environments. ACM Trans. Program. Lang. Syst., datasets. ACM Transactions on Graphics (TOG),
7(2):183–213, April 1985. 43(4):1–15, 2024.

15
[69] Markus Kirschmer and John Voight. Algorithmic enu- [82] E. Korach, D. Rotem, and N. Santoro. Distributed
meration of ideal classes for quaternion orders. SIAM algorithms for finding centers and medians in networks.
J. Comput., 39(5):1714–1747, January 2010. ACM Trans. Program. Lang. Syst., 6(3):380–401, July
1984.
[70] Donald E. Knuth. Seminumerical Algorithms.
Addison-Wesley, 1981. [83] Jacob Kornerup. Mapping powerlists onto hypercubes.
Master’s thesis, The University of Texas at Austin,
[71] Donald E. Knuth. Seminumerical Algorithms, vol- 1994. (In preparation).
ume 2 of The Art of Computer Programming. Addison-
[84] David Kosiur. Understanding Policy-Based Network-
Wesley, Reading, MA, 2nd edition, 10 January 1981.
ing. Wiley, New York, NY, 2nd. edition, 2001.
[72] Donald E. Knuth. The TEXbook. Addison-Wesley, [85] Brooke Krajancich, Petr Kellnhofer, and Gordon Wet-
Reading, MA., 1984. zstein. A perceptual model for eccentricity-dependent
spatio-temporal flicker fusion and its applications to
[73] Donald E. Knuth. The Art of Computer Programming, foveated graphics. ACM Transactions on Graphics
Vol. 1: Fundamental Algorithms (3rd. ed.). Addison (TOG), 40(4):1–11, 2021.
Wesley Longman Publishing Co., Inc., 1997.
[86] Leslie Lamport. LATEX: A Document Preparation Sys-
[74] Donald E. Knuth. The Art of Computer Programming, tem. Addison-Wesley, Reading, MA., 1986.
volume 1 of Fundamental Algorithms. Addison Wes-
[87] Jan Lee. Transcript of question and answer session. In
ley Longman Publishing Co., Inc., 3rd edition, 1998.
Richard L. Wexelblat, editor, History of programming
(book).
languages I (incoll), pages 68–71. ACM, New York,
[75] Wei-Chang Kong. E-commerce and cultural values, NY, USA, 1981.
name of chapter: The implementation of electronic [88] Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan
commerce in SMEs in Singapore (Inbook-w-chap-w- Ko, and Eunbyung Park. Compact 3d gaussian rep-
type), pages 51–74. IGI Publishing, Hershey, PA, USA, resentation for radiance field. In Proceedings of the
2001. IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pages 21719–21728, 2024.
[76] Wei-Chang Kong. The implementation of electronic
commerce in smes in singapore (as incoll). In E- [89] Newton Lee. Interview with bill kinder: January 13,
commerce and cultural values, pages 51–74. IGI Pub- 2005. Comput. Entertain., 3(1), Jan.-March 2005.
lishing, Hershey, PA, USA, 2001.
[90] Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, and
[77] Wei-Chang Kong. Chapter 9. In Theerasak Yingyan Lin. Rt-nerf: Real-time on-device neural ra-
Thanasankit, editor, E-commerce and cultural values diance fields towards immersive ar/vr rendering. In
(Incoll-w-text (chap 9) ’title’), pages 51–74. IGI Pub- Proceedings of the 41st IEEE/ACM International Con-
lishing, Hershey, PA, USA, 2002. ference on Computer-Aided Design, pages 1–9, 2022.
[91] Cheng-Lun Li, Ayse G. Buyuktur, David K. Hutchful,
[78] Wei-Chang Kong. The implementation of electronic
Natasha B. Sant, and Satyendra K. Nainwal. Portalis:
commerce in smes in singapore (incoll). In Theerasak
using competitive online interactions to support aid
Thanasankit, editor, E-commerce and cultural values,
initiatives for the homeless. In CHI ’08 extended ab-
pages 51–74. IGI Publishing, Hershey, PA, USA, 2003.
stracts on Human factors in computing systems, pages
3873–3878, New York, NY, USA, 2008. ACM.
[79] Wei-Chang Kong. E-commerce and cultural values -
(InBook-num-in-chap), chapter 9, pages 51–74. IGI [92] Ruilong Li, Hang Gao, Matthew Tancik, and Angjoo
Publishing, Hershey, PA, USA, 2004. Kanazawa. Nerfacc: Efficient sampling accelerates
nerfs. In 2023 IEEE/CVF International Conference on
[80] Wei-Chang Kong. E-commerce and cultural values Computer Vision (ICCV), pages 18491–18500, 2023.
(Inbook-text-in-chap), chapter: The implementation of
electronic commerce in SMEs in Singapore, pages 51– [93] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli,
74. IGI Publishing, Hershey, PA, USA, 2005. Zhenzhi Wang, Dahua Lin, and Bo Dai. Matrixcity:
A large-scale city dataset for city-scale neural render-
[81] Wei-Chang Kong. E-commerce and cultural values ing and beyond. In Proceedings of the IEEE/CVF
(Inbook-num chap), chapter (in type field) 22, pages International Conference on Computer Vision, pages
51–74. IGI Publishing, Hershey, PA, USA, 2006. 3205–3215, 2023.

16
[94] Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiy- [105] E. Mumford. Managerial expert systems and orga-
ong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen nizational change: some critical research issues. In
Xu, Youliang Yan, and Wenming Yang. Vastgaussian: Critical issues in information systems research (incoll),
Vast 3d gaussians for large scene reconstruction. In pages 135–155. John Wiley & Sons, Inc., New York,
CVPR, 2024. NY, USA, 1987.

[95] Weikai Lin, Yu Feng, and Yuhao Zhu. Rtgs: En- [106] A. Natarajan, M. Motani, B. de Silva, K. Yap, and K. C.
abling real-time gaussian splatting on mobile devices Chua. Investigating network architectures for body
using efficiency-guided pruning and foveated render- sensor networks. In G. Whitcomb and P. Neece, editors,
ing. arXiv preprint arXiv:2407.00435, 2024. Network Architectures, pages 322–328, Dayton, OH,
2007. Keleuven Press.
[96] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Jun-
ran Peng, and Zhaoxiang Zhang. Citygaussian: Real- [107] F. Nielson. Program transformations in a denotational
time high-quality large-scale scene rendering with setting. ACM Trans. Program. Lang. Syst., 7(3):359–
gaussians. In European Conference on Computer Vi- 379, July 1985.
sion, pages 265–282. Springer, 2025.
[108] Dave Novak. Solder man. In ACM SIGGRAPH 2003
[97] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Video Review on Animation theater Program: Part I
Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured - Vol. 145 (July 27–27, 2003), page 4, New York, NY,
3d gaussians for view-adaptive rendering. In Proceed- March 21, 2008 2003. ACM Press.
ings of the IEEE/CVF Conference on Computer Vision
[109] Barack Obama. A more perfect union. Video, March
and Pattern Recognition, pages 20654–20664, 2024.
2008.
[98] David P. Luebke, Martin Reddy, Jonathan D. Cohen,
[110] Jinwoo Park, Ik-Beom Jeon, Sung-Eui Yoon, and
Amitabh Varshney, Benjamin Watson, and Robert A.
Woontack Woo. Instant panoramic texture mapping
Huebner. Level of Detail for 3D Graphics. Morgan
with semantic object matching for large-scale urban
Kaufmann Publishers Inc., 2012.
scene reproduction. IEEE Transactions on Visualiza-
[99] Elian Malkin, Arturo Deza, and Tomaso Poggio. Cuda- tion and Computer Graphics, 27(5):2746–2756, 2021.
optimized real-time rendering of a foveated visual sys- [111] Adam Paszke, Sam Gross, Francisco Massa, Adam
tem. arXiv preprint arXiv:2012.08655, 2020. Lerer, James Bradbury, Gregory Chanan, Trevor
Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga,
[100] Rafał K Mantiuk, Gyorgy Denes, Alexandre Chapiro,
Alban Desmaison, Andreas Köpf, Edward Yang, Zach
Anton Kaplanyan, Gizem Rufo, Romain Bachy, Trisha
DeVito, Martin Raison, Alykhan Tejani, Sasank Chil-
Lian, and Anjul Patney. Fovvideovdp: A visible dif-
amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and
ference predictor for wide field-of-view video. ACM
Soumith Chintala. PyTorch: an imperative style, high-
Transactions on Graphics (TOG), 40(4):1–19, 2021.
performance deep learning library, page 12. Curran
[101] Daniel D. McCracken and Donald G. Golden. Sim- Associates Inc., Red Hook, NY, USA, 2019.
plified Structured COBOL with Microsoft/MicroFocus
[112] Charles J. Petrie. New algorithms for dependency-
COBOL. John Wiley & Sons, Inc., New York, NY,
directed backtracking (master’s thesis). Technical re-
USA, 1990.
port, Austin, TX, USA, 1986.
[102] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, [113] Charles J. Petrie. New algorithms for dependency-
Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. directed backtracking (master’s thesis). Master’s thesis,
Nerf. Communications of the ACM, 65:99 – 106, 2020. University of Texas at Austin, Austin, TX, USA, 1986.
[103] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, [114] Poker-Edge.Com. Stats and analysis, March 2006.
Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng.
Nerf: Representing scenes as neural radiance fields for [115] R Core Team. R: A language and environment for
view synthesis. Communications of the ACM, 65(1):99– statistical computing, 2019.
106, 2021.
[116] Brian K. Reid. A high-level approach to computer
[104] Sape Mullender, editor. Distributed systems (2nd document formatting. In Proceedings of the 7th Annual
Ed.). ACM Press/Addison-Wesley Publishing Co., Symposium on Principles of Programming Languages,
New York, NY, USA, 1993. pages 24–31, New York, January 1980. ACM.

17
[117] Christian Reiser, Rick Szeliski, Dor Verbin, Pratul [128] Towaki Takikawa, Alex Evans, Jonathan Tremblay,
Srinivasan, Ben Mildenhall, Andreas Geiger, Jon Bar- Thomas Müller, Morgan McGuire, Alec Jacobson, and
ron, and Peter Hedman. Merf: Memory-efficient Sanja Fidler. Variable bitrate neural fields. In ACM
radiance fields for real-time view synthesis in un- SIGGRAPH 2022 Conference Proceedings, pages 1–9,
bounded scenes. ACM Transactions on Graphics 2022.
(TOG), 42(4):1–12, 2023.
[129] Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten
[118] Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Ja-
Xu, Zhangkai Ni, and Bo Dai. Octree-gs: Towards cobson, Morgan McGuire, and Sanja Fidler. Neural
consistent real-time rendering with lod-structured 3d geometric level of detail: Real-time rendering with
gaussians. arXiv preprint arXiv:2403.17898, 2024. implicit 3d shapes. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
[119] Sara Rojas, Jesus Zarzar, Juan C Pérez, Artsiom tion, pages 11358–11367, 2021.
Sanakoyeu, Ali Thabet, Albert Pumarola, and Bernard
Ghanem. Re-rend: Real-time rendering of nerfs across [130] Towaki Takikawa, Or Perel, Clement Fuji Tsang,
devices. In Proceedings of the IEEE/CVF Interna- Charles Loop, Joey Litalien, Jonathan Tremblay,
tional Conference on Computer Vision, pages 3632– Sanja Fidler, and Maria Shugrina. Kaolin wisp:
3641, 2023. A pytorch library and engine for neural fields re-
search. https://github.com/NVIDIAGameWorks/
[120] Bernard Rous. The enabling of digital libraries. Digital kaolin-wisp, 2022.
Libraries, 12(3), July 2008. To appear.
[131] Matthew Tancik, Vincent Casser, Xinchen Yan,
[121] Mehdi Saeedi, Morteza Saheb Zamani, and Mehdi Sabeek Pradhan, Ben Mildenhall, Pratul P Srinivasan,
Sedighi. A library-based synthesis methodology for re- Jonathan T Barron, and Henrik Kretzschmar. Block-
versible logic. Microelectron. J., 41(4):185–194, April nerf: Scalable large scene neural view synthesis. In Pro-
2010. ceedings of the IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition, pages 8248–8258, 2022.
[122] Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi,
and Zahra Sasanian. Synthesis of reversible circuit us- [132] Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong
ing cycle-based approach. J. Emerg. Technol. Comput. Li, Brent Yi, Justin Kerr, Terrance Wang, Alexan-
Syst., 6(4), December 2010. der Kristoffersen, Jake Austin, Kamyar Salahi, Abhik
Ahuja, David McAllister, and Angjoo Kanazawa. Nerf-
[123] S.L. Salas and Einar Hille. Calculus: One and Several studio: A modular framework for neural radiance field
Variable. John Wiley and Sons, New York, 1978. development. In ACM SIGGRAPH 2023 Conference
Proceedings, SIGGRAPH ’23, 2023.
[124] Joseph Scientist. The fountain of youth, August 2009.
Patent No. 12345, Filed July 1st., 2008, Issued Aug. [133] Jiaxiang Tang, Hang Zhou, Xiaokang Chen, Tianshu
9th., 2009. Hu, Errui Ding, Jingdong Wang, and Gang Zeng. Del-
icate textured mesh recovery from nerf via adaptive
[125] Stan W. Smith. An experiment in bibliographic mark- surface refinement. In Proceedings of the IEEE/CVF
up: Parsing metadata for xml export. In Reginald N. International Conference on Computer Vision, pages
Smythe and Alexander Noble, editors, Proceedings of 17739–17749, 2023.
the 3rd. annual workshop on Librarians and Comput-
[134] Harry Thornburg. Introduction to bayesian statistics,
ers, volume 3 of LAC ’10, pages 422–431, Milan Italy,
March 2001.
2010. Paparazzi Press.
[135] Institutional members of the TEX users group, 2017.
[126] Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen,
Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. [136] Haithem Turki, Deva Ramanan, and Mahadev Satya-
Nerfplayer: A streamable dynamic scene representa- narayanan. Mega-nerf: Scalable construction of large-
tion with decomposed neural radiance fields. IEEE scale nerfs for virtual fly-throughs. In Proceedings of
Transactions on Visualization and Computer Graphics, the IEEE/CVF Conference on Computer Vision and
29(5):2732–2742, 2023. Pattern Recognition, pages 12922–12931, 2022.

[127] Asad Z. Spector. Achieving application requirements. [137] Okan Tarhan Tursun, Elena Arabadzhiyska-Koleva,
In Sape Mullender, editor, Distributed Systems, pages Marek Wernikowski, Radosław Mantiuk, Hans-
19–33. ACM Press, New York, NY, 2nd. edition, 1990. Peter Seidel, Karol Myszkowski, and Piotr Didyk.

18
Luminance-contrast-aware foveated rendering. ACM Lorenzo Porzi, Peter Kontschieder, Aljaž Božič, et al.
Transactions on Graphics (TOG), 38(4):1–14, 2019. Vr-nerf: High-fidelity virtualized walkable spaces. In
SIGGRAPH Asia 2023 Conference Papers, pages 1–12,
[138] A. Tzamaloukas and J. J. Garcia-Luna-Aceves. 2023.
Channel-hopping multiple access. Technical Report I-
CA2301, Department of Computer Science, University [150] Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan,
of California, Berkeley, CA, 2000. Nanxuan Zhao, Christian Theobalt, Bo Dai, and Dahua
Lin. Grid-guided neural radiance fields for large urban
[139] Nisarg Ujjainkar, Ethan Shahan, Kenneth Chen, Bud- scenes. In Proceedings of the IEEE/CVF Conference
monde Duinkharjav, Qi Sun, and Yuhao Zhu. Ex- on Computer Vision and Pattern Recognition, pages
ploiting human color discrimination for memory-and 8296–8306, 2023.
energy-efficient image encoding in virtual reality. In
Proceedings of the 29th ACM International Confer- [151] Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin,
ence on Architectural Support for Programming Lan- Pratul P Srinivasan, Richard Szeliski, Jonathan T Bar-
guages and Operating Systems, Volume 1, pages 166– ron, and Ben Mildenhall. Bakedsdf: Meshing neural
180, 2024. sdfs for real-time view synthesis. In ACM SIGGRAPH
2023 Conference Proceedings, pages 1–9, 2023.
[140] Boris Veytsman. acmart—Class for typesetting publi-
cations of ACM, 2017. [152] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren
Ng, and Angjoo Kanazawa. Plenoctrees for real-time
[141] Lili Wang, Xuehuai Shi, and Yi Liu. Foveated render- rendering of neural radiance fields. In Proceedings of
ing: A state-of-the-art survey. Computational Visual the IEEE/CVF International Conference on Computer
Media, 9(2):195–228, 2023. Vision, pages 5752–5761, 2021.

[142] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simon- [153] Richard Zhang, Phillip Isola, Alexei A Efros, Eli
celli. Image quality assessment: from error visibility Shechtman, and Oliver Wang. The unreasonable ef-
to structural similarity. IEEE Transactions on Image fectiveness of deep features as a perceptual metric. In
Processing, 13(4):600–612, 2004. CVPR, 2018.

[143] Yu Wen, Chenhao Xie, Shuaiwen Leon Song, and Xin [154] Yuqi Zhang, Guanying Chen, and Shuguang Cui. Effi-
Fu. Post0-vr: Enabling universal realistic rendering for cient large-scale scene representation with a hybrid of
modern vr via exploiting architectural similarity and high-resolution grid and plane features. arXiv preprint
data sharing. In 2023 IEEE International Symposium arXiv:2303.03003, 2023.
on High-Performance Computer Architecture (HPCA), [155] Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li,
pages 390–402. IEEE, 2023. Jinyang Li, Aurojit Panda, and Saining Xie. On scaling
up 3d gaussian splatting training, 2024.
[144] Elizabeth M. Wenzel. Three-dimensional virtual acous-
tic displays. In Multimedia interface design (incoll), [156] Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie,
pages 257–288. ACM, New York, NY, USA, 1992. Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao,
Christos Kozyrakis, Ion Stoica, Joseph E Gonzalez,
[145] Renato Werneck, João Setubal, and Arlindo da Con-
et al. Sglang: Efficient execution of structured language
ceicão. (new) finding minimum congestion spanning
model programs. arXiv preprint arXiv:2312.07104,
trees. J. Exp. Algorithmics, 5, December 2000.
2024.
[146] Renato Werneck, João Setubal, and Arlindo da Con- [157] G. Zhou, J. Lu, C.-Y. Wan, M. D. Yarvis, and J. A.
ceicão. (old) finding minimum congestion spanning Stankovic. Body Sensor Networks. MIT Press, Cam-
trees. J. Exp. Algorithmics, 5:11, 2000. bridge, MA, 2008.
[147] Thomas Winklehner and Renato Pajarola. Single-pass [158] Gang Zhou, Yafeng Wu, Ting Yan, Tian He, Chengdu
multi-view volume rendering. 07 2007. Huang, John A. Stankovic, and Tarek F. Abdelzaher.
A multifrequency mac specially designed for wireless
[148] Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang,
sensor network applications. ACM Trans. Embed. Com-
Yan-Pei Cao, Ling-Qi Yan, and Lin Gao. Recent ad-
put. Syst., 9(4):39:1–39:41, April 2010.
vances in 3d gaussian splatting. Computational Visual
Media, pages 1–30, 2024. [159] M. Zwicker, H. Pfister, J. van Baar, and M. Gross. Ewa
volume splatting. In Proceedings Visualization, 2001.
[149] Linning Xu, Vasu Agrawal, William Laney, Tony Gar- VIS ’01., pages 29–538, 2001.
cia, Aayush Bansal, Changil Kim, Samuel Rota Bulò,

Int&Fun&Med&Nut&the&1 ST
100% (23)
Int&Fun&Med&Nut&the&1 ST
1,088 pages
Sce Gastro Sample Qs
100% (8)
Sce Gastro Sample Qs
36 pages
Taller 2
No ratings yet
Taller 2
11 pages
Muslim Morning and Evening Routine
100% (1)
Muslim Morning and Evening Routine
23 pages
Research Paper (1)
No ratings yet
Research Paper (1)
15 pages
2506.05682v1
No ratings yet
2506.05682v1
15 pages
Lu Scaffold-GS Structured 3D Gaussians for View-Adaptive Rendering CVPR 2024 Paper
No ratings yet
Lu Scaffold-GS Structured 3D Gaussians for View-Adaptive Rendering CVPR 2024 Paper
11 pages
Creating Virtual Environments with 3D Gaussian Splatting
No ratings yet
Creating Virtual Environments with 3D Gaussian Splatting
2 pages
THESIS
No ratings yet
THESIS
20 pages
2503.16681v1
No ratings yet
2503.16681v1
7 pages
Xu 等 - 2023 - 4K4D Real-Time 4D View Synthesis at 4K Resolution
No ratings yet
Xu 等 - 2023 - 4K4D Real-Time 4D View Synthesis at 4K Resolution
17 pages
Vastgaussian: Vast 3D Gaussians For Large Scene Reconstruction
No ratings yet
Vastgaussian: Vast 3D Gaussians For Large Scene Reconstruction
12 pages
Tutorial CGASE
No ratings yet
Tutorial CGASE
18 pages
2311.16121v2
No ratings yet
2311.16121v2
13 pages
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting With View-Consistent 2D Diffusion Priors
No ratings yet
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting With View-Consistent 2D Diffusion Priors
16 pages
State of the Art on Deep Learning-enhanced Rendering Methods
No ratings yet
State of the Art on Deep Learning-enhanced Rendering Methods
23 pages
Shin 等 - 2025 - Locality-Aware Gaussian Compression for Fast and H
No ratings yet
Shin 等 - 2025 - Locality-Aware Gaussian Compression for Fast and H
28 pages
What You See Is What You GAN: Rendering Every Pixel For High-Fidelity Geometry in 3D GANs
No ratings yet
What You See Is What You GAN: Rendering Every Pixel For High-Fidelity Geometry in 3D GANs
27 pages
SWinGS- Sliding Windows for Dynamic 3D Gaussian Splatting
No ratings yet
SWinGS- Sliding Windows for Dynamic 3D Gaussian Splatting
18 pages
BakedSDF - Meshing Neural SDFs For Real-Time View Synthesis
No ratings yet
BakedSDF - Meshing Neural SDFs For Real-Time View Synthesis
9 pages
Turki Mega-NERF Scalable Construction of Large-Scale NeRFs For Virtual Fly-Throughs CVPR 2022 Paper
No ratings yet
Turki Mega-NERF Scalable Construction of Large-Scale NeRFs For Virtual Fly-Throughs CVPR 2022 Paper
10 pages
3d Gaussian Splatting High
No ratings yet
3d Gaussian Splatting High
14 pages
NeuManifold: Neural Watertight Manifold Reconstruction With Efficient and High-Quality Rendering Support
No ratings yet
NeuManifold: Neural Watertight Manifold Reconstruction With Efficient and High-Quality Rendering Support
16 pages
3D Gaussian Splatting for Real-Time Radiance Field Rendering - 3d_gaussian_splatting_high
No ratings yet
3D Gaussian Splatting for Real-Time Radiance Field Rendering - 3d_gaussian_splatting_high
14 pages
S3032 Advanced Scenegraph Rendering Pipeline PDF
No ratings yet
S3032 Advanced Scenegraph Rendering Pipeline PDF
42 pages
2025 HPCA: Gaussian Accelerator
No ratings yet
2025 HPCA: Gaussian Accelerator
13 pages
Ray Tracing On GPU
No ratings yet
Ray Tracing On GPU
44 pages
Interactive Rendering Using The Render Cache: Abstract
No ratings yet
Interactive Rendering Using The Render Cache: Abstract
13 pages
2406.12080v1
No ratings yet
2406.12080v1
15 pages
ARVRVR
No ratings yet
ARVRVR
14 pages
Point Based Rendering Enhancement Via Deep Learning: December 16, 2018
No ratings yet
Point Based Rendering Enhancement Via Deep Learning: December 16, 2018
13 pages
F3D-Gaus
No ratings yet
F3D-Gaus
15 pages
Games: Mesh-Based Adapting and Modification of Gaussian Splatting
No ratings yet
Games: Mesh-Based Adapting and Modification of Gaussian Splatting
13 pages
Icase: IIII11l11ll Ill
No ratings yet
Icase: IIII11l11ll Ill
26 pages
Front End Vision: A Multiscale Geometry Engine
No ratings yet
Front End Vision: A Multiscale Geometry Engine
36 pages
2022-ToG-Instant Neural Graphics Primitives With a Multiresolution
No ratings yet
2022-ToG-Instant Neural Graphics Primitives With a Multiresolution
15 pages
GRM: Large Gaussian Reconstruction Model For Efficient 3D Reconstruction and Generation
No ratings yet
GRM: Large Gaussian Reconstruction Model For Efficient 3D Reconstruction and Generation
29 pages
Abhi-report
No ratings yet
Abhi-report
9 pages
Gaussian Splashing
No ratings yet
Gaussian Splashing
11 pages
ARVR UNIT 2
No ratings yet
ARVR UNIT 2
15 pages
2409.10041v1
No ratings yet
2409.10041v1
7 pages
s7459 VR Rendering Improvements Featuring Autodesk Vred
No ratings yet
s7459 VR Rendering Improvements Featuring Autodesk Vred
47 pages
06_gpuarch
No ratings yet
06_gpuarch
78 pages
AVR_Unit-3 Notes
No ratings yet
AVR_Unit-3 Notes
19 pages
Sun 等 - 2024 - 3DGStream On-the-Fly Training of 3D Gaussians for
No ratings yet
Sun 等 - 2024 - 3DGStream On-the-Fly Training of 3D Gaussians for
14 pages
SUNDAE: Spectrally Pruned Gaussian Fields With Neural Compensation 2405.00676v1
No ratings yet
SUNDAE: Spectrally Pruned Gaussian Fields With Neural Compensation 2405.00676v1
10 pages
Deformable Beta Splatting
No ratings yet
Deformable Beta Splatting
14 pages
Deep Fourier-based Arbitrary-scale Super-resolution for Real-time Rendering
No ratings yet
Deep Fourier-based Arbitrary-scale Super-resolution for Real-time Rendering
11 pages
Asplos24 Gscore
No ratings yet
Asplos24 Gscore
15 pages
School of Computer Science & Engineering: Unit-III Environment Modelling and Interactive Techniques in VR
No ratings yet
School of Computer Science & Engineering: Unit-III Environment Modelling and Interactive Techniques in VR
68 pages
Sun 3DGStream on-The-Fly Training of 3D Gaussians for Efficient Streaming of CVPR 2024 Paper
No ratings yet
Sun 3DGStream on-The-Fly Training of 3D Gaussians for Efficient Streaming of CVPR 2024 Paper
11 pages
Ray Tracing On GPU: University of Applied Sciences Basel (FHBB) Diploma Thesis
No ratings yet
Ray Tracing On GPU: University of Applied Sciences Basel (FHBB) Diploma Thesis
44 pages
Unit 5 Notes
No ratings yet
Unit 5 Notes
23 pages
Azinovic_Neural_RGB-D_Surface_Reconstruction_CVPR_2022_paper
No ratings yet
Azinovic_Neural_RGB-D_Surface_Reconstruction_CVPR_2022_paper
12 pages
C G V2: E G A R L - S S: ITY Aussian Fficient and Eometrically Ccurate Econstruction FOR Arge Cale Cenes
No ratings yet
C G V2: E G A R L - S S: ITY Aussian Fficient and Eometrically Ccurate Econstruction FOR Arge Cale Cenes
17 pages
GIDS
No ratings yet
GIDS
14 pages
2409.10041v1
No ratings yet
2409.10041v1
7 pages
Semantics-Controlled Gaussian Splatting
No ratings yet
Semantics-Controlled Gaussian Splatting
11 pages
2211.09869v4
No ratings yet
2211.09869v4
15 pages
01 Introreview PDF
No ratings yet
01 Introreview PDF
130 pages
Mip-Splatting_Alias-free_3D_Gaussian_Splatting_CVPR_2024_paper
No ratings yet
Mip-Splatting_Alias-free_3D_Gaussian_Splatting_CVPR_2024_paper
10 pages
Advanced Techniques in GSAP Animation: Definitive Reference for Developers and Engineers
From Everand
Advanced Techniques in GSAP Animation: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Volume Rendering: Exploring Visual Realism in Computer Vision
From Everand
Volume Rendering: Exploring Visual Realism in Computer Vision
Fouad Sabry
No ratings yet
GPU Assembly and Shader Programming for Compute: Low-Level Optimization Techniques for High-Performance Parallel Processing
From Everand
GPU Assembly and Shader Programming for Compute: Low-Level Optimization Techniques for High-Performance Parallel Processing
Robert Johnson
No ratings yet
How To Avoid: Office Politics
No ratings yet
How To Avoid: Office Politics
14 pages
SSLC Mathematics Workbook Old
No ratings yet
SSLC Mathematics Workbook Old
368 pages
Dimensional Homogeneity & Dimensionless Numbers
No ratings yet
Dimensional Homogeneity & Dimensionless Numbers
74 pages
Let Us Learn Gurmukhi - Book II
No ratings yet
Let Us Learn Gurmukhi - Book II
53 pages
Sayeeb Saleh
No ratings yet
Sayeeb Saleh
1 page
Design and Implementation of A Low Cost 3D Printed Humanoid Robotic Platform
No ratings yet
Design and Implementation of A Low Cost 3D Printed Humanoid Robotic Platform
6 pages
IAS101 OBE Syllabus
No ratings yet
IAS101 OBE Syllabus
2 pages
Granny Style Cardigancomp
No ratings yet
Granny Style Cardigancomp
19 pages
CH 13 10math
No ratings yet
CH 13 10math
16 pages
A3 Signature Card
100% (1)
A3 Signature Card
2 pages
New Debit Card Application Form
No ratings yet
New Debit Card Application Form
1 page
Cisco Nexus 9000 Series NX-OS Interfaces Configuration Guide, Release 7.x
No ratings yet
Cisco Nexus 9000 Series NX-OS Interfaces Configuration Guide, Release 7.x
368 pages
Trade by The Book - A Guide To Reading Order Flow
0% (1)
Trade by The Book - A Guide To Reading Order Flow
3 pages
ACI 233 (2003) Slag Cement in Concrete and Mortar
100% (1)
ACI 233 (2003) Slag Cement in Concrete and Mortar
24 pages
Example of Petrographic Report
No ratings yet
Example of Petrographic Report
3 pages
Syllabus - Master Programme in Health Informatics
No ratings yet
Syllabus - Master Programme in Health Informatics
8 pages
From Traditional Grammar To Functional Grammar
No ratings yet
From Traditional Grammar To Functional Grammar
57 pages
Linear Regression With Multiple Features
No ratings yet
Linear Regression With Multiple Features
7 pages
Timeline Activity
No ratings yet
Timeline Activity
2 pages
TUTU Review
No ratings yet
TUTU Review
6 pages
Phoenix BIOS Beep and Error Codes
No ratings yet
Phoenix BIOS Beep and Error Codes
12 pages
Dopamine Speech
No ratings yet
Dopamine Speech
1 page
Foster ch04 4th Edition
No ratings yet
Foster ch04 4th Edition
28 pages
Definition - What Does Mean?: Artificial Intelligence (AI)
No ratings yet
Definition - What Does Mean?: Artificial Intelligence (AI)
3 pages
Life Cycles of Humans and Animals Presentation
No ratings yet
Life Cycles of Humans and Animals Presentation
16 pages
1 Organic and Biomolecular Chemistry, 2018, 16, 1402 Compressed
No ratings yet
1 Organic and Biomolecular Chemistry, 2018, 16, 1402 Compressed
18 pages

Tao 等 - 2025 - GS-Cache a GS-Cache Inference Framework for Large

Uploaded by

Tao 等 - 2025 - GS-Cache a GS-Cache Inference Framework for Large

Uploaded by

GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting

2 Tsinghua University, Beijing, China

Abstract computation and memory intensity of real-time rendering in

4 Cache-centric Computation Reuse Strategy

Figure 6: Continuous rendering cache: In continuous render-

Figure 9: Worker control state transition responding to FPS

Table 3: Rendering performance comparison on Matrixcity street scene

differ in the two scenes. Our framework demonstrates its ad-

Table 6: Rendering ablation comparison on Matrixcity street scene.

References [12] Sam Anzaroot and Andrew McCallum. UMass citation

[43] Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao,

You might also like