GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models
Miao Tao1*, Yuanzhen Zhou1*, Haoran Xu1*, Zeyu He1, Zhenyu Yang1, Yuchang Zhang1, Zhongling Su1,
Linning Xu1, Zhenxiang Ma1, Rong Fu1, Hengjie Li1, Xingcheng Zhang1, and Jidong Zhai2
1 Shanghai Artificial Intelligence Laboratory, Shanghai, China
2 Tsinghua University, Beijing, China
Figure 1: FPS of the basic rendering pipeline. As the viewpoint shifts from the ground to a higher altitude and farther away, the scene becomes larger, and the FPS decreases accordingly.

The right side of the framework overview shows its elastic parallel scheduler structure, and the left side illustrates the cache-centric rendering pipeline structure. The elastic parallel scheduler schedules GPU resources dynamically, which steadies the FPS and avoids resource waste. For structured 3DGS models, we transform the original pipeline into the cache-centric pipeline, which aims to improve rendering speed based on the principles of de-redundancy and reuse. Additionally, targeting the bottleneck stages in the general computing patterns of the structured Gaussian derivation rendering pipeline, we introduce dedicated CUDA [46] kernels for further acceleration, which improve the frame rate of real-time rendering over long rendering sessions.

The process of rendering a 3D reconstruction scene involves inference and transformation of the learned 3D spatial features, which makes conventional computing frameworks that focus on one- or two-dimensional features such as text and images, e.g., PyTorch [111], TensorFlow [3], and JAX [21], weak at such tasks. These deep learning frameworks are versatile and scalable enough to implement the basic computing pipeline of neural rendering methods such as NeRF and 3DGS. Still, it is challenging to achieve ease of use for further development of rendering applications, and there is a lack of dedicated operators to support sparse computing in high-dimensional space, so the computing speed of the rendering pipeline cannot reach real-time rendering. A series of dedicated frameworks for neural rendering, such as NeRFStudio [132] and Kaolin-Wisp [130], have improved the ease of use for experimental research on model structures through modularization, and dedicated operator libraries for sparse computing, such as Nerfacc [92], have improved the overall rendering speed by accelerating some stages of the NeRF computing pipeline. These works have built a strong community influence, quickly promoted related work on neural rendering such as NeRF and 3DGS, and expanded the applications based on neural rendering. However, the rendering speed still makes it difficult to support the real-time frame rates required for immersive VR experiences in large-scale scenes.

The GS-Cache framework provides a new solution from the perspective of computing systems that is compatible with various rendering pipelines based on Gaussian derivation strategies. The optimized computing pipeline eliminates computing redundancy, performs effective computation reuse for the immersive VR experience, and flexibly schedules GPU computing resources during rendering to ensure stable, high rendering frame rates and to optimize the energy efficiency of consumer-grade GPU resources. It accelerates the main computing bottlenecks in the pipeline through dedicated CUDA kernels, further improving the performance of VR rendering. Our main contributions include:

• A cache-centric computation de-redundancy rendering pipeline that effectively eliminates redundancy in stereo continuous rendering, enabling a dynamic cache depth that balances performance and quality.

• A multi-GPU elastic parallel rendering scheduler that dynamically allocates consumer-grade GPU resources, ensuring stable and high rendering frame rates while enhancing energy efficiency.

• An end-to-end rendering framework designed for immersive VR experiences, the first holistic system that meets the binocular 2K photo-realistic rendering requirements of 72 FPS for aerial views and 120 FPS for street views in city-level scenes with a dedicated, efficient CUDA implementation.

2 Related Work

Our work focuses primarily on the real-time, photo-realistic rendering of large-scale Gaussian splatting scenes, encompassing city-level scenes of several square kilometers. Although novel view synthesis based on neural rendering has made significant achievements in various applications in recent years, there remains a gap in meeting the rendering performance, quality fidelity, and computational efficiency required for VR rendering. We provide a brief overview of the most relevant works, focusing on real-time photo-realistic rendering, large-scale novel view synthesis, and rendering framework optimizations.

Real-Time Photo-realistic Rendering   VR rendering is computationally expensive, requiring high-speed and high-quality real-time rendering, which may be hindered by quality degradation and latency overhead in the general rendering pipeline [126]. To achieve high-fidelity rendering with minimal latency under relatively low computational resources, various optimization methods have been proposed.
Foveated rendering is a rendering acceleration method: the pioneering work [48] provided the foundational theory and approach, and subsequent works [85, 100, 137] explore different enhancements and applications. Leveraging eye-tracking technology, foveated rendering allocates more computational resources to rendering the focal area of the image and fewer to the peripheral area [141]. To speed up neural rendering such as NeRF and fulfill the requirements of real-time rendering, including VR rendering, some works have shifted from purely implicit neural representations towards hybrid or explicit primitive-based neural representations and hardware-based acceleration [25, 56, 133]. VR-NeRF [149] achieves high-quality VR rendering using multiple GPUs for parallel computation, and RT-NeRF [90] realizes real-time VR rendering on both cloud and edge devices through an efficient pipeline and a dedicated hardware accelerator. Re-ReND [119] presents a low-resource real-time NeRF rendering method for resource-constrained devices. [117, 151, 152] distill a pretrained NeRF into a sparse structure, enhancing real-time rendering performance. Different from the aforementioned methods, and to speed up neural rendering such as 3DGS, another strategy for rendering acceleration involves model pruning and structuring for redundancy removal and effective spatial representation. Methods such as [88, 95] prune Gaussians and reduce model parameters after reconstruction to accelerate the rendering pipeline. Scaffold-GS [97] organizes Gaussians using a structured sparse voxel grid and attaches learnable features to each voxel center as an anchor, and Octree-GS [118] further employs a structured octree grid for anchor placement.

Large 3D Model Inference   Neural reconstruction and rendering are also attributed to novel view synthesis, which in large-scale scenes has been a long-standing problem in research and engineering. First of all, the fidelity of large-scale rendering is directly contingent upon the quality of the underlying 3D representation models, particularly when they are reconstructed from real-world scenes. Large-scale scene reconstruction primarily utilizes a divide-and-conquer strategy with scene decomposition methods to expand the capabilities of the model [131, 136], while Zip-NeRF [17] and Grid-NeRF [150] further refined the effectiveness and performance of representations for large-scale scenes. Beyond the NeRF-based methods, [110] extracts semantic information from street-view images and employs a panoramic texture mapping method for realistic reproduction in large-scale scene novel view synthesis. To ensure that novel view synthesis for real-time VR rendering maintains a stable frame rate in large-scale scenes, an effective method is the Level of Detail (LoD) strategy. Guided by heuristic rules or specific resource allocation settings, LoD dynamically adjusts the level of detail rendered in real time [98]. [129] first introduced the concept of LoD into neural radiance fields and neural signed distance fields, and Mip-NeRF [16] and Variable Bitrate Neural Fields [128] applied it in the context of multi-scale representation and geometry-compressed streaming. LoD has also been employed in Gaussian-based representations: Hierarchy-GS [68] designed a hierarchical structure for multi-resolution representation to improve rendering speed, and other large-scale scene reconstruction and rendering works [96, 97, 118, 149] have also adopted LoD to accelerate the rendering pipeline.

Rendering Framework Optimization   In large-scale novel view synthesis and city-level scene rendering, the stability of high-speed rendering frame rates remains an intractable problem due to variations in viewpoint and field of view (FOV), as well as the limitations of computational resources. However, little research has focused on optimizations for large-scale VR rendering from the perspective of the computing system, and most existing methods concentrate primarily on mesh-based rendering rather than neural rendering pipelines. MeshReduce [65] optimizes the communication strategy and efficiently converts scene geometry into meshes without computation and memory restraints, yet stable rendering frame rates remain difficult to maintain. RT-NeRF [90] employs a hybrid sparse encoding method and proposes a NeRF-based storage optimization in addition to its dedicated hardware system. Post0-VR [143] leverages data similarities to accelerate rendering by eliminating redundant computations and systematically merging common visual effects into the standard rendering pipeline. [99] utilizes shared memory and data reuse to enhance the performance of foveated rendering.

Our work introduces a novel end-to-end rendering framework for large 3DGS models. Optimizations are applied through an innovative GPU scheduling method, a cache-centric rendering pipeline specifically tailored for Gaussian-based rendering, and dedicated CUDA kernels, to stabilize high-speed rendering across immersive VR experiences.

3 Rendering Pipeline and Framework Design

GS-Cache is an innovative and holistic rendering framework designed to support the real-time rendering of large-scale 3D scene (3DGS) models at the city level. It enables users to roam in aerial or street views at binocular 2K resolution, achieving an average frame rate exceeding 72 FPS. Given the challenges associated with the real-time photo-realistic rendering of large-scale 3DGS models, particularly in VR applications, we have developed a scheduling framework that supports elastic parallel rendering. Targeting the patterns of the Gaussian derivation rendering pipeline, we also propose an efficient cache-centric rendering pipeline with a dynamic cache strategy that maintains rendering quality.

3.1 Rendering Patterns and Pipeline Bottlenecks

3DGS represents the structure and color of a scene using a series of anisotropic 3D Gaussians, rendering through rasterization. Structured Gaussian derivation methods use fewer
anchors and generate more 3D Gaussians from the anchors to save GPU resources.
Rendering Patterns   In a point cloud, the position coordinates of each element serve as the mean µ, generating the corresponding 3D Gaussian for differentiable rasterization rendering:

$G(x) = \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$   (1)

$\Sigma = R\,S\,S^{T}R^{T}$   (2)

where x represents any position in the scene space and Σ denotes the covariance matrix of the 3D Gaussian. Σ can be decomposed into a rotation matrix R and a scaling matrix S to maintain its positive definiteness. In addition to the mentioned attributes, each 3D Gaussian also includes a color value c and an opacity value α, which are used for the subsequent opacity blending operations during rasterization. While rendering, the 3D Gaussians are first projected onto screen space using the EWA algorithm [159] and transformed into 2D Gaussians, a process commonly referred to as splatting.
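As a concrete reference for eqs. (1) and (2), the density evaluation can be sketched in a few lines of PyTorch; this is a toy re-implementation for clarity, not the CUDA path used in the actual pipeline:

```python
import torch

def gaussian_density(x, mu, R, S):
    """Evaluate the anisotropic 3D Gaussian of eq. (1) at query points x.

    x: (N, 3) positions; mu: (3,) mean; R: (3, 3) rotation; S: (3, 3) diagonal scaling.
    """
    sigma = R @ S @ S.T @ R.T                  # eq. (2): Sigma = R S S^T R^T
    d = x - mu                                 # (N, 3) offsets from the mean
    m = d @ torch.linalg.inv(sigma)            # row-wise d^T Sigma^{-1}
    return torch.exp(-0.5 * (m * d).sum(-1))   # (N,) unnormalized density
```

For example, gaussian_density(torch.randn(8, 3), torch.zeros(3), torch.eye(3), torch.eye(3)) evaluates a unit isotropic Gaussian at eight random points.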
In order to make full use of the structured scene prior in the SfM results, related works such as Scaffold-GS and Octree-GS have been proposed. Scaffold-GS does not reconstruct directly from the SfM sparse point cloud but first extracts a sparse voxel grid from the point cloud and constructs anchors at the centers of the voxel grids. The anchors contain feature parameters f, which are used to derive the neural Gaussians:

$\{\mu_{j}, \Sigma_{j}, c_{j}, \alpha_{j}\}_{j \in M} = \mathrm{MLP}_{\theta}(f_{i}, d_{\mathrm{view}})_{i \in N}$   (3)

where θ represents the set of learnable weights of the multi-layer perceptron (MLP), and µ_j, Σ_j, c_j, and α_j represent the mean, covariance matrix, color, and opacity of the neural Gaussian j derived from anchor i under view direction d_view. The neural Gaussians are then used for rasterization, no differently from native 3D Gaussians. At the same time, the structured placement of the anchors also allows the derived neural Gaussians to be guided by the scene prior, which reduces the redundancy of model parameters and improves robustness in novel view synthesis.
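For illustration, a toy decoder with the shape of eq. (3) could look as follows in PyTorch; the feature width, hidden size, and the 14-value packing per neural Gaussian are our own assumptions rather than the configurations used by Scaffold-GS:

```python
import torch
import torch.nn as nn

class ToyAnchorDecoder(nn.Module):
    """Decode anchor features f_i plus view direction d_view into k neural Gaussians."""

    def __init__(self, feat_dim=32, k=10):
        super().__init__()
        # Hypothetical packing: 3 (mean) + 7 (rotation + scale) + 3 (color) + 1 (opacity).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, k * 14),
        )
        self.k = k

    def forward(self, feats, d_view):
        # feats: (N, feat_dim) anchor features; d_view: (N, 3) view directions.
        out = self.mlp(torch.cat([feats, d_view], dim=-1)).view(-1, self.k, 14)
        mu, cov, color, alpha = out.split([3, 7, 3, 1], dim=-1)
        return mu, cov, color, torch.sigmoid(alpha)  # one set of Gaussians per anchor
```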
further in the structuring strategy, using an octree to replace
the sparse voxel grid to retain multi-resolution and structured
scene priori. The multi-resolution grid in the octree makes
it possible to construct layers of detail (LoD) in training and Figure 3: Rendering stage time in different scenes. Although
then reduce the rendering overhead by setting different detail the end-to-end rendering time varies due to the scales of the
levels according to distance, expanding the scene scale appli- Gaussian splatting scenes, the time of the derivation stage and
cability of the structured Gaussian derivation method. The the rasterization stage always dominate.
basic rendering pipeline of Gaussian derivation methods is
shown in Figure 2. We test the proportion of various operators in the model
The structured Gaussian derivation method has higher ren- across two common scenarios, as shown in Figure 4. In the
dering efficiency than the original randomly distributed Gaus- figure, the Rasterizer is the final operator in the rendering
4
can be directly accessed from the cache.
5
Figure 5: Overview of GS-Cache framework architecture.
Algorithm 1 Dynamic Cache Depth Scheduling Algorithm
1: Initialize rendering pipeline, set cache depth and guiding function
2: while Receiving camera input do
3:   Anchors indexing and filtering
4:   if Reach max cache reuse depth then
5:     Invalidate those computation cache lines
6:   end if
7:   if Anchor duplicate rate == 0% or first frame then
8:     Decode all anchors into 3D Gaussians
9:   else if Anchor duplicate rate == 100% then
10:    Reuse all 3D Gaussians in cache
11:  else
12:    Decode new anchors
13:    Update computation cache
14:    Compose new Gaussians and cached Gaussians
15:  end if
16:  Configure cache depth based on guiding function and duplicate rate
17:  Update render buffer
18:  Rasterize render buffer into an image
19: end while
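A minimal Python sketch of the cache behavior in Algorithm 1; the dict-based cache, the per-line age counter, and the returned duplicate rate are simplifications of the actual computation cache:

```python
class GaussianCache:
    """Computation cache keyed by anchor id, with depth-bounded reuse."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth  # max reuse depth before a cache line is invalidated
        self.lines = {}             # anchor_id -> (decoded_gaussians, age_in_frames)

    def fetch(self, visible_ids, decode_fn):
        # Age every line and invalidate those past the max reuse depth (steps 4-6).
        self.lines = {a: (g, age + 1) for a, (g, age) in self.lines.items()
                      if age + 1 <= self.max_depth}
        misses = [a for a in visible_ids if a not in self.lines]
        for a in misses:                          # decode only the new anchors (step 12)
            self.lines[a] = (decode_fn(a), 0)
        duplicate_rate = 1.0 - len(misses) / max(len(visible_ids), 1)
        # Compose newly decoded and cached Gaussians for rasterization (step 14).
        return [self.lines[a][0] for a in visible_ids], duplicate_rate
```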
That is, two cameras must render the same objectives of the 3D scene as images under different perspectives. The mere position variance and the large field of view result in significant redundancy in binocular stereo rendering, which means twice the anchor decoding and cache visiting.

We propose a stereo rendering de-redundancy method suitable for structured Gaussian derivation methods, aiming to eliminate the computation redundancy in the derivation stage. The core is to utilize the overlap of the stereo cameras to merge the computation process in the derivation stage so that two cameras can share the Gaussian parameters decoded from one set of anchor features in the subsequent rasterization stage. For binocular stereo rendering, assume that there is a camera group C = {c1, c2} whose positions in the world coordinate system and local Z-axis directions are P = {p1, p2} and D = {d1, d2}, respectively, sharing the same camera intrinsic parameters with field of view θ_FOV. Then the following method can be used to obtain the merged parameters that cover the binocular camera field of view at the same time:
$d_{\mathrm{unified}} = \frac{\mu_{D}}{\|\mu_{D}\|_{2}}$   (5)

$p_{\mathrm{unified}} = \mu_{P} - d_{\mathrm{unified}} \cdot \frac{\|p_{1}-p_{2}\|_{2}}{2\tan(\theta_{\mathrm{FOV}}/2)}$   (6)

where µ_D and µ_P denote the means of D and P, and d_unified and p_unified represent the direction and position of a camera equivalent to the combined binocular field of view; its intrinsic parameters are still θ_FOV, as shown in Figure 8.

Figure 8: Binocular stereo de-redundancy through a unified camera in the structured Gaussian derivation method.

Therefore, the binocular cameras can simultaneously enter the derivation stage through the equivalent camera and share the results, eliminating the computation redundancy of the derivation stage in the sequential alternating method. The binocular cameras can then rasterize the shared Gaussians of the derivation stage independently or in a batch to maintain the binocular stereo parallax. It is worth noting that, unlike the double-wide rendering method [147] in the traditional rasterization of mesh models, our method merges the multi-channel end-to-end pipeline into one in stereo rendering. This eliminates the redundancy of the sequential alternating method and reduces the number of calls between multiple camera renderings in the derivation stage, thereby improving the performance and the frame rate upper limit of stereo rendering.
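Reading µ_D and µ_P as the means of D and P, eqs. (5) and (6) reduce to the following NumPy sketch (the function and argument names are ours):

```python
import numpy as np

def unify_stereo_cameras(p1, p2, d1, d2, fov_rad):
    """Merge a stereo pair into one equivalent camera per eqs. (5) and (6)."""
    mu_d = 0.5 * (d1 + d2)                    # mean view direction
    d_unified = mu_d / np.linalg.norm(mu_d)   # eq. (5)
    mu_p = 0.5 * (p1 + p2)                    # mean eye position
    baseline = np.linalg.norm(p1 - p2)
    # Eq. (6): pull the camera back so its frustum covers both eyes' views.
    p_unified = mu_p - d_unified * baseline / (2.0 * np.tan(fov_rad / 2.0))
    return p_unified, d_unified
```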
5 Multi-GPU Elastic Parallel Rendering Schedule

In city scenes, since the scenes are large and the Gaussian distribution varies significantly between different parts, the performance (FPS) fluctuates greatly when roaming across the scene, for example, when moving from an area with dense high-rise buildings to an open square, or from a ground-level view to a high-altitude bird's-eye view. Even though we use the cache to reduce the computational load significantly, the rendering FPS decreases as the scene size increases. Therefore, on top of the cache-centric rendering pipeline, we further employ elastic parallelism techniques to stabilize the rendering FPS above a predetermined value. This requires that the computing resources be dynamically scheduled according to the changes in the scene. We design an elastic parallel scheduling strategy to alleviate the drastic changes in FPS caused by view changes and achieve stable rendering.

We have designed an asynchronous pipeline for VR rendering, as detailed in Algorithm 2. Rather than directly using the binocular cameras of the VR HMD device for rendering, we put them into a shared queue. At a fixed sampling interval, if the VR HMD device pose changes by more than a threshold, we put the current camera into the shared queue and stamp it with a time. Camera data that exceeds a timeout is discarded by the shared queue. A rendering worker process accesses the shared queue when it finishes rendering the previous camera, takes camera data from the head of the queue, and executes the rendering task. A scheduler is introduced to flexibly schedule the rendering worker processes according to the change of FPS, and a scheduling strategy is used to achieve stable rendering. In the strategy, we set a frame rate range [Min-FPS, Max-FPS]. When the FPS is lower than the preset Min-FPS, the scheduler starts a new rendering worker process; when the FPS is higher than $(1 + \frac{1}{N_{workers}})$ times the Max-FPS, a chosen rendering worker process is stopped, as shown in Figure 9.
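One scheduling step of this rule can be sketched as follows; spawn and stop stand in for the framework's worker start/stop interface, which is not shown here:

```python
def schedule_workers(workers, fps, min_fps, max_fps, spawn, stop):
    """Apply the [Min-FPS, Max-FPS] window rule to the rendering worker pool."""
    n = len(workers)
    if fps < min_fps:
        workers.append(spawn())                      # below the window: add a worker
    elif n > 1 and fps > (1.0 + 1.0 / n) * max_fps:  # above the window with headroom
        stop(workers.pop())                          # retire one worker to save energy
```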
Since our elastic parallel rendering is an asynchronous rendering pipeline, inconsistent rendering order may occur due to inconsistent GPU performance. To solve this problem, we synchronize when writing to the display. By recording the timestamp of the last frame written to the display and comparing it with the current frame's timestamp, we decide whether the current frame should be written to GPU memory for display. The simple principle is that frames with earlier timestamps should be displayed first and that rendered frames should be displayed as quickly as possible; under this principle, expired frames are discarded.
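A sketch of this last-writer check; the dictionary stands in for the shared record of the last written timestamp:

```python
def maybe_display(frame_ts, state, write_to_texture):
    """Write a rendered frame to the display buffer only if it is not expired."""
    if frame_ts < state["last_written"]:
        return False                  # expired: a newer frame was already displayed
    write_to_texture()                # copy the frame into GPU texture memory
    state["last_written"] = frame_ts
    return True
```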
6 Implementation of Model Components

In structured Gaussian derivation methods, including Scaffold-GS and Octree-GS, the main bottlenecks are the derivation stage (performed by AnchorDecoder) and the rasterization stage (performed by RasterizeGaussians), as illustrated by the SCGS and OCGS bars in Figure 13. Designing dedicated CUDA kernels for these stages significantly improves rendering speed while maintaining a high standard of image output consistency.

Figure 10: An overview of dedicated CUDA kernels.

AnchorDecoder Optimization   The main computational overhead in this stage comes from the combine operators, using 43% of the stage duration, and the MLPs, using 25%, as shown in Figure 10 (a). For the first combine operator, we optimize element-wise multiplication within the kernel, eliminating the need for hard copies of features. For the second combine operator, we replace the merge-and-split operations with a fine-grained parallel method that processes each tensor individually. These optimizations reduce both memory usage and processing time. For the MLPs, our optimization fuses two layers into a single fused matmul. For the entire process, precomputing mask indices in the mask computation step reduces redundant calculations.
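Algebraically, folding two linear layers into one matmul is only exact when no nonlinearity separates them, since W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2); the dedicated kernel fuses the layers at the CUDA level instead, so the following PyTorch sketch illustrates the idea rather than the implementation:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_linear(l1: nn.Linear, l2: nn.Linear) -> nn.Linear:
    """Fold two consecutive Linear layers (with no activation between) into one."""
    fused = nn.Linear(l1.in_features, l2.out_features)
    fused.weight.copy_(l2.weight @ l1.weight)        # W = W2 W1
    fused.bias.copy_(l2.weight @ l1.bias + l2.bias)  # b = W2 b1 + b2
    return fused
```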
RasterizeGaussians Optimization   In addition to general optimizations such as merging memory accesses and precomputation, a significant amount of computational redundancy in the rasterization pipeline is eliminated through two methods, as shown in Figure 10 (b). First, when defining the Gaussian distribution, taking the opacity into account can scale down the size of the ellipse, reducing the area of the axis-aligned bounding box (AABB) and the number of key-value pairs, thereby reducing the overall computational load. Second, optimizing the AABB's tile-coverage determination can eliminate computations for tiles that are completely outside the ellipse's coverage area [43]. The redundancy reduction slightly increases the preprocessing duration but significantly reduces the duration of the subsequent computational steps.
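The opacity-aware shrinking follows from solving α · exp(−r²/(2σ²)) < ε for r: beyond that radius a splat cannot contribute a visible value, so the usual fixed 3σ extent can be tightened. A sketch, where the 1/255 visibility threshold is our assumption:

```python
import math

def splat_radius(opacity, sigma_max, eps=1.0 / 255.0):
    """Opacity-aware extent of a projected Gaussian, replacing a fixed 3-sigma radius."""
    if opacity <= eps:
        return 0.0                      # never visible: cull the splat entirely
    r = sigma_max * math.sqrt(2.0 * math.log(opacity / eps))
    return min(r, 3.0 * sigma_max)      # a smaller radius means a smaller AABB
```

A smaller radius shrinks the AABB, which in turn reduces the number of tile key-value pairs emitted for sorting.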
Algorithm 2 Elastic Parallel Rendering Scheduling Algorithm
1: Initialize shared queue, set target FPS and timeout
2: while VR application is running do
3:   Obtain current VR device pose information
4:   if HMD device pose change exceeds threshold then
5:     Add camera and timestamp into the shared queue
6:   end if
7:   Calculate current FPS
8:   if FPS is below target FPS then
9:     Start a new rendering process
10:  else if FPS is above target FPS × $(1 + \frac{1}{N_{workers}})$ then
11:    Choose one rendering process and stop it
12:  end if
13:  for Each rendered frame do
14:    if Timestamp < Last_written_timestamp then
15:      Discard the frame
16:    else
17:      Write the frame to GPU texture memory
18:      Update Last_written_timestamp
19:    end if
20:  end for
21: end while
ing framework. Each frame corresponds to the head-mounted
display’s locomotion input, which contains two poses of the
binocular stereo cameras on it.
7 Experiments Experiments are conducted on Scaffold-GS and Octree-
GS, the SOTA models for modeling cities. The max reuse
Aiming at an immersive VR experience, we choose the depth is set to 10 for caching computation results from the
Meta Quest 3 head-mounted display (HMD) as the human- previous 10 frames. A series of experiments show that setting
computer interaction interface, which supports a display ca- it to 10 balances both performance and quality. To compare
pability of up to 120FPS with a binocular 2K resolution. The their optimal performance, both rendering pipelines are tested
locomotion of the helmet is transmitted through the OpenXR under single GPU and multi-GPU resources. Besides, we
Runtime API, and the rendering results from the other de- compare our results with CityGS [96] and some VR rendering
vice are streamed to the rendering buffer of the helmet. We works.
use consumer-grade components to build our platform that We collect the average FPS for performance evaluation of
performs actual rendering, including an Intel i9-14900 CPU, the rendering pipeline under continuous computing condi-
128GB RAM, and two Nvidia RTX 4090 GPUs connected tions. For a more comprehensive result, we also collect the
and communicated through the PCIe 4.0x8 slots. The custom 99% percentile FPS. At the same time, the time consumption
CUDA renderer performs the rendering tasks in the complete of the anchor decoding stage and the Gaussian rasterization
framework. The computation results will first be placed in stage in the rendering pipeline, where computation optimiza-
the VRAM on GPUs and finally gathered and streamed to tion and dedicated CUDA kernels mainly take effect, are
the rendering buffer of the helmet through the OpenXR Run- collected. The results are shown in Table 2 and Table 3.
time API. The GS-Cache rendering framework is deployed Under a single GPU, compared to the baseline pipeline,
on the consumer-grade workstation mentioned above and im- the optimized rendering pipeline has an average frame rate
plemented with PyTorch. All pipelines can use multi-GPU performance improvement of 2x. The city scene has a higher
resources to improve the total throughput through the elastic speedup gain because the number of anchors involved in de-
parallel rendering scheduling interface and enable or disable coding and the number of Gaussians involved in rasterization
Table 2: Rendering performance comparison on Matrixcity city scene
Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) AVG. Speedup Gain
Scaffold-GS(Origin) 27.24 13.33 28.89 10.63 -
Octree-GS(Origin) 18.04 11.50 18.49 14.10 -
Scaffold-GS(Our) 55.81 37.28 11.34 5.42 2.05
Octree-GS(Our) 44.55 25.81 6.39 4.35 2.47
Scaffold-GS(Origin w/ elastic) 50.78 29.07 14.19 5.19 1.86
Octree-GS(Origin w/ elastic) 42.38 30.98 8.01 5.92 2.35
Scaffold-GS(Our w/ elastic) 109.80 78.29 5.57 3.01 4.03
Octree-GS(Our w/ elastic) 96.46 73.27 2.82 2.29 5.35
Table 3: Rendering performance comparison on Matrixcity street scene

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) AVG. Speedup Gain
Scaffold-GS(Origin) 43.91 8.35 8.67 13.38 -
Octree-GS(Origin) 54.26 21.19 8.62 6.63 -
Scaffold-GS(Our) 80.24 14.80 1.61 9.88 1.83
Octree-GS(Our) 113.97 44.74 2.25 4.37 2.10
Scaffold-GS(Origin w/ elastic) 88.59 18.24 4.36 6.57 2.02
Octree-GS(Origin w/ elastic) 111.98 44.38 4.16 3.11 2.06
Scaffold-GS(Our w/ elastic) 148.30 33.34 0.93 5.59 3.38
Octree-GS(Our w/ elastic) 203.16 93.69 1.18 2.54 3.74
Methods such as VR-NeRF [149], RT-NeRF [90], and VR-GS [64] have made significant contributions to the VR rendering of 3D neural scenes, but they have primarily focused on small-scale scenes. The
model sizes employed in these methods differ by an order of
magnitude from those in our experiments, and their average
rendering frame rates are significantly below 72 FPS. Our so-
lution, however, substantially outperforms existing methods in
terms of performance while maintaining rendering quality. At
the same settings, from the viewpoint of 500 meters in height,
the FPS of GS-Cache is double that of CityGaussian [96].
Figure 12: Dynamic cache depth. Due to the scheduling strategy for quality, rapid shrinkage of depth may occur when the update rate fluctuates, but overall speedup can still be achieved.

7.2 Quality Evaluation
Performance improvements in rendering pipelines are often accompanied by trade-offs in quality. The rendering pipeline optimization method we propose includes computation de-redundancy and computation reuse, keeping the pipeline structure unchanged and optimizing the process with rendering quality at its center. Due to the peculiarities of the multi-camera setup in binocular stereo, the impact of redundancy removal on rendering quality is not significant. Meanwhile, reuse requires a dynamic cache depth scheduling strategy to control fluctuations in rendering quality. In our experiments, we use a linear guidance function to respond to intensity changes in the decoding stage and to schedule the reuse depth in subsequent frame renderings. The linear response matches the motion features of the constant-speed movement along the rendered trajectory in the experiment. Figure 12 illustrates how the cache depth adjusts to maintain rendering quality during the rendering process. The update rate refers to the percentage of anchors that are decoded and updated to the cache and is equivalent to the cache miss rate. As the cache depth increases, the cache miss rate decreases, resulting in a corresponding reduction in the update rate. For movement with acceleration or with staged speed changes, an exponential response or a staged response is needed to match the reuse depth to the motion features. Our optimization methods do not involve modifications of the rendering pipeline, are transparent to the original rendering process of the Gaussian derivation method, and are compatible with pipelines without LoD (e.g., Scaffold-GS) and pipelines containing LoD (e.g., Octree-GS).
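A linear guidance function of this kind can be sketched as follows; the clamping bounds and rounding are illustrative, and only the maximum depth of 10 corresponds to our experimental setting:

```python
def reuse_depth(update_rate, d_min=1, d_max=10):
    """Map the observed update rate (cache miss rate) to the next reuse depth."""
    depth = d_max - (d_max - d_min) * update_rate  # higher miss rate -> shallower reuse
    return max(d_min, min(d_max, round(depth)))
```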
We evaluate the quality difference caused by the computation optimizations between images rendered by the optimized and baseline pipelines, as shown in Table 4. It is worth noting that the mean square error (MSE) and peak signal-to-noise ratio (PSNR) only reflect the absolute difference in pixel values between the images, not the relative difference in perception. Therefore, it is also necessary to refer to metrics such as the structural similarity index (SSIM) [142] and the learned perceptual image patch similarity (LPIPS) [153]. It is generally believed that when PSNR surpasses 30, the visual difference between two images is difficult for the human eye to perceive; SSIM and LPIPS should be over 0.9 and no more than 0.1, respectively.

Table 4: Rendering quality comparison on Matrixcity city and street scenes

Methods              MSE↓     PSNR↑  SSIM↑  LPIPS↓
Scaffold-GS(City)    0.00116  38.36  0.98   0.022
Octree-GS(City)      0.00136  35.38  0.98   0.024
Scaffold-GS(Street)  0.00155  32.68  0.98   0.018
Octree-GS(Street)    0.00038  35.53  0.99   0.012

7.3 Ablation Evaluation

In addition to evaluating the overall performance and quality changes of the GS-Cache rendering framework, we also conduct ablation experiments on the performance impact of the different optimization methods in the rendering pipeline. Based on the rendering trajectories from the aerial and street views, pipelines with different optimization methods ablated are tested. We compare the end-to-end frame rate and per-stage time consumption, and record the highest memory usage over the complete rendering process, as shown in Table 5 and Table 6. We can also calculate the speedup that the different methods contribute to full performance through the ablations.

In the city scene, computation de-redundancy and reuse significantly impact the pipeline's overall performance, and the speedup of the average frame rate can reach 1.8x. In the street scene, the impact of the dedicated CUDA kernels on the overall performance of the pipeline is more significant. This is because, in the city scene, single-frame rendering time is concentrated in the decoding stage, whereas in the street scene it is concentrated in the rasterization stage, corresponding to the main effects of the computation optimizations and the dedicated kernels, respectively. The impact on the average frame rate is also reflected in the 99% frame rate, which further proves that our optimization methods are essential to improving the performance of the rendering pipeline, covering both the best and worst cases of rendering performance.
Table 5: Rendering ablation comparison on Matrixcity city scene.
Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) Memory(GiB) AVG. Speedup Loss
Scaffold-GS(Our) 56.61 38.71 11.28 5.39 8.10 -
Our w/o de-redundancy 34.82 26.49 22.72 5.36 8.05 1.62x
Our w/o reuse 31.26 20.52 19.48 5.27 7.85 1.81x
Our w/o kernels 38.46 21.64 14.15 10.77 10.30 1.47x
Octree-GS(Our) 41.15 20.33 6.89 4.68 8.59 -
Our w/o de-redundancy 24.86 16.10 13.27 4.37 8.59 1.65x
Our w/o reuse 22.48 17.66 10.09 4.54 8.51 1.83x
Our w/o kernels 31.51 21.40 8.41 12.57 9.17 1.30x
Table 6: Rendering ablation comparison on Matrixcity street scene.

Methods AVG. FPS 99% FPS Decoding(ms) Rasterization(ms) Memory(GiB) AVG. Speedup Loss
Scaffold-GS(Our) 79.43 14.59 1.68 9.93 2.12 -
Our w/o de-redundancy 72.52 14.59 3.20 9.95 2.16 1.09x
Our w/o reuse 77.88 13.60 1.84 10.01 1.96 1.02x
Our w/o kernels 58.34 11.43 2.90 13.18 3.67 1.36x
Octree-GS(Our) 115.88 47.93 2.19 4.34 2.07 -
Our w/o de-redundancy 87.85 44.87 4.16 4.25 2.12 1.31x
Our w/o reuse 93.02 41.30 2.58 4.35 2.03 1.27x
Our w/o kernels 77.76 33.45 4.20 6.36 2.94 1.49x
Removing redundancy in binocular stereo improves the performance of the decoding stage most significantly across the different scenes and pipelines. In contrast, the dedicated kernels most significantly improve the performance of the rasterization stage while reducing memory usage. Due to the introduction of the computation cache for Gaussian derivation, the reuse optimization of the rendering pipeline incurs additional memory usage, but it does not exceed 3%. Ultimately, our optimization methods jointly enable the rendering pipeline to meet the performance requirements of immersive real-time VR rendering while compensating for and balancing memory usage.

Figure 13: Single-frame rendering kernel duration. Due to the LoD-based model structure and layer-switching strategy applied to the entire scene, Octree-GS introduces significant time before the decoding stage in large-scale scenes.

As shown in Figure 13, replacing only the dedicated CUDA kernels results in a speedup for both AnchorDecoder and Rasterizer. Notably, between the city and street scenes, Rasterizer achieves a higher speedup in the former, while AnchorDecoder achieves a higher speedup in the latter. This is because the city scene spans a broader area and involves a larger scale, with denser and more numerous anchor points and Gaussians. For Rasterizer, the optimization eliminates redundant computations caused by false Gaussian intersections, and it yields better results since the denser Gaussians in the city scene produce more false intersections. As for AnchorDecoder, its optimization mainly reduces memory-access overhead; in the city scene, the increased density of anchor points raises the computational overhead of AnchorDecoder, making memory-access optimizations less effective than in the street scene.
8 Conclusions
We demonstrate the GS-Cache framework, a rendering frame-
work oriented to structured Gaussian derivation methods,
which can achieve real-time rendering of large-scale scenes,
including city and street Gaussian reconstruction scenes, meet-
ing the high-speed and high-fidelity requirements of immer-
sive VR experience. We make several key contributions, in-
cluding the cache-centric de-redundancy rendering pipeline, a
rendering framework that supports multi-GPU parallelism and
elastic scheduling, and dedicated CUDA kernels for the com-
putational bottleneck stage. In the experiments, we verify that
the GS-Cache framework achieves significant performance improvements compared to the baseline methods, and meets the frame rate requirements at binocular 2K resolution of more than 72 FPS and more than 120 FPS under limited
resources such as consumer-grade GPUs, and does not result in significant quality loss.

References

[5] Patricia S. Abril and Robert Plant. The patent holder's dilemma: Buy, sell, or troll? Communications of the ACM, 50(1):36–44, January 2007.

[6] A. Adya, P. Bahl, J. Padhye, A. Wolman, and L. Zhou. A multi-radio unification protocol for IEEE 802.11 wireless networks. In Proceedings of the IEEE 1st International Conference on Broadnets Networks (BroadNets'04), pages 210–217, Los Alamitos, CA, 2004. IEEE.

[7] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury. A survey on wireless multimedia sensor networks. Computer Netw., 51(4):921–960, 2007.

[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: A survey. Comm. ACM, 38(4):393–422, 2002.

[9] American Mathematical Society. Using the amsthm Package, April 2015. http://www.ctan.org/pkg/amsthm.

[10] Sten Andler. Predicate path expressions. In Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL '79, pages 226–236, New York, NY, 1979. ACM Press.

[11] David A. Anisi. Optimal motion control of a ground vehicle. Master's thesis, Royal Institute of Technology (KTH), Stockholm, Sweden, 2003.

[17] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19697–19705, 2023.

[18] Lutz Bornmann, K. Brad Wray, and Robin Haunschild. Citation concept analysis (CCA)—a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by two exemplary case studies including classic books by Thomas S. Kuhn and Karl R. Popper, May 2019.

[19] Mic Bowman, Saumya K. Debray, and Larry L. Peterson. Reasoning about naming systems. ACM Trans. Program. Lang. Syst., 15(5):795–825, November 1993.

[20] Johannes Braams. Babel, a multilingual style-option system for use with latex's standard document styles. TUGboat, 12(2):291–301, June 1991.

[21] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
[22] Jonathan F. Buss, Arnold L. Rosenberg, and Judson D. Knott. Vertex types in book-embeddings. Technical report, Amherst, MA, USA, 1987.

[23] Jonathan F. Buss, Arnold L. Rosenberg, and Judson D. Knott. Vertex types in book-embeddings. Technical report, Amherst, MA, USA, 1987.

[24] Yang Cao, Tao Jiang, Xu Chen, and Junshan Zhang. Social-aware video multicast based on device-to-device communications. IEEE Transactions on Mobile Computing, 15(6):1528–1539, 2015.

[25] Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi. Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16569–16578, 2023.

[26] Malcolm Clark. Post congress tristesse. In TeX90 Conference Proceedings, pages 84–89. TeX Users Group, March 1991.

[27] Kenneth L. Clarkson. Algorithms for Closest-Point Problems (Computational Geometry). PhD thesis, Stanford University, Palo Alto, CA, 1985. UMI Order Number: AAT 8506171.

[28] Kenneth Lee Clarkson. Algorithms for Closest-Point Problems (Computational Geometry). PhD thesis, Stanford University, Stanford, CA, USA, 1985. AAT 8506171.

[29] Special issue: Digital libraries, November 1996.

[34] D. Culler, D. Estrin, and M. Srivastava. Overview of sensor networks. IEEE Comput., 37(8 (Special Issue on Sensor Networks)):41–49, 2004.

[35] Nianchen Deng, Zhenyi He, Jiannan Ye, Budmonde Duinkharjav, Praneeth Chakravarthula, Xubo Yang, and Qi Sun. Fov-nerf: Foveated neural radiance fields for virtual reality. IEEE Transactions on Visualization and Computer Graphics, 28(11):3854–3864, 2022.

[36] E. Dijkstra. Go to statement considered harmful. In Classics in software engineering (incoll), pages 27–33. Yourdon Press, Upper Saddle River, NJ, USA, 1979.

[37] Bruce P. Douglass, David Harel, and Mark B. Trakhtenbrot. Statecharts in use: structured analysis and object-orientation. In Grzegorz Rozenberg and Frits W. Vaandrager, editors, Lectures on Embedded Systems, volume 1494 of Lecture Notes in Computer Science, pages 368–394. Springer-Verlag, London, 1998.

[38] D. D. Dunlop and V. R. Basili. Generalizing specifications for uniformly implemented loops. ACM Trans. Program. Lang. Syst., 7(1):137–158, January 1985.

[39] Ian Editor, editor. The title of book one, volume 9 of The name of the series one. University of Chicago Press, Chicago, 1st edition, 2007.

[40] Ian Editor, editor. The title of book two, chapter 100. The name of the series two. University of Chicago Press, Chicago, 2nd edition, 2008.

[41] Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, and Zhangyang Wang. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. arXiv preprint arXiv:2311.17245, 2023.

[42] Simon Fear. Publication quality tables in LATEX, April 2005. http://www.ctan.org/pkg/booktabs.

[46] Jayshree Ghorpade. Gpgpu processing in cuda architecture. Advanced Computing: An International Journal, 3(1):105–120, January 2012.
[47] Michel Goossens, S. P. Rahtz, Ross Moore, and Robert S. Sutor. The Latex Web Companion: Integrating TEX, HTML, and XML. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1999.

[48] Brian Guenter, Mark Finch, Steven Drucker, Desney Tan, and John Snyder. Foveated 3d graphics. ACM Transactions on Graphics (TOG), 31(6):1–10, 2012.

[49] Matthew Van Gundy, Davide Balzarotti, and Giovanni Vigna. Catch me, if you can: Evading network signatures with web-based polymorphic worms. In Proceedings of the first USENIX workshop on Offensive Technologies, WOOT '07, Berkley, CA, 2007. USENIX Association.

[50] Matthew Van Gundy, Davide Balzarotti, and Giovanni Vigna. Catch me, if you can: Evading network signatures with web-based polymorphic worms. In Proceedings of the first USENIX workshop on Offensive Technologies, WOOT '08, pages 99–100, Berkley, CA, 2008. USENIX Association.

[51] Matthew Van Gundy, Davide Balzarotti, and Giovanni Vigna. Catch me, if you can: Evading network signatures with web-based polymorphic worms. In Proceedings of the first USENIX workshop on Offensive Technologies, WOOT '09, pages 90–100, Berkley, CA, 2009. USENIX Association.

[52] Torben Hagerup, Kurt Mehlhorn, and J. Ian Munro. Maintaining discrete probability distributions optimally. In Proceedings of the 20th International Colloquium on Automata, Languages and Programming, volume 700 of Lecture Notes in Computer Science, pages 253–264, Berlin, 1993. Springer-Verlag.

[53] David Harel. Logics of programs: Axiomatics and descriptive power. MIT Research Lab Technical Report TR-200, Massachusetts Institute of Technology, Cambridge, MA, 1978.

[54] David Harel. First-Order Dynamic Logic, volume 68 of Lecture Notes in Computer Science. Springer-Verlag, New York, NY, 1979.

[55] CodeBlue: Sensor networks for medical care, 2008. http://www.eecs.harvard.edu/mdw/proj/codeblue/.

[56] Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural radiance fields for real-time view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5875–5884, 2021.

[57] J. Heering and P. Klint. Towards monolingual programming environments. ACM Trans. Program. Lang. Syst., 7(2):183–213, April 1985.

[58] Maurice Herlihy. A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst., 15(5):745–770, November 1993.

[59] C. A. R. Hoare. Chapter ii: Notes on data structuring. In O. J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, editors, Structured programming (incoll), pages 83–174. Academic Press Ltd., London, UK, 1972.

[60] Billy S. Hollis. Visual Basic 6: Design, Specification, and Objects with Other. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 1999.

[61] Lars Hörmander. The analysis of linear partial differential operators. III, volume 275 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, Germany, 1985. Pseudodifferential operators.

[62] Lars Hörmander. The analysis of linear partial differential operators. IV, volume 275 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, Germany, 1985. Fourier integral operators.

[63] Ieee tcsc executive committee. In Proceedings of the IEEE International Conference on Web Services, ICWS '04, pages 21–22, Washington, DC, USA, 2004. IEEE Computer Society.

[64] Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, and Chenfanfu Jiang. Vr-gs: A physical dynamics-aware interactive gaussian splatting system in virtual reality, 2024.

[65] Tao Jin, Mallesham Dasa, Connor Smith, Kittipat Apicharttrisorn, Srinivasan Seshan, and Anthony Rowe. Meshreduce: Scalable and bandwidth efficient 3d scene capture. In 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR), pages 20–30. IEEE, 2024.

[66] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42:1–14, 2023.

[67] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023.

[68] Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3d gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics (TOG), 43(4):1–15, 2024.
[69] Markus Kirschmer and John Voight. Algorithmic enumeration of ideal classes for quaternion orders. SIAM J. Comput., 39(5):1714–1747, January 2010.

[70] Donald E. Knuth. Seminumerical Algorithms. Addison-Wesley, 1981.

[71] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 2nd edition, 10 January 1981.

[72] Donald E. Knuth. The TEXbook. Addison-Wesley, Reading, MA, 1984.

[73] Donald E. Knuth. The Art of Computer Programming, Vol. 1: Fundamental Algorithms (3rd ed.). Addison Wesley Longman Publishing Co., Inc., 1997.

[74] Donald E. Knuth. The Art of Computer Programming, volume 1 of Fundamental Algorithms. Addison Wesley Longman Publishing Co., Inc., 3rd edition, 1998. (book).

[75] Wei-Chang Kong. E-commerce and cultural values, name of chapter: The implementation of electronic commerce in SMEs in Singapore (Inbook-w-chap-w-type), pages 51–74. IGI Publishing, Hershey, PA, USA, 2001.

[76] Wei-Chang Kong. The implementation of electronic commerce in smes in singapore (as incoll). In E-commerce and cultural values, pages 51–74. IGI Publishing, Hershey, PA, USA, 2001.

[77] Wei-Chang Kong. Chapter 9. In Theerasak Thanasankit, editor, E-commerce and cultural values (Incoll-w-text (chap 9) 'title'), pages 51–74. IGI Publishing, Hershey, PA, USA, 2002.

[78] Wei-Chang Kong. The implementation of electronic commerce in smes in singapore (incoll). In Theerasak Thanasankit, editor, E-commerce and cultural values, pages 51–74. IGI Publishing, Hershey, PA, USA, 2003.

[79] Wei-Chang Kong. E-commerce and cultural values - (InBook-num-in-chap), chapter 9, pages 51–74. IGI Publishing, Hershey, PA, USA, 2004.

[80] Wei-Chang Kong. E-commerce and cultural values (Inbook-text-in-chap), chapter: The implementation of electronic commerce in SMEs in Singapore, pages 51–74. IGI Publishing, Hershey, PA, USA, 2005.

[81] Wei-Chang Kong. E-commerce and cultural values (Inbook-num chap), chapter (in type field) 22, pages 51–74. IGI Publishing, Hershey, PA, USA, 2006.

[82] E. Korach, D. Rotem, and N. Santoro. Distributed algorithms for finding centers and medians in networks. ACM Trans. Program. Lang. Syst., 6(3):380–401, July 1984.

[83] Jacob Kornerup. Mapping powerlists onto hypercubes. Master's thesis, The University of Texas at Austin, 1994. (In preparation).

[84] David Kosiur. Understanding Policy-Based Networking. Wiley, New York, NY, 2nd edition, 2001.

[85] Brooke Krajancich, Petr Kellnhofer, and Gordon Wetzstein. A perceptual model for eccentricity-dependent spatio-temporal flicker fusion and its applications to foveated graphics. ACM Transactions on Graphics (TOG), 40(4):1–11, 2021.

[86] Leslie Lamport. LATEX: A Document Preparation System. Addison-Wesley, Reading, MA, 1986.

[87] Jan Lee. Transcript of question and answer session. In Richard L. Wexelblat, editor, History of programming languages I (incoll), pages 68–71. ACM, New York, NY, USA, 1981.

[88] Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21719–21728, 2024.

[89] Newton Lee. Interview with bill kinder: January 13, 2005. Comput. Entertain., 3(1), Jan.-March 2005.

[90] Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, and Yingyan Lin. Rt-nerf: Real-time on-device neural radiance fields towards immersive ar/vr rendering. In Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pages 1–9, 2022.

[91] Cheng-Lun Li, Ayse G. Buyuktur, David K. Hutchful, Natasha B. Sant, and Satyendra K. Nainwal. Portalis: using competitive online interactions to support aid initiatives for the homeless. In CHI '08 extended abstracts on Human factors in computing systems, pages 3873–3878, New York, NY, USA, 2008. ACM.

[92] Ruilong Li, Hang Gao, Matthew Tancik, and Angjoo Kanazawa. Nerfacc: Efficient sampling accelerates nerfs. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 18491–18500, 2023.

[93] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3205–3215, 2023.
[94] Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiy- [105] E. Mumford. Managerial expert systems and orga-
ong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen nizational change: some critical research issues. In
Xu, Youliang Yan, and Wenming Yang. Vastgaussian: Critical issues in information systems research (incoll),
Vast 3d gaussians for large scene reconstruction. In pages 135–155. John Wiley & Sons, Inc., New York,
CVPR, 2024. NY, USA, 1987.
[95] Weikai Lin, Yu Feng, and Yuhao Zhu. Rtgs: En- [106] A. Natarajan, M. Motani, B. de Silva, K. Yap, and K. C.
abling real-time gaussian splatting on mobile devices Chua. Investigating network architectures for body
using efficiency-guided pruning and foveated rendering. arXiv preprint arXiv:2407.00435, 2024.

[96] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. CityGaussian: Real-time high-quality large-scale scene rendering with Gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2025.

[97] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-GS: Structured 3D Gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024.

[98] David P. Luebke, Martin Reddy, Jonathan D. Cohen, Amitabh Varshney, Benjamin Watson, and Robert A. Huebner. Level of Detail for 3D Graphics. Morgan Kaufmann Publishers Inc., 2012.

[99] Elian Malkin, Arturo Deza, and Tomaso Poggio. CUDA-optimized real-time rendering of a foveated visual system. arXiv preprint arXiv:2012.08655, 2020.

[100] Rafał K. Mantiuk, Gyorgy Denes, Alexandre Chapiro, Anton Kaplanyan, Gizem Rufo, Romain Bachy, Trisha Lian, and Anjul Patney. FovVideoVDP: A visible difference predictor for wide field-of-view video. ACM Transactions on Graphics (TOG), 40(4):1–19, 2021.

[101] Daniel D. McCracken and Donald G. Golden. Simplified Structured COBOL with Microsoft/MicroFocus COBOL. John Wiley & Sons, Inc., New York, NY, USA, 1990.

[102] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF. Communications of the ACM, 65:99–106, 2020.

[103] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.

[104] Sape Mullender, editor. Distributed Systems (2nd Ed.). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1993.

sensor networks. In G. Whitcomb and P. Neece, editors, Network Architectures, pages 322–328, Dayton, OH, 2007. Keleuven Press.

[107] F. Nielson. Program transformations in a denotational setting. ACM Trans. Program. Lang. Syst., 7(3):359–379, July 1985.

[108] Dave Novak. Solder man. In ACM SIGGRAPH 2003 Video Review on Animation Theater Program: Part I - Vol. 145 (July 27–27, 2003), page 4, New York, NY, 2003. ACM Press.

[109] Barack Obama. A more perfect union. Video, March 2008.

[110] Jinwoo Park, Ik-Beom Jeon, Sung-Eui Yoon, and Woontack Woo. Instant panoramic texture mapping with semantic object matching for large-scale urban scene reproduction. IEEE Transactions on Visualization and Computer Graphics, 27(5):2746–2756, 2021.

[111] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library, page 12. Curran Associates Inc., Red Hook, NY, USA, 2019.

[112] Charles J. Petrie. New algorithms for dependency-directed backtracking (master's thesis). Technical report, Austin, TX, USA, 1986.

[113] Charles J. Petrie. New algorithms for dependency-directed backtracking (master's thesis). Master's thesis, University of Texas at Austin, Austin, TX, USA, 1986.

[114] Poker-Edge.Com. Stats and analysis, March 2006.

[115] R Core Team. R: A language and environment for statistical computing, 2019.

[116] Brian K. Reid. A high-level approach to computer document formatting. In Proceedings of the 7th Annual Symposium on Principles of Programming Languages, pages 24–31, New York, January 1980. ACM.
[117] Christian Reiser, Rick Szeliski, Dor Verbin, Pratul Srinivasan, Ben Mildenhall, Andreas Geiger, Jon Barron, and Peter Hedman. MERF: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023.

[118] Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, and Bo Dai. Octree-GS: Towards consistent real-time rendering with LOD-structured 3D Gaussians. arXiv preprint arXiv:2403.17898, 2024.

[119] Sara Rojas, Jesus Zarzar, Juan C. Pérez, Artsiom Sanakoyeu, Ali Thabet, Albert Pumarola, and Bernard Ghanem. Re-ReND: Real-time rendering of NeRFs across devices. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3632–3641, 2023.

[120] Bernard Rous. The enabling of digital libraries. Digital Libraries, 12(3), July 2008. To appear.

[121] Mehdi Saeedi, Morteza Saheb Zamani, and Mehdi Sedighi. A library-based synthesis methodology for reversible logic. Microelectron. J., 41(4):185–194, April 2010.

[122] Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi, and Zahra Sasanian. Synthesis of reversible circuit using cycle-based approach. J. Emerg. Technol. Comput. Syst., 6(4), December 2010.

[123] S. L. Salas and Einar Hille. Calculus: One and Several Variable. John Wiley and Sons, New York, 1978.

[124] Joseph Scientist. The fountain of youth, August 2009. Patent No. 12345, Filed July 1st., 2008, Issued Aug. 9th., 2009.

[125] Stan W. Smith. An experiment in bibliographic mark-up: Parsing metadata for XML export. In Reginald N. Smythe and Alexander Noble, editors, Proceedings of the 3rd Annual Workshop on Librarians and Computers, volume 3 of LAC '10, pages 422–431, Milan, Italy, 2010. Paparazzi Press.

[126] Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. NeRFPlayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.

[127] Asad Z. Spector. Achieving application requirements. In Sape Mullender, editor, Distributed Systems, pages 19–33. ACM Press, New York, NY, 2nd edition, 1990.

[128] Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, and Sanja Fidler. Variable bitrate neural fields. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–9, 2022.

[129] Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, and Sanja Fidler. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11358–11367, 2021.

[130] Towaki Takikawa, Or Perel, Clement Fuji Tsang, Charles Loop, Joey Litalien, Jonathan Tremblay, Sanja Fidler, and Maria Shugrina. Kaolin Wisp: A PyTorch library and engine for neural fields research. https://github.com/NVIDIAGameWorks/kaolin-wisp, 2022.

[131] Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, and Henrik Kretzschmar. Block-NeRF: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.

[132] Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, and Angjoo Kanazawa. Nerfstudio: A modular framework for neural radiance field development. In ACM SIGGRAPH 2023 Conference Proceedings, SIGGRAPH '23, 2023.

[133] Jiaxiang Tang, Hang Zhou, Xiaokang Chen, Tianshu Hu, Errui Ding, Jingdong Wang, and Gang Zeng. Delicate textured mesh recovery from NeRF via adaptive surface refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17739–17749, 2023.

[134] Harry Thornburg. Introduction to Bayesian statistics, March 2001.

[135] Institutional members of the TeX Users Group, 2017.

[136] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12922–12931, 2022.

[137] Okan Tarhan Tursun, Elena Arabadzhiyska-Koleva, Marek Wernikowski, Radosław Mantiuk, Hans-Peter Seidel, Karol Myszkowski, and Piotr Didyk. Luminance-contrast-aware foveated rendering. ACM Transactions on Graphics (TOG), 38(4):1–14, 2019.
[138] A. Tzamaloukas and J. J. Garcia-Luna-Aceves. Channel-hopping multiple access. Technical Report I-CA2301, Department of Computer Science, University of California, Berkeley, CA, 2000.

[139] Nisarg Ujjainkar, Ethan Shahan, Kenneth Chen, Budmonde Duinkharjav, Qi Sun, and Yuhao Zhu. Exploiting human color discrimination for memory- and energy-efficient image encoding in virtual reality. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 166–180, 2024.

[140] Boris Veytsman. acmart—Class for typesetting publications of ACM, 2017.

[141] Lili Wang, Xuehuai Shi, and Yi Liu. Foveated rendering: A state-of-the-art survey. Computational Visual Media, 9(2):195–228, 2023.

[142] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[143] Yu Wen, Chenhao Xie, Shuaiwen Leon Song, and Xin Fu. Post0-VR: Enabling universal realistic rendering for modern VR via exploiting architectural similarity and data sharing. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 390–402. IEEE, 2023.

[144] Elizabeth M. Wenzel. Three-dimensional virtual acoustic displays. In Multimedia Interface Design, pages 257–288. ACM, New York, NY, USA, 1992.

[145] Renato Werneck, João Setubal, and Arlindo da Conceição. (new) Finding minimum congestion spanning trees. J. Exp. Algorithmics, 5, December 2000.

[146] Renato Werneck, João Setubal, and Arlindo da Conceição. (old) Finding minimum congestion spanning trees. J. Exp. Algorithmics, 5:11, 2000.

[147] Thomas Winklehner and Renato Pajarola. Single-pass multi-view volume rendering. July 2007.

[148] Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, and Lin Gao. Recent advances in 3D Gaussian splatting. Computational Visual Media, pages 1–30, 2024.

[149] Linning Xu, Vasu Agrawal, William Laney, Tony Garcia, Aayush Bansal, Changil Kim, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Aljaž Božič, et al. VR-NeRF: High-fidelity virtualized walkable spaces. In SIGGRAPH Asia 2023 Conference Papers, pages 1–12, 2023.

[150] Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, and Dahua Lin. Grid-guided neural radiance fields for large urban scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8296–8306, 2023.

[151] Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard Szeliski, Jonathan T. Barron, and Ben Mildenhall. BakedSDF: Meshing neural SDFs for real-time view synthesis. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–9, 2023.

[152] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752–5761, 2021.

[153] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

[154] Yuqi Zhang, Guanying Chen, and Shuguang Cui. Efficient large-scale scene representation with a hybrid of high-resolution grid and plane features. arXiv preprint arXiv:2303.03003, 2023.

[155] Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, and Saining Xie. On scaling up 3D Gaussian splatting training, 2024.

[156] Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, et al. SGLang: Efficient execution of structured language model programs. arXiv preprint arXiv:2312.07104, 2024.

[157] G. Zhou, J. Lu, C.-Y. Wan, M. D. Yarvis, and J. A. Stankovic. Body Sensor Networks. MIT Press, Cambridge, MA, 2008.

[158] Gang Zhou, Yafeng Wu, Ting Yan, Tian He, Chengdu Huang, John A. Stankovic, and Tarek F. Abdelzaher. A multifrequency MAC specially designed for wireless sensor network applications. ACM Trans. Embed. Comput. Syst., 9(4):39:1–39:41, April 2010.

[159] M. Zwicker, H. Pfister, J. van Baar, and M. Gross. EWA volume splatting. In Proceedings Visualization, 2001. VIS '01, pages 29–538, 2001.