InNeRF: Learning Interpretable Radiance Fields for Generalizable 3D Scene Representation and Rendering

Anonymous Authors

ABSTRACT
We propose Interpretable Neural Radiance Fields (InNeRF) for generalizable 3D scene representation and rendering. In contrast to previous image-based rendering, which relies on two independent processes of pooling-based fusion and MLP-based rendering, our framework unifies the source-view fusion and target-view rendering processes via an end-to-end interpretable Transformer-based network. InNeRF enables the investigation of deep relationships between the target rendering view and the source views that were previously neglected by pooling-based fusion and fragmented rendering procedures. As a result, InNeRF improves model interpretability by enhancing the shape and appearance consistency of a 3D scene in both the surrounding-view space and the ray-cast space. For a query 3D point to be rendered, InNeRF integrates both its projected 2D pixels from the surrounding source views and its adjacent 3D points along the query ray, and simultaneously decodes this information into the query 3D point representation. Experiments show that InNeRF outperforms state-of-the-art image-based neural rendering methods in both scene-agnostic and per-scene finetuning scenarios, especially when there is a considerable disparity between the source views and the rendering view. The interpretation experiment shows that InNeRF can explain a query rendering process.

CCS CONCEPTS
• Computing methodologies → Computer vision; Rendering.

KEYWORDS
Neural Rendering, Network Interpretability

1 INTRODUCTION
Novel view synthesis is a long-standing open problem concerned with rendering unseen views of a 3D scene given a set of observed views [16, 21]. Recent remarkable NeRF research [11, 12, 14, 18, 30] introduces neural radiance field scene representations, which use multi-layer perceptrons (MLPs) to map a continuous 3D location and view direction to a density and color.

However, these models need to optimize a specific 3D representation for each scene, which is time-consuming and does not learn the information shared among scenes. Subsequently, to learn prior knowledge from diverse scenes, researchers [4, 22, 25, 29] generalized the radiance field scene representation by incorporating a pooling-based multi-view feature as the conditional input. These prior NeRFs generally contain three basic components: a CNN-based single-view feature extraction module, a pooling-based multi-view fusion module, and an MLP-based NeRF module.

Despite the intrinsic connection between these modules, each module is designed and studied independently, making the overall framework disjointed. This incoherent framework design damages model interpretability in three ways: 1) separating the feature extraction of each source view overlooks their relevancy in representing the 3D scene; 2) pooling-based fusion cannot fully explore the complicated relationships among source views; and 3) an MLP that renders color and density from a single aggregated feature struggles to decode the intricate relationships between the observed views and the rendering view. The reason for this framework design is that previous NeRFs are built on MLPs, which are incapable of processing an arbitrary number of observed views. Consequently, they need an auxiliary fusion model to aggregate multi-view information, and pooling-based fusion provides such a straightforward technique.

This limitation also impairs the capability of NeRFs to learn a view-consistent 3D scene representation from observed views, especially when the source views have a more complicated relationship with the target view, e.g., when the observed source views are captured at camera poses that are very different from the camera pose of the target view. When the camera poses of the source views are similar to the rendering view, the source views and the target view are distributed in a local region of the 3D scene representation space, making it possible to approximate their relationship by a linear function as in previous work [4, 22, 25, 29]. However, as the difference between the observed views and the rendering view increases, the correlation becomes more complicated, making it challenging for these approaches to synthesize a realistic novel view. In this scenario, existing MLP-based NeRFs, which use a pooling-based function to fuse the multi-view information, are insufficient to tackle the challenge.

Therefore, the fundamental issue is how to free the intrinsic interpretability of NeRFs from the previously fragmented frameworks for learning generalizable radiance fields. To tackle this unmet need, we present Interpretable Neural Radiance Fields (InNeRF), an end-to-end Transformer-based architecture that unifies the source-view fusion and target-view rendering processes for generalizable 3D scene representation and rendering. In the rendering process of a query 3D point, InNeRF is divided into two stages: the first works in the surrounding-view space, integrating information from the projected 2D pixels in the surrounding source views for the query 3D point; the second works in the ray-cast space, fusing the neighboring 3D points along the query ray into the representation of the query 3D point, as shown in Fig. 1. This design provides our model with a comprehensive understanding of the shape and appearance consistency of a 3D scene in both the surrounding-view space and the ray-cast space. Furthermore, the Transformer-based framework, taking advantage of the attention mechanism, enables our rendering
process to learn the in-depth and complicated relationships between source views and the rendering view, which is essential for novel view synthesis. Therefore, InNeRF has improved interpretability and learns a more comprehensive general neural radiance field.

Our contributions can be summarized as follows:
• We propose Interpretable Neural Radiance Fields (InNeRF), a unified Transformer-based framework, to study deep correlations between observed and rendering views and simultaneously integrate this intricate information into a generalizable neural radiance field.
• InNeRF exploits the geometry and appearance consistency of a neural radiance field in both the surrounding-view space and the ray-cast space, strengthening its interpretability.
• Experiments show that InNeRF achieves more realistic rendering results than state-of-the-art methods in both scene-agnostic and per-scene fine-tuning settings, especially when source views are captured at camera poses that differ significantly from the rendering view.
• InNeRF explains a query rendering process by utilizing its attention layers. Experiments show that the interpretation of InNeRF is consistent with human perception.

2 RELATED WORK
Novel View Synthesis. The goal of novel view synthesis is to render unseen views of a scene from its multiple observed images. The essence of novel view synthesis is exploring and learning a view-consistent 3D scene representation from a sparse set of input views. Early work focused on modeling 3D shapes by discrete geometric 3D representations, such as mesh surfaces [7, 8, 17], point clouds [10, 19] and voxel grids [1, 24, 28]. Although explicit geometry-based representations are intuitive, they are discrete and sparse, making them incapable of producing high-resolution renderings of sufficient quality for complex scenes.

More recently, the neural radiance field (NeRF) [16] has shown a solid ability to synthesize novel views by representing continuous scenes as 5D radiance fields in MLPs. Nevertheless, NeRF optimizes each scene representation independently, which does not exploit the information shared among scenes and is time-consuming. Subsequently, researchers proposed models such as PixelNeRF [29], MVSNeRF [4] and IBRNet [25], which receive multiple observed views as conditional inputs to learn a general neural radiance field. These methods follow a divide-and-conquer strategy and have two separate components: a CNN feature extractor for each observed image and an MLP as the NeRF network. However, the pooling-based fusion models in these methods barely explore the complex relationships across multiple views for 3D scene understanding. Furthermore, processing each 3D point independently ignores the geometry consistency of the 5D radiance field of a scene.

Here, we propose an encoder-decoder Transformer framework, InNeRF, to represent the neural radiance field of a scene for novel view synthesis. Compared with the pooling-based fusion in previous work, InNeRF can explore deep relationships among multiple views and aggregate multi-view information into the coordinate-based scene representation through the attention mechanism in a unified network. Meanwhile, InNeRF can learn the consistency of shape and appearance in a scene by considering the corresponding information in the surrounding-view space and the ray-cast space.

Transformer. The Transformer recently emerged as a promising network framework and has achieved impressive performance in natural language processing [2, 20, 27] and computer vision [3, 5, 6, 9, 13, 31]. The main idea behind this approach is to utilize the multi-head self-attention operation to explore the dependencies within the input tokens and learn a global feature representation. In object detection, DETR [3] presents a new framework that combines a 2D CNN with a Transformer and predicts object detections in parallel as a sequence of output tokens. In image classification, ViT [6] demonstrates the impressive ability of the Transformer to learn global contexts even without using CNN features [23]. In 3D scene understanding, FlatFormer [13] introduces a new window attention mechanism to optimize computational efficiency and achieve improved performance in reconstruction.

For novel view synthesis, we introduce an end-to-end Transformer framework to implicitly model a continuous 3D scene as a neural radiance field representation. Our model leverages the advantage of the Transformer in exploring deep relationships among observed images to learn a consistent, generalizable 3D scene representation.
3 METHODOLOGY
3.1 Framework
We propose InNeRF to learn an interpretable generic radiance field representation for novel scenes. Given captured multi-view images $\{I_m\}_{m=1}^{M}$ ($M$ source views) of diverse scenes and their camera parameters $\{\Theta_m\}_{m=1}^{M}$ (camera poses, intrinsic parameters and scene bounds), InNeRF reconstructs a generic radiance field $F_{\mathrm{InNeRF}}$ to learn the prior knowledge:

$(\sigma, \mathbf{c}) \leftarrow F_{\mathrm{InNeRF}}\big((x, y, z), \mathbf{d};\, \{I_m, \Theta_m\}_m\big)$,   (1)

where $(x, y, z)$ is a 3D point location, $\mathbf{d}$ denotes the unit-length direction of a viewing ray, and the outputs are a differential volumetric density $\sigma$ and a directional emitted color $\mathbf{c}$.

As shown in Fig. 1, to render a query 3D point on a target-viewing ray, the proposed InNeRF proceeds in two stages: 1) in the surrounding-view space, our $\mathrm{Decoder}^{views}_{\sigma}$ (Sec. 3.2) and $\mathrm{Decoder}^{views}_{c}$ (Sec. 3.4) fuse the source views and the query spatial information ($(x, y, z)$, $\mathbf{d}$) into latent density and color representations for the query point; 2) in the ray-cast space, we use $\mathrm{Decoder}^{ray}_{\sigma}$ (Sec. 3.3) and $\mathrm{Decoder}^{ray}_{c}$ (Sec. 3.5) to enhance the query density and color representations by considering the neighboring points along the target ray. Finally, we obtain the density and color of the query point on the target-viewing ray.
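To make the two-stage query concrete, the following PyTorch-style sketch composes the four decoders of Secs. 3.2-3.5 for the sample points of one target ray. The function signature, module names, and tensor shapes are illustrative assumptions of ours, not the authors' released implementation.

```python
import torch
from torch import nn, Tensor

def query_ray(points: Tensor, t_dir: Tensor,
              src_rgb: Tensor, src_feat: Tensor, src_dir: Tensor,
              dec_sigma_views: nn.Module, dec_sigma_ray: nn.Module,
              dec_c_views: nn.Module, dec_c_ray: nn.Module,
              sigma_head: nn.Module, rgb_head: nn.Module):
    """Hypothetical two-stage InNeRF query for N sample points on one ray.

    points:   (N, 3)     sampled 3D locations (x, y, z) along the target ray
    t_dir:    (3,)       unit target-viewing direction d
    src_rgb:  (N, M, 3)  colors of the projected pixels in the M source views
    src_feat: (N, M, C)  U-Net features at the projected pixels
    src_dir:  (N, M, 3)  source viewing directions of the projected pixels
    """
    # Stage 1, density branch (surrounding-view space, Sec. 3.2):
    # fuse the M source-view tokens of every sample into a latent density code.
    x_sigma = dec_sigma_views(points, src_rgb, src_feat, src_dir)   # (N, D)

    # Stage 2, density branch (ray-cast space, Sec. 3.3):
    # let neighboring samples on the ray exchange density information.
    sigma_feat = dec_sigma_ray(x_sigma, points)                     # (N, D)
    sigma = sigma_head(sigma_feat)                                  # (N, 1)

    # Color branch, conditioned on the density features and the target
    # direction d (surrounding-view space, Sec. 3.4; ray-cast space, Sec. 3.5).
    y_color = dec_c_views(sigma_feat, t_dir, src_rgb, src_feat, src_dir)
    color_feat = dec_c_ray(y_color, points)                         # (N, D)
    rgb = rgb_head(color_feat)                                      # (N, 3)
    return sigma, rgb   # composited into a pixel by volume rendering
```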
3.2 Density Decoder in Surrounding-view Space
We first present our density decoder in the surrounding-view space ($\mathrm{Decoder}^{views}_{\sigma}$), which decodes the projected pixels in the source views into the query latent density code.

For each source view, we first extract its feature volume with a pre-trained, view-shared U-Net. A query 3D point $(x, y, z)$ is then projected into each source view $I_m$ by its camera projection matrix $\Theta_m$ to extract the corresponding RGB colors $\{\mathbf{c}^m_{src}\}_{m=1}^{M}$ and feature vectors $\{\mathbf{f}^m_{src}\}_{m=1}^{M}$ at the projected 2D pixel locations $\{\mathbf{p}^m\}_{m=1}^{M}$ through bilinear interpolation. In each source view, we also record the viewing direction $\{\mathbf{d}^m_{src}\}_{m=1}^{M}$ of the projected pixel given by the source camera pose. Based on this, we obtain the initial source-view embeddings $\{\mathbf{x}^m_0\}_{m=1}^{M}$ for the source views.
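As a concrete illustration of this projection step, the sketch below projects query points into one source view with a pinhole camera model and bilinearly samples its image and feature map via torch.nn.functional.grid_sample. The decomposition of the projection matrix into intrinsics K and a world-to-camera extrinsic (R, t) is an assumption about the pose convention, not a detail given in the paper.

```python
import torch
import torch.nn.functional as F

def sample_source_view(points, K, R, t, feat_map, image):
    """Project world-space points into one source view and sample RGB/features.

    points:   (N, 3)        query 3D locations
    K:        (3, 3)        camera intrinsics
    R, t:     (3, 3), (3,)  world-to-camera rotation and translation
    feat_map: (C, H, W)     U-Net feature volume of the source view
    image:    (3, H, W)     the source image itself
    """
    cam = points @ R.T + t                         # world -> camera coordinates
    pix = cam @ K.T                                # camera -> homogeneous pixels
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)  # perspective division

    H, W = image.shape[-2:]
    # grid_sample expects (x, y) coordinates normalized to [-1, 1]
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).view(1, -1, 1, 2)

    feats = F.grid_sample(feat_map[None], grid, align_corners=True)  # (1,C,N,1)
    rgb = F.grid_sample(image[None], grid, align_corners=True)       # (1,3,N,1)

    # Source viewing direction of each projected pixel: from the camera center
    # (-R^T t in world space) towards the 3D point, normalized to unit length.
    cam_center = -R.T @ t
    d_src = F.normalize(points - cam_center, dim=-1)
    return rgb[0, :, :, 0].T, feats[0, :, :, 0].T, d_src  # (N,3), (N,C), (N,3)
```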
Figure 1: Workflow of the proposed InNeRF. Module A is the density decoder in surrounding-view space (Sec. 3.2). Module B is the density decoder in ray-cast space (Sec. 3.3). Module C is the color decoder in surrounding-view space (Sec. 3.4). Module D is the color decoder in ray-cast space (Sec. 3.5).
For the query point, $\mathrm{Decoder}^{views}_{\sigma}$ receives the initial source-view embeddings $\{\mathbf{x}^m_0\}_{m=1}^{M}$ and a learnable query density embedding $\mathbf{x}^{\sigma}_0$ as its input $\mathbf{X}_0$. $\mathrm{Decoder}^{views}_{\sigma}$ can be formulated as follows:

$\mathbf{X}_0 = [\mathbf{x}^{\sigma}_0; \mathbf{x}^1_0; \mathbf{x}^2_0; \cdots; \mathbf{x}^M_0]$,   (2)
$\tilde{\mathbf{X}}_{l+1} = \mathrm{Norm}\big(\mathrm{Pixels{\times}Query}_{\sigma}(\mathbf{X}_l) + \mathbf{X}_l\big)$,   (3)
$\mathbf{X}_{l+1} = \mathrm{Norm}\big(\mathrm{FFN}(\tilde{\mathbf{X}}_{l+1}) + \tilde{\mathbf{X}}_{l+1}\big)$,   (4)

where $l$ denotes the index of a basic block ($l = 1, \cdots, L$), "Norm" is a layer normalization function, and "FFN" is a position-wise feed-forward network. At the $L$-th block, we obtain $\mathbf{X}_L = [\mathbf{x}^{\sigma}_L; \mathbf{x}^1_L; \mathbf{x}^2_L; \cdots; \mathbf{x}^M_L]$. In $\mathrm{Decoder}^{views}_{\sigma}$, we concatenate the embedding $\mathbf{x}^{\sigma}_L$ and the 3D coordinate location $(x, y, z)$ to form the latent density code for the query point.

The Pixels×Query Density Attention layers explore deep relationships among the source views and are defined as follows:

$\mathrm{Pixels{\times}Query}_{\sigma}(\mathbf{X}) = \text{MH-Attn}(\mathbf{X}, \mathbf{X}, \mathbf{X})$,   (5)

where the multi-head attention function is defined as:

$\text{MH-Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{Cat}(\mathbf{A}_1, \cdots, \mathbf{A}_H)\,\mathbf{W}$,
where $\mathbf{A}_h = \mathrm{Attention}(\mathbf{Q}_h, \mathbf{K}_h, \mathbf{V}_h)$,
$\mathbf{Q}_h = \mathbf{Q}\mathbf{W}^{q}_h$; $\mathbf{K}_h = \mathbf{K}\mathbf{W}^{k}_h$; $\mathbf{V}_h = \mathbf{V}\mathbf{W}^{v}_h$.   (6)

Here, $\mathbf{W}^{q}_h, \mathbf{W}^{k}_h \in \mathbb{R}^{d_k \times d_h}$, $\mathbf{W}^{v}_h \in \mathbb{R}^{d_v \times d_h}$ and $\mathbf{W} \in \mathbb{R}^{H d_h \times d_k}$ are parameter matrices ($H \times d_h = d_k$, where $d_h$ is the feature dimension of each head). The Attention function is computed by

$\mathrm{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{T}}{\sqrt{d_k}}\right)\mathbf{V}$,   (7)

where the $N_q$ queries are stacked in $\mathbf{Q} = [\mathbf{q}_1; \mathbf{q}_2; \cdots; \mathbf{q}_{N_q}] \in \mathbb{R}^{N_q \times d_k}$, a set of $N_k$ key-value pairs are stacked in $\mathbf{K} = [\mathbf{k}_1; \mathbf{k}_2; \cdots; \mathbf{k}_{N_k}] \in \mathbb{R}^{N_k \times d_k}$ and $\mathbf{V} = [\mathbf{v}_1; \mathbf{v}_2; \cdots; \mathbf{v}_{N_k}] \in \mathbb{R}^{N_k \times d_v}$, and $d_k$ is used as a scaling factor for normalization. Our $\mathrm{Decoder}^{views}_{\sigma}$ is invariant to permutations of the source views and can receive an arbitrary number of source views.
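A minimal sketch of one basic block of this decoder, written with torch.nn.MultiheadAttention and following the post-norm residual structure of Eqs. (3)-(4), is shown below. The hidden sizes, number of heads, and number of blocks are placeholder choices, not values reported in the paper.

```python
import torch
from torch import nn

class PixelsQueryBlock(nn.Module):
    """One basic block of Decoder_sigma^views: Eq. (3) followed by Eq. (4)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ffn: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(),
                                 nn.Linear(d_ffn, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1 + M, d_model) -- the query density token followed by the M
        # source-view tokens; self-attention is permutation-invariant over M.
        a, _ = self.attn(x, x, x)          # Pixels x Query_sigma, Eq. (5)
        x = self.norm1(a + x)              # Eq. (3)
        x = self.norm2(self.ffn(x) + x)    # Eq. (4)
        return x


# Usage: stack L blocks and read off the first token as the density embedding.
blocks = nn.Sequential(*[PixelsQueryBlock() for _ in range(4)])
tokens = torch.randn(2, 1 + 10, 256)       # a batch of 2 query points, 10 views
x_sigma = blocks(tokens)[:, 0]             # (2, 256) latent density code
```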
3.3 Density Decoder in Ray-cast Space
The density decoder in the ray-cast space ($\mathrm{Decoder}^{ray}_{\sigma}$) decodes the density information of the query point by aggregating the density features of the neighboring 3D points along the target-viewing ray.

For the query point and its $2n$ neighboring points along the target-viewing ray, we denote $[\sigma^{i-n}_0; \cdots; \sigma^{i}_0; \cdots; \sigma^{i+n}_0]$ as their initial density representations at the input of $\mathrm{Decoder}^{ray}_{\sigma}$, where the query point is denoted as $P^i$ and the $2n$ neighboring points are $\{P^{i-n}, \cdots, P^{i-1}, P^{i+1}, \cdots, P^{i+n}\}$. Here, the initial density representation of each 3D point is computed via an FC layer from the $\mathrm{Decoder}^{views}_{\sigma}$ output for the corresponding point ($\sigma_0 = \mathrm{FC}(\mathbf{x}^{\sigma}_L \odot (x, y, z))$, where $\odot$ is the concatenation operation). Positional encodings $\mathbf{E}^{pos}$ are then added to the density representations of the neighboring points to keep their position information in the ray-cast space. Each positional encoding informs each point of its 3D spatial location and is computed using sine and cosine functions of different frequencies, as in [3].
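A small sketch of such a sinusoidal encoding is given below. Treating the signed sample index relative to the query point as the scalar "position" along the ray is our own illustrative assumption.

```python
import torch

def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Sine/cosine positional encodings of different frequencies.

    positions: (N,) scalar positions of the 2n + 1 ray samples
    returns:   (N, d_model) encodings E^pos added to the density tokens
    """
    assert d_model % 2 == 0
    i = torch.arange(d_model // 2, dtype=torch.float32)
    freqs = torch.exp(-torch.log(torch.tensor(10000.0)) * 2 * i / d_model)
    angles = positions[:, None] * freqs[None, :]            # (N, d_model / 2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


# Example: encode 2n + 1 = 9 samples centered on the query point.
pos = torch.arange(-4, 5, dtype=torch.float32)              # -n, ..., 0, ..., +n
e_pos = sinusoidal_encoding(pos, d_model=256)               # (9, 256)
```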
$\mathrm{Decoder}^{ray}_{\sigma}$ is formulated as follows:

$\mathbf{D}_0 = [\sigma^{i-n}_0; \cdots; \sigma^{i}_0; \cdots; \sigma^{i+n}_0] + \mathbf{E}^{pos}$,   (8)
$\tilde{\mathbf{D}}_{l+1} = \mathrm{Norm}\big(\mathrm{Points{\times}Query}_{\sigma}(\mathbf{D}_l) + \mathbf{D}_l\big)$,   (9)
$\mathbf{D}_{l+1} = \mathrm{Norm}\big(\mathrm{FFN}(\tilde{\mathbf{D}}_{l+1}) + \tilde{\mathbf{D}}_{l+1}\big)$,   (10)

where the Points×Query Density Attention layer is computed as $\mathrm{Points{\times}Query}_{\sigma}(\mathbf{D}) = \text{MH-Attn}(\mathbf{D}, \mathbf{D}, \mathbf{D})$, fusing information from the surrounding 3D points on the target-viewing ray. At the final block, $\mathrm{Decoder}^{ray}_{\sigma}$ outputs the density representation $\sigma^{i}_L$ of the query 3D point, and we then use an FC layer to project it to the density value.
3.4 Color Decoder in Surrounding-view Space
The color decoder in the surrounding-view space ($\mathrm{Decoder}^{views}_{c}$) decodes the projected pixels' information from the source views into the query color representation. $\mathrm{Decoder}^{views}_{c}$ can be formulated as follows:

$\tilde{\mathbf{Y}}_{l+1} = \mathrm{Norm}\big(\mathrm{Pixels{\times}Query}_{c}(\mathbf{Y}_l, \hat{\mathbf{X}}, \hat{\mathbf{C}}) + \mathbf{Y}_l\big)$,   (11)
$\mathbf{Y}_{l+1} = \mathrm{Norm}\big(\mathrm{FFN}(\tilde{\mathbf{Y}}_{l+1}) + \tilde{\mathbf{Y}}_{l+1}\big)$.   (12)

In the Pixels×Query Color Attention layers, the initial query color embedding is $\mathbf{Y}_0 = \mathrm{FC}(\sigma^{i}_L) \odot \mathbf{d}_{tgt}$, where $\sigma^{i}_L$ is the latent density representation from $\mathrm{Decoder}^{ray}_{\sigma}$ and $\mathbf{d}_{tgt}$ is the target-viewing direction of the query point. The Pixels×Query Color Attention layer is calculated as:

$\mathrm{Pixels{\times}Query}_{c}(\mathbf{Y}, \hat{\mathbf{X}}, \hat{\mathbf{C}}) = \text{MH-Attn}(\mathbf{Y}, \hat{\mathbf{X}}, \hat{\mathbf{C}})$,   (13)

where the value is $\hat{\mathbf{C}} = [\gamma(\mathbf{c}^1_{src}); \cdots; \gamma(\mathbf{c}^M_{src})]$ ($\gamma(\cdot)$ is an embedding function) and the key is $\hat{\mathbf{X}} = [\mathrm{FC}(\mathbf{x}^1_L) \odot \mathbf{d}^1_{src}; \cdots; \mathrm{FC}(\mathbf{x}^M_L) \odot \mathbf{d}^M_{src}]$, representing the projected pixels' representations in the source views. The output $\mathbf{Y}_L$ is the latent color code of the query 3D point.
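The cross-attention of Eq. (13) differs from the self-attention used in the density branch in that the query, key, and value tokens come from different sources. The sketch below is a hedged illustration: the hidden sizes are placeholders, and modeling the color embedding γ(·) as a small MLP is a choice of ours rather than a detail stated in the paper.

```python
import torch
from torch import nn

class PixelsQueryColorBlock(nn.Module):
    """One block of Decoder_c^views: cross-attention Eq. (13) inside Eqs. (11)-(12)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ffn: int = 1024):
        super().__init__()
        self.gamma = nn.Sequential(nn.Linear(3, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))  # gamma(c_src)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(),
                                 nn.Linear(d_ffn, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, y, x_hat, c_src):
        # y:     (B, 1, d_model)  query color embedding Y_l
        # x_hat: (B, M, d_model)  keys X_hat: per-view features fused with d_src
        # c_src: (B, M, 3)        projected source-view colors (values C_hat)
        a, _ = self.attn(query=y, key=x_hat, value=self.gamma(c_src))  # Eq. (13)
        y = self.norm1(a + y)                                          # Eq. (11)
        y = self.norm2(self.ffn(y) + y)                                # Eq. (12)
        return y
```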
3.5 Color Decoder in Ray-cast Space
The color decoder in the ray-cast space ($\mathrm{Decoder}^{ray}_{c}$) learns the query color by fusing the latent color codes of adjacent 3D points along the target ray in Points×Query Color Attention layers ($\mathrm{Points{\times}Query}_{c}(\mathbf{Z}) = \text{MH-Attn}(\mathbf{Z}, \mathbf{Z}, \mathbf{Z})$). $\mathrm{Decoder}^{ray}_{c}$ is represented as:

$\mathbf{Z}_0 = [\mathbf{z}^{i-n}_0; \cdots; \mathbf{z}^{i}_0; \cdots; \mathbf{z}^{i+n}_0] + \mathbf{E}^{pos}$,   (14)
$\tilde{\mathbf{Z}}_{l+1} = \mathrm{Norm}\big(\mathrm{Points{\times}Query}_{c}(\mathbf{Z}_l) + \mathbf{Z}_l\big)$,   (15)
$\mathbf{Z}_{l+1} = \mathrm{Norm}\big(\mathrm{FFN}(\tilde{\mathbf{Z}}_{l+1}) + \tilde{\mathbf{Z}}_{l+1}\big)$,   (16)

where the query latent color code from $\mathrm{Decoder}^{views}_{c}$ is assigned to the corresponding $\mathbf{z}^{i}_0$, and likewise for the adjacent $2n$ points in the ray-cast space.

Subsequently, after $\mathrm{Decoder}^{ray}_{c}$, we use an FC layer to project the output color embedding $\mathbf{z}^{i}_L$ to the predicted color value. The predicted density and color of each query point along a ray of the desired virtual camera are then passed to classical volume rendering. The implementation details of the network and training are described in the supplementary material.
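For completeness, the sketch below shows the classical volume rendering quadrature (as popularized by NeRF [16]) that composites the per-sample densities and colors of one ray into a pixel color; the variable names are ours.

```python
import torch

def volume_render(sigma: torch.Tensor, rgb: torch.Tensor,
                  t_vals: torch.Tensor) -> torch.Tensor:
    """Composite per-sample (sigma, rgb) along one ray into a pixel color.

    sigma:  (N,)   predicted volumetric densities (non-negative)
    rgb:    (N, 3) predicted colors in [0, 1]
    t_vals: (N,)   sample depths along the ray, in increasing order
    """
    deltas = t_vals[1:] - t_vals[:-1]
    deltas = torch.cat([deltas, deltas.new_tensor([1e10])])   # last interval
    alpha = 1.0 - torch.exp(-sigma * deltas)                  # per-sample opacity
    # Accumulated transmittance T_i = prod_{j < i} (1 - alpha_j)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)
    trans = torch.cat([trans.new_ones(1), trans[:-1]])
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)                # (3,) pixel color


# Example with 64 samples on one ray.
t = torch.linspace(2.0, 6.0, 64)
pixel = volume_render(torch.rand(64), torch.rand(64, 3), t)
```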
4 EXPERIMENTS
The proposed approach is evaluated in the following experimental settings:
• Scene-agnostic setting: we train a single scene-agnostic model on a large training dataset that includes various camera setups and scene types, and we test its generalization ability to unseen scenes on all test scenes.
• Per-scene fine-tuning setting: our pretrained scene-agnostic model is finetuned on each test scene, and we evaluate each finetuned scene-specific model separately.

We train and evaluate our method on a collection of multi-view datasets containing both synthetic and real data, as in IBRNet [25]. For novel view synthesis, we quantitatively evaluate the rendered image quality using PSNR, SSIM [26] (higher is better), and LPIPS [32] (lower is better).
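As an illustration of how these three metrics are typically computed, the sketch below evaluates one rendered image against its ground truth with scikit-image and the lpips package. It is an evaluation sketch under our own assumptions (e.g., images stored as float arrays in [0, 1]); it is not the authors' evaluation script, and exact arguments may vary across library versions.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')   # LPIPS [32] with an AlexNet backbone

def evaluate_image(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred, gt: float arrays of shape (H, W, 3) with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # lpips expects torch tensors of shape (1, 3, H, W) scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return {'PSNR': psnr, 'SSIM': ssim, 'LPIPS': lp}
```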
4.1 Conditional Source-view Set
The experiments are designed to examine whether the proposed InNeRF can effectively learn a neural radiance field scene representation as the degree of variation between the conditional source-view set and the target rendering view changes. Here, we sample 10 views from the surrounding-view set as the conditional source-view set used to render a target view. Given the camera poses, we can compute and sort the difference between each surrounding view and the target rendering view (a sketch of one such ranking is given below).

Based on the sorted order, we construct $N_s$ conditional source-view sets ($\{S_i\}_{i=1}^{N_s}$) from the surrounding-view set to render each test view. For the real evaluation dataset, there are $N_s = 3$ sets, i.e., the top 10 ($S_1$), middle 10 ($S_2$), and bottom 10 ($S_3$) views. For the synthetic evaluation dataset, there are $N_s = 4$ sets, namely the top 10 ($S_1$), middle 10 ($S_2$), third-quartile 10 ($S_3$), and bottom 10 ($S_4$) views. Fig. 4 shows visual examples of $S_1$ and $S_4$ for illustration.
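The paper does not state the exact view-difference measure, so the sketch below uses one plausible choice: it ranks the surrounding views by the angle between their viewing axes and that of the target camera, then slices the ranked list into sets such as S1 (top 10) and S3/S4 (bottom 10). The pose convention (world-to-camera rotations whose third row is the viewing axis) is an assumption.

```python
import numpy as np

def build_source_view_sets(target_R, src_Rs, set_size=10):
    """Rank surrounding views by angular difference to the target view.

    target_R: (3, 3) world-to-camera rotation of the target view
    src_Rs:   list of (3, 3) rotations of the surrounding views
    Returns index sets from which S_1 (most similar) ... S_Ns (least similar)
    can be sliced.
    """
    z_tgt = target_R[2]                    # viewing axis of the target camera
    angles = []
    for R in src_Rs:
        cos_ang = float(np.clip(np.dot(R[2], z_tgt), -1.0, 1.0))
        angles.append(np.degrees(np.arccos(cos_ang)))
    order = np.argsort(angles)             # most similar views first
    mid = len(order) // 2
    return {
        'S1': order[:set_size],                                  # top 10
        'S2': order[mid - set_size // 2: mid + set_size // 2],   # middle 10
        'S3': order[-set_size:],                                 # bottom 10
    }
```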
4.2 Results
In both the scene-agnostic (Sec. 4.2.1) and per-scene fine-tuning (Sec. 4.2.2) experiments, we evaluate the competing methods in scenarios where the source views belong to the different source-view sets $\{S_i\}_{i=1}^{N_s}$ defined in Sec. 4.1. To render a testing view, each competing approach receives the same source-view set as input. In Sec. 4.2.3, we provide the interpretation results of InNeRF.

4.2.1 Scene-agnostic Experiments. In the scene-agnostic experiments, InNeRF is compared with PixelNeRF [29], MVSNeRF [4] and IBRNet [25] on the real forward-facing dataset [15] and the realistic synthetic dataset [16].

Tab. 1 shows that the proposed InNeRF outperforms the other methods on both datasets under the scene-agnostic setting. To facilitate the quantitative comparison, the best score for each metric is marked in bold. The results show that InNeRF has better generalization ability to novel scenes even though it is trained on datasets with noticeably different scenes and view distributions. The detailed per-scene results in the supplementary material also reveal that InNeRF performs better on each scene.

The superior generalization ability of InNeRF is also reflected in the qualitative results. As shown in Fig. 2, we compare the performance of the methods when rendering the same randomly selected testing view based on different source-view sets. The results of the other approaches contain more obvious artifacts than those of InNeRF and become even worse in the $S_3$ scenario, where the difference between the source views and the target view is larger than in $S_1$ and $S_2$.
Table 1: Quantitative comparison of methods on the scene-agnostic setting for the realistic synthetic dataset [16] and the real forward-facing dataset [15].

                                 PSNR ↑                              SSIM ↑                              LPIPS ↓
Dataset              S_i   PixelNeRF MVSNeRF IBRNet InNeRF    PixelNeRF MVSNeRF IBRNet InNeRF    PixelNeRF MVSNeRF IBRNet InNeRF
realistic synthetic  S1    21.20     22.47   25.31  26.45     0.857     0.874   0.913  0.922     0.161     0.143   0.104  0.092
realistic synthetic  S2    17.00     18.44   21.80  23.16     0.732     0.755   0.805  0.842     0.295     0.286   0.236  0.183
realistic synthetic  S3    15.88     17.43   20.99  22.70     0.660     0.687   0.749  0.810     0.355     0.328   0.270  0.211
realistic synthetic  S4    14.67     16.25   19.97  21.72     0.567     0.597   0.672  0.758     0.440     0.400   0.322  0.248
real forward-facing  S1    19.02     20.09   24.96  24.97     0.651     0.680   0.813  0.816     0.380     0.347   0.208  0.205
real forward-facing  S2    16.30     17.68   22.69  22.94     0.576     0.614   0.749  0.760     0.459     0.422   0.273  0.260
real forward-facing  S3    13.56     15.21   20.33  20.81     0.489     0.543   0.683  0.701     0.551     0.504   0.340  0.318
Table 2: Quantitative comparisons of methods on the per-scene fine-tuning setting for the realistic synthetic dataset [16] and the real forward-facing dataset [15].

                                 PSNR ↑                              SSIM ↑                              LPIPS ↓
Dataset              S_i   PixelNeRF MVSNeRF IBRNet InNeRF    PixelNeRF MVSNeRF IBRNet InNeRF    PixelNeRF MVSNeRF IBRNet InNeRF
realistic synthetic  S1    24.06     27.04   29.27  30.79     0.877     0.913   0.940  0.952     0.140     0.103   0.076  0.064
realistic synthetic  S2    20.15     23.30   25.91  27.76     0.770     0.813   0.847  0.881     0.263     0.221   0.187  0.142
realistic synthetic  S3    19.27     22.56   25.23  27.35     0.714     0.759   0.802  0.849     0.301     0.256   0.216  0.165
realistic synthetic  S4    18.23     21.57   24.33  26.65     0.639     0.689   0.739  0.803     0.358     0.306   0.254  0.195
real forward-facing  S1    20.72     23.32   26.61  26.65     0.693     0.758   0.847  0.853     0.325     0.260   0.177  0.173
real forward-facing  S2    18.28     21.11   24.69  24.99     0.625     0.696   0.788  0.811     0.384     0.313   0.225  0.212
real forward-facing  S3    15.66     18.62   22.62  23.25     0.544     0.623   0.727  0.767     0.458     0.377   0.276  0.256
Figure 2: Qualitative results for the Trex and the Fern scenes [15] under the scene-agnostic setting.
Figure 3: Qualitative results for the Fern scene [15] under the per-scene finetuning setting.
Figure 4: Qualitative results for the Hotdog scene under the per-scene finetuning setting. The source-view sets S1 and S4 are listed in the yellow frame.
As highlighted in the colored frames, the other methods cannot synthesize clean boundaries for the guardrails and fronds or recover thin structures.

From the above qualitative results, we observe a gradual degradation in the synthesized view as the difference between the source views and the target rendering view increases from $S_1$ to $S_3$. Similarly, in the quantitative results from $S_1$ to $S_3$, PSNR and SSIM both decrease while LPIPS increases for all competing methods. This reveals that the more the source views differ from the target rendering view, the more difficult novel view synthesis becomes. Tab. 1 also indicates that the advantage of InNeRF over the other methods becomes more significant as the difference between the source views and the target view increases. This demonstrates that InNeRF has a strong ability to explore complicated relationships between the source views and the target view and to learn a better scene representation in challenging scenarios. More results are provided in the supplementary material.

4.2.2 Per-scene Finetuning Experiments. In the per-scene finetuning experiment, the pretrained models of the competing methods are finetuned for each scene.

As shown in Tab. 2, InNeRF outperforms the other methods after per-scene finetuning. Similar to the scene-agnostic results, the per-scene finetuning results further validate that InNeRF provides more satisfactory novel view rendering than the other methods under different source-view settings. Meanwhile, the performance gaps between InNeRF and the other methods become larger than in the scene-agnostic setting, which indicates that per-scene finetuning can further realize the potential of InNeRF. Consistent with the quantitative results, Fig. 3 shows that InNeRF provides more realistic
Figure 5: Interpretation results of the finetuned InNeRF for a target view of the Chair scene based on source-view set S4.
view synthesis results with fewer artifacts than the other approaches.

In Fig. 4, InNeRF is compared with IBRNet on four source-view sets ($S_1$, $S_2$, $S_3$ and $S_4$). Here, we randomly select one view of the Hotdog scene as the target rendering view. To show the difference between the source-view sets, we display all source views of $S_1$ and $S_4$ at the bottom of Fig. 4. It is obvious that the view angles of the source views in $S_1$ are closer to the rendering view than those in $S_4$. The top two rows of Fig. 4 display the rendering results of the competing methods for the four source-view sets. The artifacts in the rendered views of IBRNet are perceptible in $S_2$ and become worse in $S_3$ and $S_4$. In contrast, the artifacts in the rendered views of InNeRF remain at a low level across all four source-view sets. This illustrates that InNeRF obtains better rendering results than IBRNet for different source-view sets, especially when there is a large difference between the source views and the rendering view.

Figure 6: Interpretation results of fine-tuned InNeRF for a target view of the Lego scene.

4.2.3 Analysis of Interpretability in InNeRF. Based on the attention mechanism, InNeRF utilizes the shape and appearance consistency in both the surrounding-view space and the ray-cast space, thus improving the model interpretability. Here, we evaluate the interpretability of InNeRF to examine whether it is consistent with human perception.

In the surrounding-view space, we visualize the attention of the different source views to a target 3D point to interpret its rendering in $\mathrm{Decoder}^{views}_{\sigma}$ and $\mathrm{Decoder}^{views}_{c}$. Similarly, in the ray-cast space, the rendering process of $\mathrm{Decoder}^{ray}_{\sigma}$ and $\mathrm{Decoder}^{ray}_{c}$ can be explored by visualizing the attention of the surrounding 3D points on the target-viewing ray to the target 3D point. Specifically, for a 2D region (a 5×5 pixel region) in the rendering view, we first compute the average depth value of the corresponding view directions for the target pixels based on our learned neural radiance field. We then retrieve the 3D point located closest to the average depth along the average viewing direction as the target-interpreted 3D point. For this target 3D point, we can explain its rendering process in both the surrounding-view and ray-cast spaces by visualizing the corresponding attention layers in InNeRF. A sketch of this interpretation procedure is given below.
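The sketch below illustrates the procedure just described. It assumes the decoders expose their attention maps (for example, the weight tensors returned by torch.nn.MultiheadAttention) and that a depth map has already been rendered from the learned radiance field; both are implementation assumptions of ours.

```python
import torch

def interpret_region(depth_map, ray_origins, ray_dirs, t_vals, attn_maps,
                     row, col, size=5):
    """Pick the target-interpreted 3D point for a size x size pixel region and
    read off the source-view attention assigned to it.

    depth_map:   (H, W)        rendered depths from the learned radiance field
    ray_origins: (H, W, 3)     camera centers of the rendering rays
    ray_dirs:    (H, W, 3)     unit viewing directions of the rendering rays
    t_vals:      (N,)          sample depths along each ray
    attn_maps:   (H, W, N, M)  attention of the M source views to each sample
    """
    rs = slice(row, row + size)
    cs = slice(col, col + size)
    avg_depth = depth_map[rs, cs].mean()
    avg_dir = torch.nn.functional.normalize(ray_dirs[rs, cs].mean(dim=(0, 1)), dim=0)
    origin = ray_origins[rs, cs].mean(dim=(0, 1))

    # Retrieve the ray sample closest to the average depth along the average
    # viewing direction; this is the target-interpreted 3D point.
    k = torch.argmin((t_vals - avg_depth).abs())
    point = origin + avg_dir * t_vals[k]

    # Attention of each source view to that sample, averaged over the region.
    view_attn = attn_maps[rs, cs, k].mean(dim=(0, 1))      # (M,)
    ranked_views = torch.argsort(view_attn, descending=True)
    return point, view_attn, ranked_views
```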
To analyze the interpretability of InNeRF, we provide an interpretation of a randomly selected testing view of the Chair scene based on source-view set $S_4$ in Fig. 5. The target rendering view is shown in Fig. 5 (a) and the target location for interpretation is marked as a red dot. For human visual perception, the source views are divided into two groups depending on whether they capture the target location (red dot) in the rendering view: Fig. 5 (b-1) shows the source views that capture the target location, and Fig. 5 (b-2) shows the source views that fail to capture it.

For the target location (red dot) in Fig. 5 (a), Fig. 5 (c) and (d) display the attention of the source views to the target location when rendering the query density and color in $\mathrm{Decoder}^{views}_{\sigma}$ and $\mathrm{Decoder}^{views}_{c}$, respectively. In Fig. 5 (c) and (d), the attention of the source views that are visible in Fig. 5 (b) is colored blue for clarity.
Figure 7: Interpretation results of InNeRF for (top) the wings of the nose spot, (middle) the leaves on the left side, and (bottom) the black tile in the novel views under the per-scene finetuning setting.
The source views (85, 41, and 61) with high attention values are consistent with the source views in which the target location is visible. This indicates that the attention layers in $\mathrm{Decoder}^{views}_{\sigma}$ and $\mathrm{Decoder}^{views}_{c}$ can identify the important source views in a way that matches human perception. Fig. 5 (e) depicts the density attention (green) and the color attention (orange) among the 3D points along the target-viewing ray for rendering the query 3D point in $\mathrm{Decoder}^{ray}_{\sigma}$ and $\mathrm{Decoder}^{ray}_{c}$. Here, the red index (83) denotes the retrieved 3D point for the target location in the rendering view. As shown in Fig. 5 (e), both the density attention and the color attention in $\mathrm{Decoder}^{ray}_{\sigma}$ and $\mathrm{Decoder}^{ray}_{c}$ exhibit a crest near the query 3D point, which illustrates that InNeRF in the ray-cast space takes the consistency of neighboring points into account when rendering the query point.

Fig. 6 shows the two source views with the highest density attention for the target location in the Lego scene of the realistic synthetic dataset, and the last two columns show the two source views with the lowest density attention. Given that the top-attention source views capture the target location (red frame), it is reasonable that they receive more attention for the query rendering.

Fig. 7 provides interpretation results on the forward-facing dataset. The leftmost column shows the rendering view and an enlarged region framed by a blue box. The second and third columns show the two source views with the highest density attention for the target location, and the last two columns show the two source views with the lowest density attention. For the different source-view sets, the top two source views for the framed leaf region both include the corresponding leaf region, while the last two source views do not. This indicates that the interpretation results are reasonable with respect to human perception.

5 CONCLUSION
We propose a unified Transformer-based NeRF framework to learn a general neural radiance field for novel view synthesis. The proposed framework can explore complex relationships between the source views and the target rendering view. Meanwhile, the framework improves intrinsic interpretability by utilizing the shape and appearance consistency of 3D scenes. Experiments demonstrate that InNeRF achieves state-of-the-art performance on real and synthetic datasets in both scene-agnostic and per-scene finetuning settings. In the future, we intend to extend InNeRF to conditional generative radiance fields, employing the learned prior knowledge to generate a more expressive and interpretable 3D scene representation for the conditional information.