
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Kelvin C.K. Chan Shangchen Zhou Xiangyu Xu Chen Change Loy*


S-Lab, Nanyang Technological University
{chan0899, s200094, xiangyu.xu, ccloy}@ntu.edu.sg
arXiv:2104.13371v1 [cs.CV] 27 Apr 2021

[Figure 1 panels: (a) propagation schemes of BasicVSR vs. BasicVSR++; (b) alignment: BasicVSR spatially warps features with optical flow, while BasicVSR++ (ours) applies a DCN whose offsets are the optical flow plus learned residuals; (c) PSNR vs. #Params (5M to 20M), comparing BasicVSR++ against EDVR (CVPRW19), IconVSR (CVPR21), BasicVSR (CVPR21), RBPN (CVPR19), RSDN (ECCV20), DUF (CVPR18), PFNL (ICCV19), RLSP (ICCVW19), and FRVSR (CVPR18).]

(a) Propagation (b) Alignment (c) Performance Gain

Figure 1: Improvements over BasicVSR [2]. (a) Second-order grid propagation in BasicVSR++ allows a more effective propagation of
features. (b) Flow-guided deformable alignment in BasicVSR++ provides a means for more robust feature alignment across misaligned
frames. (c) BasicVSR++ outperforms existing state-of-the-art methods while maintaining efficiency.

Abstract

A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with the enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to an improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with a similar number of parameters. In addition to video super-resolution, BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement. In NTIRE 2021, BasicVSR++ obtains three champions and one runner-up in the Video Super-Resolution and Compressed Video Enhancement Challenges. Codes and models will be released to MMEditing¹.

1. Introduction

Video super-resolution (VSR) is challenging in that one needs to gather complementary information across misaligned video frames for restoration. One prevalent approach is the sliding-window framework [9, 32, 35, 38], where each frame in the video is restored using the frames within a short temporal window. In contrast to the sliding-window framework, a recurrent framework attempts to exploit the long-term dependencies by propagating the latent features. In general, these methods [8, 10, 11, 12, 14, 27] allow a more compact model compared to those in the sliding-window framework. Nevertheless, the problems of transmitting long-term information and aligning features across frames in a recurrent model remain formidable.

A recent work by Chan et al. [2] studies the problems carefully. It summarizes the common VSR pipelines into four components, namely Propagation, Alignment, Aggregation, and Upsampling, and proposes BasicVSR. In BasicVSR, bidirectional propagation is adopted to exploit information from the entire input video for reconstruction. For alignment, optical flow is adopted for feature warping. BasicVSR serves as a succinct yet strong backbone where components can be easily added for performance gain. However, its rudimentary designs in propagation and alignment limit the efficacy of information aggregation.

* Corresponding author
¹ https://github.com/open-mmlab/mmediting

As a result, the network often struggles to restore fine details, especially when dealing with occluded and complex regions. The shortcomings call for refined designs in propagation and alignment.

In this work, we redesign BasicVSR by devising second-order grid propagation and flow-guided deformable alignment that allow information to be propagated and aggregated more effectively:

1) The proposed second-order grid propagation, as shown in Fig. 1(a), addresses two limitations in BasicVSR: i) we allow more aggressive bidirectional propagation arranged in a grid-like manner, and ii) we relax the assumption of first-order Markov property in BasicVSR, and incorporate a second-order connection [28] into the network so that information can be aggregated from different spatiotemporal locations. Both modifications ameliorate information flow in the network and improve the robustness of the network against occluded and fine regions.

2) BasicVSR shows the advantages of using optical flow for temporal alignment. However, optical flow is not robust to occlusion, and inaccurate flow estimation could jeopardize the restoration performance. Deformable alignment [32, 33, 35] has demonstrated its superiority in VSR, but it is difficult to train in practice [3]. To take advantage of deformable alignment while overcoming the training instability, we propose flow-guided deformable alignment, as shown in Fig. 1(b). In the proposed module, instead of learning the DCN offsets directly [6, 42], we reduce the burden of offset learning by using the optical flow field as base offsets, refined by a flow field residue. The latter can be learned more stably than the original DCN offsets.

The two aforementioned components are novel, and more discussion can be found in the related work section. Benefiting from the more effective designs, BasicVSR++ can adopt a more lightweight backbone than its counterparts. Consequently, BasicVSR++ surpasses existing state-of-the-art methods, including BasicVSR and IconVSR (the more elaborate BasicVSR variant), by a large margin while maintaining efficiency (Fig. 1(c)). In particular, when compared to its precedent BasicVSR, a gain of 0.82 dB in PSNR on REDS4 [35] is obtained with a similar number of parameters. In addition, BasicVSR++ obtains three champions and one runner-up in the NTIRE 2021 Video Super-Resolution [29] and Compressed Video Enhancement [39] Challenges.

2. Related Work

Recurrent Networks. The recurrent framework is a popular structure adopted in various video processing tasks such as super-resolution [8, 10, 11, 12, 14, 27], deblurring [24, 41], and frame interpolation [36]. For instance, RSDN [12] adopts unidirectional propagation with a recurrent detail-structural block and a hidden state adaptation module to enhance the robustness to appearance change and error accumulation. Chan et al. [2] propose BasicVSR. The work demonstrates the importance of bidirectional propagation over unidirectional propagation to better exploit features temporally. In addition, the study also shows the advantage of feature alignment in aligning highly relevant but misaligned features. We refer readers to [2] for the detailed comparisons of these components against the more conventional ways of performing propagation and alignment. In our experiments, we focus on comparing with BasicVSR since it is the state-of-the-art method for VSR.

Grid Connections. Grid-like designs are seen in various vision tasks such as object detection [5, 30, 34], semantic segmentation [7, 30, 34, 43], and frame interpolation [25]. In general, these designs decompose a given image/feature into multiple resolutions, and grids are adopted across resolutions to capture both fine and coarse information. Unlike the aforementioned methods, BasicVSR++ does not adopt a multi-scale design. Instead, the grid structure is designed for propagation across time in a bidirectional fashion. We link different frames with a grid connection to repeatedly refine the features, improving expressiveness.

Higher-Order Propagation. Higher-order propagation has been studied to improve gradient flow [16, 20, 28]. These methods demonstrate improvements in different tasks including classification [16] and language modeling [28]. However, these methods do not consider temporal alignment, which is shown to be critical in the task of VSR [2]. To allow temporal alignment in second-order propagation, we incorporate alignment into our propagation scheme by extending our flow-guided deformable alignment to the second-order settings.

Deformable Alignment. Several works [32, 33, 35, 37] employ deformable alignment. TDAN [32] performs alignment at the feature level using deformable convolution. EDVR [35] further proposes a Pyramid Cascading Deformable (PCD) alignment with a multi-scale design. Recently, Chan et al. [3] analyze deformable alignment and show that the performance gain over flow-based alignment comes from the offset diversity. Motivated by [3], we adopt deformable alignment but with a reformulation to overcome the training instability [3]. Our flow-guided deformable alignment is different from the offset-fidelity loss [3]. The latter uses optical flow as a loss function during training. In contrast, we directly incorporate optical flow into our module as base offsets, allowing a more explicit guidance both during training and inference.

3. Methodology

BasicVSR++ consists of two effective modifications for improving propagation and alignment. As shown in Fig. 2, given an input video, residual blocks are first applied to extract features from each frame. The features are then propagated under our second-order grid propagation scheme, where alignment is performed by our flow-guided deformable alignment. After propagation, the aggregated features are used to generate the output image through convolution and pixel-shuffling.
[Figure 2 diagram: residual blocks extract features g_i from each input frame x_i; the features f_i^j are refined through grid propagation (with second-order connections and flow-guided deformable alignment inside each branch), then upsampled via pixel-shuffle, with a bilinearly upsampled copy of the input added elementwise. Legend: residual blocks, pixel-shuffle, bilinear upsampling, elementwise addition, channel-wise concatenation.]
Figure 2: An Overview of BasicVSR++. BasicVSR++ consists of two modifications to improve propagation and alignment. For propagation, we introduce grid propagation (blue solid lines) to refine features bidirectionally. In addition, a second-order connection (red dotted lines) is adopted to improve the robustness of propagation. Within each propagation branch, flow-guided deformable alignment is proposed to increase the offset diversity while overcoming the offset overflow problem.
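To make the reconstruction stage of Fig. 2 concrete, below is a minimal sketch of a pixel-shuffle upsampling head that combines the legend items above (pixel-shuffle, bilinear upsampling, elementwise addition). The module name and channel widths are illustrative assumptions, not the released implementation.

```python
import torch.nn as nn

class UpsampleHead(nn.Module):
    """Sketch of the output stage in Fig. 2: convolutions + pixel-shuffle (4x total),
    with a bilinearly upsampled copy of the LR input added elementwise."""
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2), nn.LeakyReLU(0.1),
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2), nn.LeakyReLU(0.1),
            nn.Conv2d(c, 3, 3, padding=1))
        self.up = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

    def forward(self, feat, lr):
        # feat: aggregated features (N, c, H, W); lr: input LR frame (N, 3, H, W)
        return self.body(feat) + self.up(lr)
```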

3.1. Second-Order Grid Propagation

Most existing methods adopt unidirectional propagation [12, 14, 27]. Several works [2, 10, 11] adopt bidirectional propagation for exploiting the information available in the video sequence. In particular, IconVSR [2] consists of a coupled propagation scheme with sequentially-connected branches to facilitate information exchange.

Motivated by the effectiveness of the bidirectional propagation, we devise a grid propagation scheme to enable repeated refinement through propagation. More specifically, the intermediate features are propagated backward and forward in time in an alternating manner. Through propagation, the information from different frames can be "revisited" and adopted for feature refinement. Compared to existing works that propagate features only once, grid propagation repeatedly extracts information from the entire sequence, improving feature expressiveness.

To further enhance the robustness of propagation, we relax the assumption of first-order Markov property in BasicVSR and adopt a second-order connection, realizing a second-order Markov chain. With this relaxation, information can be aggregated from different spatiotemporal locations, improving robustness and effectiveness in occluded and fine regions.

Integrating the above two components, we devise our second-order grid propagation as follows. Let x_i be the input image, g_i be the feature extracted from x_i by multiple residual blocks, and f_i^j be the feature computed at the i-th timestep in the j-th propagation branch. In this section, we describe the procedure for forward propagation; the procedure for backward propagation is defined similarly. To compute the feature f_i^j, we first align f_{i-1}^j and f_{i-2}^j (following the second-order Markov chain) using our proposed flow-guided deformable alignment, which will be discussed in the next section:

    \hat{f}_i^j = \mathcal{A}(g_i, f_{i-1}^j, f_{i-2}^j, s_{i \to i-1}, s_{i \to i-2}),    (1)

where s_{i→i-1}, s_{i→i-2} denote the optical flows from the i-th frame to the (i-1)-th and (i-2)-th frames, respectively, and \mathcal{A} represents flow-guided deformable alignment². The features are then concatenated and passed into a stack of residual blocks:

    f_i^j = \hat{f}_i^j + R(c(f_i^{j-1}, \hat{f}_i^j)),    (2)

where f_i^0 = g_i, R denotes the residual blocks, and c denotes concatenation along the channel dimension.

² s_{0→-1} = s_{0→-2} = s_{1→-1} = f_{-1} = f_{-2} = 0.
3.2. Flow-Guided Deformable Alignment

Deformable alignment [33, 35] has demonstrated significant improvements over flow-based alignment [9, 38] thanks to the offset diversity [3] intrinsically introduced in deformable convolution (DCN) [6, 42]. However, the deformable alignment module can be difficult to train [3]. The training instability often results in offset overflow, deteriorating the final performance.

To take advantage of the offset diversity while overcoming the instability, we propose to employ optical flow to guide deformable alignment, motivated by the strong relation between deformable alignment and flow-based alignment [3]. The graphical illustration is shown in Fig. 3. In the rest of this section, we detail the alignment procedure for forward propagation. The procedure for backward propagation is defined similarly. The superscript j is omitted for notational simplicity.

[Figure 3 diagram: the previous feature f_{i-1} is warped by the optical flow s_{i→i-1}, concatenated with g_i, and passed through convolution stacks C^o and C^m to produce the DCN offsets o_{i→i-1} (as a residue to the flow) and masks m_{i→i-1}; a DCN is then applied to the unwarped feature f_{i-1} to produce the aligned feature.]
Figure 3: Flow-guided deformable alignment. Optical flow is used to pre-align the features. The aligned features are then concatenated to produce the DCN offsets (residue to optical flow). A DCN is then applied to the unwarped features. Only first-order connections are drawn; the second-order connections are omitted for simplicity.

At the i-th timestep, given the feature g_i computed from the i-th LR image, the feature f_{i-1} computed for the previous timestep, and the optical flow s_{i→i-1} to the previous frame, we first warp f_{i-1} with s_{i→i-1}:

    \bar{f}_{i-1} = \mathcal{W}(f_{i-1}, s_{i \to i-1}),    (3)

where \mathcal{W} denotes the spatial warping operation. The pre-aligned features are then used to compute the DCN offsets o_{i→i-1} and modulation masks m_{i→i-1}. Instead of directly computing the DCN offsets, we compute the residue to the optical flow:

    o_{i \to i-1} = s_{i \to i-1} + C^o(c(g_i, \bar{f}_{i-1})),
    m_{i \to i-1} = \sigma(C^m(c(g_i, \bar{f}_{i-1}))).    (4)

Here C^o and C^m denote stacks of convolutions, and \sigma denotes the sigmoid function. A DCN is then applied to the unwarped feature f_{i-1}:

    \hat{f}_i = \mathcal{D}(f_{i-1}; o_{i \to i-1}, m_{i \to i-1}),    (5)

where \mathcal{D} denotes a deformable convolution.
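A minimal single-feature sketch of Eqs. (3)-(5) is given below, using torchvision's modulated deformable convolution. The `flow_warp` helper, the inclusion of the flow in the concatenation (the equations omit it for brevity, while Table 5 in the supplementary material suggests the flows are part of the input), the channel widths, and the flip of the flow to the DCN's (y, x) offset order are assumptions drawn from common open-source practice, not the authors' exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def flow_warp(x, flow):
    """Spatial warping W(x, s): bilinearly sample x at p + flow(p).
    flow is (N, 2, H, W) in pixels, channel order (x, y)."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(x.device)          # (2, H, W)
    coords = base.unsqueeze(0) + flow
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0              # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((gx, gy), dim=3), align_corners=True)

class FlowGuidedAlign(nn.Module):
    """First-order flow-guided deformable alignment, Eqs. (3)-(5)."""
    def __init__(self, c=64, groups=16, k=3):
        super().__init__()
        self.k = k
        self.trunk = nn.Sequential(                      # shared trunk of C^o / C^m
            nn.Conv2d(2 * c + 2, c, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU(0.1))
        self.conv_offset = nn.Conv2d(c, 2 * groups * k * k, 3, padding=1)  # C^o
        self.conv_mask = nn.Conv2d(c, groups * k * k, 3, padding=1)        # C^m
        self.weight = nn.Parameter(torch.randn(c, c, k, k) * 1e-2)         # DCN weight

    def forward(self, g_i, f_prev, flow):
        f_bar = flow_warp(f_prev, flow)                                    # Eq. (3)
        feat = self.trunk(torch.cat([g_i, f_bar, flow], dim=1))
        residue = self.conv_offset(feat)
        # base offsets = optical flow (flipped to (y, x) order) + residue, Eq. (4)
        offset = flow.flip(1).repeat(1, residue.size(1) // 2, 1, 1) + residue
        mask = torch.sigmoid(self.conv_mask(feat))
        # DCN applied to the *unwarped* feature, Eq. (5)
        return deform_conv2d(f_prev, offset, self.weight,
                             padding=self.k // 2, mask=mask)
```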
The above formulation is designed only for aligning one single feature, and hence is not directly applicable to our second-order propagation. The most intuitive way to adapt it to the second-order settings is to apply the above procedure to the two features, f_{i-1}^j and f_{i-2}^j, independently. However, this requires doubled computations, resulting in reduced efficiency. Furthermore, separate alignment potentially ignores the complementary information from the features. Therefore, we allow alignment of the two features simultaneously. More specifically, we concatenate the warped features and flows to compute the offsets o_{i→i-p} (p = 1, 2):

    o_{i \to i-p} = s_{i \to i-p} + C^o(c(g_i, \bar{f}_{i-1}, \bar{f}_{i-2})),
    m_{i \to i-p} = \sigma(C^m(c(g_i, \bar{f}_{i-1}, \bar{f}_{i-2}))).    (6)

A DCN is then applied to the unwarped features:

    o_i = c(o_{i \to i-1}, o_{i \to i-2}),
    m_i = c(m_{i \to i-1}, m_{i \to i-2}),    (7)
    \hat{f}_i = \mathcal{D}(c(f_{i-1}, f_{i-2}); o_i, m_i).

More details of the second-order flow-guided deformable alignment are provided in the supplementary material.
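Under the same assumptions as the sketch above, the second-order forward pass of Eqs. (6)-(7) differs only in that the trunk sees g_i, both warped features, and both flows (3x64 + 2x2 = 196 input channels, which matches Table 5 in the supplementary material), and the offsets and masks are split per reference frame. `self.weight2` is a hypothetical DCN weight over the 128-channel concatenated input.

```python
def second_order_forward(self, g_i, f_prev1, f_prev2, flow_1, flow_2):
    """Sketch of Eqs. (6)-(7); assumes a trunk with 196 input channels and a
    DCN weight self.weight2 of shape (64, 128, 3, 3)."""
    f_bar1, f_bar2 = flow_warp(f_prev1, flow_1), flow_warp(f_prev2, flow_2)
    feat = self.trunk(torch.cat([g_i, f_bar1, f_bar2, flow_1, flow_2], dim=1))
    res1, res2 = torch.chunk(self.conv_offset(feat), 2, dim=1)       # Eq. (6)
    o1 = flow_1.flip(1).repeat(1, res1.size(1) // 2, 1, 1) + res1
    o2 = flow_2.flip(1).repeat(1, res2.size(1) // 2, 1, 1) + res2
    offset = torch.cat([o1, o2], dim=1)                              # o_i
    mask = torch.sigmoid(self.conv_mask(feat))                       # m_i
    return deform_conv2d(torch.cat([f_prev1, f_prev2], dim=1),       # Eq. (7)
                         offset, self.weight2, padding=1, mask=mask)
```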
Discussion. Unlike existing methods [32, 33, 35, 37] that directly compute the DCN offsets, our proposed flow-guided deformable alignment adopts optical flow as guidance. The benefits are two-fold. First, since CNNs are known to have local receptive fields, the learning of offsets can be assisted by pre-aligning the features using optical flow. Second, by learning only the residue, the network needs to learn only small deviations from the optical flow, reducing the burden in typical deformable alignment modules. In addition, instead of directly concatenating the warped features, the modulation masks in DCN act as attention maps to weigh the contributions of different pixels, providing additional flexibility.
Table 1: Quantitative comparison (PSNR/SSIM). All results are calculated on the Y-channel except REDS4 [23] (RGB-channel). Red and blue colors indicate the best and the second-best performance, respectively. The runtime is computed on an LR size of 180×320. A 4× upsampling is performed following previous studies. Blanked entries correspond to results not reported in previous works.

Method | Params (M) | Runtime (ms) | BI: REDS4 [23] | BI: Vimeo-90K-T [38] | BI: Vid4 [21] | BD: UDM10 [40] | BD: Vimeo-90K-T [38] | BD: Vid4 [21]
Bicubic | - | - | 26.14/0.7292 | 31.32/0.8684 | 23.78/0.6347 | 28.47/0.8253 | 31.30/0.8687 | 21.80/0.5246
VESPCN [1] | - | - | - | - | 25.35/0.7557 | - | - | -
SPMC [31] | - | - | - | - | 25.88/0.7752 | - | - | -
TOFlow [38] | - | - | 27.98/0.7990 | 33.08/0.9054 | 25.89/0.7651 | 36.26/0.9438 | 34.62/0.9212 | -
FRVSR [27] | 5.1 | 137 | - | - | - | 37.09/0.9522 | 35.64/0.9319 | 26.69/0.8103
DUF [15] | 5.8 | 974 | 28.63/0.8251 | - | - | 38.48/0.9605 | 36.87/0.9447 | 27.38/0.8329
RBPN [9] | 12.2 | 1507 | 30.09/0.8590 | 37.07/0.9435 | 27.12/0.8180 | 38.66/0.9596 | 37.20/0.9458 | -
EDVR-M [35] | 3.3 | 118 | 30.53/0.8699 | 37.09/0.9446 | 27.10/0.8186 | 39.40/0.9663 | 37.33/0.9484 | 27.45/0.8406
EDVR [35] | 20.6 | 378 | 31.09/0.8800 | 37.61/0.9489 | 27.35/0.8264 | 39.89/0.9686 | 37.81/0.9523 | 27.85/0.8503
PFNL [40] | 3.0 | 295 | 29.63/0.8502 | 36.14/0.9363 | 26.73/0.8029 | 38.74/0.9627 | - | 27.16/0.8355
MuCAN [19] | - | - | 30.88/0.8750 | 37.32/0.9465 | - | - | - | -
TGA [13] | 5.8 | - | - | - | - | - | 37.59/0.9516 | 27.63/0.8423
RLSP [8] | 4.2 | 49 | - | - | - | 38.48/0.9606 | 36.49/0.9403 | 27.48/0.8388
RSDN [12] | 6.2 | 94 | - | - | - | 39.35/0.9653 | 37.23/0.9471 | 27.92/0.8505
RRN [14] | 3.4 | 45 | - | - | - | 38.96/0.9644 | - | 27.69/0.8488
BasicVSR [2] | 6.3 | 63 | 31.42/0.8909 | 37.18/0.9450 | 27.24/0.8251 | 39.96/0.9694 | 37.53/0.9498 | 27.96/0.8553
IconVSR [2] | 8.7 | 70 | 31.67/0.8948 | 37.47/0.9476 | 27.39/0.8279 | 40.03/0.9694 | 37.84/0.9524 | 28.04/0.8570
BasicVSR++ | 7.3 | 77 | 32.39/0.9069 | 37.79/0.9500 | 27.79/0.8400 | 40.72/0.9722 | 38.21/0.9550 | 29.04/0.8753

Table 2: Performance of a lighter BasicVSR++. Our lighter model, BasicVSR++ (S), has a similar complexity to BasicVSR and IconVSR, but still shows considerable improvements. The PSNR and runtime are computed on REDS4.

              BasicVSR [2]   IconVSR [2]   BasicVSR++ (S)
Params (M)    6.3            8.7           6.4
Runtime (ms)  63             70            69
PSNR (dB)     31.42          31.67         32.24

4. Experiments

Two widely-used datasets are adopted for training: REDS [23] and Vimeo-90K [38]. For REDS, following BasicVSR [2], we use REDS4³ as our test set and REDSval4⁴ as our validation set. The remaining clips are used for training. We use Vid4 [21], UDM10 [40], and Vimeo-90K-T [38] as test sets along with Vimeo-90K. All models are tested with 4× downsampling using two degradations: Bicubic (BI) and Blur Downsampling (BD).

We adopt the Adam optimizer [17] and Cosine Annealing scheme [22]. The initial learning rates of the main network and the flow network are set to 1×10⁻⁴ and 2.5×10⁻⁵, respectively. The total number of iterations is 600K, and the weights of the flow network are fixed during the first 5,000 iterations. The batch size is 8 and the patch size of input LR frames is 64×64. We use the Charbonnier loss [4] since it better handles outliers and improves the performance over the conventional ℓ2-loss [18]. We use pre-trained SPyNet [26] as our flow network; its parameters and runtime are considered inclusively in our method. The number of residual blocks for each branch is set to 7, and the number of feature channels is 64. Detailed experimental settings and model architectures are provided in the supplementary material.

³ Clips 000, 011, 015, 020 of REDS training set.
⁴ Clips 000, 001, 006, 017 of REDS validation set.
4.1. Comparisons with State-of-the-Art Methods

We conduct comprehensive experiments by comparing with 16 models, as listed in Table 1. The quantitative results are summarized in Table 1, and the speed and performance comparison is provided in Fig. 1(c). Note that the parameters reported above include those of the optical flow network (if any), so the comparison is fair.

As shown in Table 1, BasicVSR++ achieves state-of-the-art performance on all datasets for both degradations. In particular, BasicVSR++ outperforms EDVR [35], a large-capacity sliding-window method, by up to 1.3 dB in PSNR, while having 65% fewer parameters. When compared to the previous state of the art, IconVSR [2], BasicVSR++ possesses fewer parameters but has improvements of up to 1 dB. As shown in Table 2, even if we train a lighter version of BasicVSR++ (denoted as BasicVSR++ (S)) with comparable network parameters and runtime to BasicVSR and IconVSR, our model still shows an improvement of 0.82 dB over BasicVSR and 0.57 dB over IconVSR. Such gains are considered significant in VSR.

Some qualitative comparisons are shown in Fig. 4 to Fig. 6. BasicVSR++ successfully restores the fine details. In particular, BasicVSR++ is the only method that restores the wheel's spokes in Fig. 4, the stairs in Fig. 5, and the building structure in Fig. 6. More examples are provided in the supplementary material.
[Figure 4 crops (Frame 018, Clip 000), PSNR: Bicubic 24.87 dB, RBPN 29.82 dB, EDVR-M 28.32 dB, EDVR 28.64 dB, BasicVSR 29.03 dB, IconVSR 29.21 dB, BasicVSR++ (ours) 29.82 dB, GT.]
Figure 4: Challenging scenario on REDS4 [35]. Only BasicVSR++ is able to recover the patterns of the wheel's spokes.

[Figure 5 crops (Sequence 0216, Clip 024), PSNR: Bicubic 23.79 dB, RBPN 28.65 dB, EDVR-M 28.06 dB, EDVR 29.64 dB, BasicVSR 28.25 dB, IconVSR 28.79 dB, BasicVSR++ (ours) 30.80 dB, GT.]
Figure 5: Challenging scenario on Vimeo-90K-T [38]. Only BasicVSR++ is able to reconstruct the stairs.

[Figure 6 crops (Frame 017, Clip City), PSNR: Bicubic 22.99 dB, RBPN 25.09 dB, EDVR-M 25.13 dB, EDVR 25.37 dB, BasicVSR 25.16 dB, IconVSR 25.32 dB, BasicVSR++ (ours) 25.52 dB, GT.]
Figure 6: Challenging scenario on Vid4 [21]. Only BasicVSR++ is able to recover the correct structure of the building.

5. Ablation Studies

To understand the contributions of the proposed components, we start with a baseline and gradually insert the components. From Table 3, it is apparent that each component brings considerable improvement, ranging from 0.14 dB to 0.46 dB in PSNR.

In theory, our proposed propagation schemes can be extended to higher orders and more propagation iterations. However, while the performance gain is considerable when increasing from first-order to second-order (i.e. (B)→(C)), and from one to two iterations (i.e. (C)→BasicVSR++), we observe in our preliminary experiments that further increasing the orders and number of iterations does not lead to a significant improvement (0.05 dB in PSNR). Therefore, we keep both the orders and the number of iterations at two.

[Figure 7 crops: LR, w/o 2nd order, w/ 2nd order, GT (left); LR, w/o grid, w/ grid, GT (right). (a) Second-Order Propagation (b) Grid Propagation]
Figure 7: Analysis of second-order grid propagation. By propagating the features more effectively, our second-order grid propagation leads to more details, improving the output quality.

[Figure 8 panels: (a) Optical flow, (b-d) DCN offsets #1-#3, (e) Reference image, (f) Neighboring image, (g) Aligned by optical flow, (h) Aligned by flow-guided deformable alignment.]
Figure 8: Analysis of flow-guided deformable alignment. (a-d) The DCN offsets are highly similar to optical flow, but still with noticeable differences. (e-f) The reference and neighboring images. (g) The feature aligned by optical flow experiences blurry edges. (h) The feature aligned by our proposed module is sharper and preserves more details, as indicated by the red arrows.

Table 3: Ablation studies of the components. Each component brings significant improvements in PSNR, verifying their effectiveness.

                              (A)     (B)     (C)     BasicVSR++
Flow-Guided Deform. Align.            ✓       ✓       ✓
Second-Order Propagation                      ✓       ✓
Grid Propagation                                      ✓
PSNR (dB)                     31.48   31.94   32.08   32.39

Second-Order Grid Propagation. We further provide some qualitative comparisons to understand the contributions of the proposed propagation scheme. As shown in the two examples of Fig. 7, the contribution of both the second-order propagation and grid propagation is more noticeable in regions that contain fine details and complex textures. In those regions, there is limited information from the current frame that can be employed for reconstruction. To improve the output quality of those regions, effective information aggregation from other video frames is necessary. With our second-order propagation scheme, the information can be transmitted via a robust and effective propagation. This complementary information essentially assists the restoration of the fine details. As shown in the examples, the network successfully restores the details with our components, whereas the counterparts without our components produce blurry outputs.

Flow-Guided Deformable Alignment. In Fig. 8(a-d), we compare the offsets with the optical flow computed by the flow estimation module in BasicVSR++. By learning only the residue to optical flow, the network produces offsets that are highly similar to the optical flow, but with observable differences. When compared to the baseline which aggregates information from only one spatial location indicated by the motion (optical flow), our proposed module allows retrieving information from multiple locations around, providing additional flexibility.

This flexibility leads to features with better quality, as shown in Fig. 8(g-h). When the warping is performed by using optical flow, the aligned features contain blurry edges, owing to the interpolation operation in spatial warping. In contrast, by gathering more information from the neighbors, the feature aligned by our proposed module is sharper and preserves more details.
Table 4: Comparison of alignment modules. Using optical flow to guide deformable alignment successfully stabilizes training. BasicVSR++ directly incorporates optical flow into the network, outperforming the offset-fidelity loss [3].

            w/o Flow   Offset-Fidelity Loss [3]   Ours
PSNR (dB)   27.44      30.22                      32.39

To demonstrate the superiority of our designs, we compare our alignment module with two variants: (1) No optical flow is used. (2) Optical flow is used as in the offset-fidelity loss [3], i.e. the flow is merely used as supervision in the loss function (rather than serving as base offsets as in our method). As shown in Table 4, without using optical flow as guidance, the instability causes training to collapse, leading to a very poor PSNR value. When using the offset-fidelity loss, the training is stabilized. However, a drop of 2.17 dB from our full model is observed. Our flow-guided deformable alignment directly incorporates optical flow into the network to provide more explicit guidance, leading to better results.

Temporal Consistency. Here, we examine the temporal consistency, which is another important direction in VSR. The recurrent framework intrinsically maintains a better temporal consistency in comparison to the sliding-window framework. In the sliding-window framework (e.g., EDVR [35]), each frame is reconstructed independently. In such a design, the consistency between the outputs cannot be guaranteed. In contrast, in the recurrent framework (e.g., BasicVSR [2]), the outputs are related through the propagation of the intermediate features. The temporal propagation essentially helps maintain better temporal consistency.

In Fig. 9 we show a comparison of the temporal profiles between BasicVSR++ and two state-of-the-art methods, EDVR and BasicVSR. For the sliding-window method, the temporal profile from EDVR contains significant noise, indicating flickering artifacts in the output video. In contrast, for recurrent networks, even without explicit modeling of temporal consistency, the profiles from BasicVSR and BasicVSR++ demonstrate better consistencies. However, the profile from BasicVSR still contains discontinuity. Benefiting from our enhanced propagation and alignment, BasicVSR++ is able to aggregate richer information from the video frames, showing a smoother temporal transition. The video results are given in the supplementary material.

[Figure 9 strips: temporal profiles along a selected column for EDVR, BasicVSR, BasicVSR++, and GT.]
Figure 9: Comparison of temporal profile. We select a column (orange dotted lines) and observe the changes across time. The profile from EDVR possesses noise, indicating flickering artifacts. The profile from BasicVSR still contains discontinuity. By better aggregating the long-term information, the profile from BasicVSR++ demonstrates a smoother transition.

6. NTIRE 2021 Challenge Results

In NTIRE 2021, BasicVSR++ wins the video super-resolution track [29] with a compact and efficient structure. In addition to VSR, BasicVSR++ generalizes well to other restoration tasks. BasicVSR++ obtains two champions and one runner-up in the compressed video enhancement challenge [39]. Fig. 10 shows the restoration results of three different patches of compressed videos. BasicVSR++ successfully reduces the artifacts and produces outputs with much better qualities. The promising performance in the competitions demonstrates the generalizability and versatility of BasicVSR++.

[Figure 10 strips: compressed inputs, BasicVSR++ outputs, and GT for three patches.]
Figure 10: Results on compressed video enhancement. The outputs clearly possess fewer artifacts, and the details are shown more clearly.

7. Conclusion

In this work, we redesign BasicVSR with two novel components to enhance its propagation and alignment performance for the task of video super-resolution. Our model BasicVSR++ outperforms existing state-of-the-art methods by a large margin while maintaining efficiency. These designs generalize well to other video restoration tasks including compressed video enhancement. The components are generic, and we speculate that they will be useful for other video-based enhancement or restoration tasks such as deblurring and denoising.
References

[1] Jose Caballero, Christian Ledig, Andrew Aitken, Alejandro Acosta, Johannes Totz, Zehan Wang, and Wenzhe Shi. Real-time video super-resolution with spatio-temporal networks and motion compensation. In CVPR, 2017.
[2] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. BasicVSR: The search for essential components in video super-resolution and beyond. In CVPR, 2021.
[3] Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Understanding deformable alignment in video super-resolution. In AAAI, 2021.
[4] Pierre Charbonnier, Laure Blanc-Feraud, Gilles Aubert, and Michel Barlaud. Two deterministic half-quadratic regularization algorithms for computed imaging. In ICIP, 1994.
[5] Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, and Dahua Lin. Optimizing video object detection via a scale-time lattice. In CVPR, 2018.
[6] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. In ICCV, 2017.
[7] Damien Fourure, Rémi Emonet, Élisa Fromont, Damien Muselet, Alain Trémeau, and Christian Wolf. Residual conv-deconv grid network for semantic segmentation. In BMVC, 2017.
[8] Dario Fuoli, Shuhang Gu, and Radu Timofte. Efficient video super-resolution through recurrent latent space propagation. In ICCVW, 2019.
[9] Muhammad Haris, Greg Shakhnarovich, and Norimichi Ukita. Recurrent back-projection network for video super-resolution. In CVPR, 2019.
[10] Yan Huang, Wei Wang, and Liang Wang. Bidirectional recurrent convolutional networks for multi-frame super-resolution. In NIPS, 2015.
[11] Yan Huang, Wei Wang, and Liang Wang. Video super-resolution via bidirectional recurrent convolutional networks. TPAMI, 2018.
[12] Takashi Isobe, Xu Jia, Shuhang Gu, Songjiang Li, Shengjin Wang, and Qi Tian. Video super-resolution with recurrent structure-detail network. In ECCV, 2020.
[13] Takashi Isobe, Songjiang Li, Xu Jia, Shanxin Yuan, Gregory Slabaugh, Chunjing Xu, Ya-Li Li, Shengjin Wang, and Qi Tian. Video super-resolution with temporal group attention. In CVPR, 2020.
[14] Takashi Isobe, Fang Zhu, and Shengjin Wang. Revisiting temporal modeling for video super-resolution. In BMVC, 2020.
[15] Younghyun Jo, Seoung Wug Oh, Jaeyeon Kang, and Seon Joo Kim. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In CVPR, 2018.
[16] Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, and Yoshua Bengio. Sparse attentive backtracking: Temporal credit assignment through reminding. In NIPS, 2018.
[17] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[18] Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. Deep laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.
[19] Wenbo Li, Xin Tao, Taian Guo, Lu Qi, Jiangbo Lu, and Jiaya Jia. MuCAN: Multi-correspondence aggregation network for video super-resolution. In ECCV, 2020.
[20] Tsungnan Lin, Bill G. Horne, Peter Tino, and C. Lee Giles. Learning long-term dependencies in NARX recurrent neural networks. IEEE Transactions on Neural Networks, 1996.
[21] Ce Liu and Deqing Sun. On bayesian adaptive video super resolution. TPAMI, 2014.
[22] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
[23] Seungjun Nah, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, Radu Timofte, and Kyoung Mu Lee. NTIRE 2019 challenge on video deblurring and super-resolution: Dataset and study. In CVPRW, 2019.
[24] Seungjun Nah, Sanghyun Son, and Kyoung Mu Lee. Recurrent neural networks with intra-frame iterations for video deblurring. In CVPR, 2019.
[25] Simon Niklaus and Feng Liu. Softmax splatting for video frame interpolation. In CVPR, 2020.
[26] Anurag Ranjan and Michael J. Black. Optical flow estimation using a spatial pyramid network. In CVPR, 2017.
[27] Mehdi S. M. Sajjadi, Raviteja Vemulapalli, and Matthew Brown. Frame-recurrent video super-resolution. In CVPR, 2018.
[28] Rohollah Soltani and Hui Jiang. Higher order recurrent neural networks. arXiv preprint arXiv:1605.00064, 2016.
[29] Sanghyun Son, Suyoung Lee, Seungjun Nah, Radu Timofte, Kyoung Mu Lee, Kelvin C.K. Chan, et al. NTIRE 2021 challenge on video super-resolution. In CVPRW, 2021.
[30] Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, and Jingdong Wang. High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514, 2019.
[31] Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, and Jiaya Jia. Detail-revealing deep video super-resolution. In CVPR, 2017.
[32] Yapeng Tian, Yulun Zhang, Yun Fu, and Chenliang Xu. TDAN: Temporally deformable alignment network for video super-resolution. In CVPR, 2020.
[33] Hua Wang, Dewei Su, Longcun Jin, and Chuangchuang Liu. Deformable non-local network for video super-resolution. IEEE Access, 2019.
[34] Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, and Bin Xiao. Deep high-resolution representation learning for visual recognition. TPAMI, 2020.
[35] Xintao Wang, Kelvin C.K. Chan, Ke Yu, Chao Dong, and Chen Change Loy. EDVR: Video restoration with enhanced deformable convolutional networks. In CVPRW, 2019.
[36] Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, and Chenliang Xu. Zooming Slow-Mo: Fast and accurate one-stage space-time video super-resolution. In CVPR, 2020.
[37] Xiangyu Xu, Muchen Li, Wenxiu Sun, and Ming-Hsuan Yang. Learning spatial and spatio-temporal pixel aggregations for image and video denoising. TIP, 2020.
[38] Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, and William T. Freeman. Video enhancement with task-oriented flow. IJCV, 2019.
[39] Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, et al. NTIRE 2021 challenge on quality enhancement of compressed video: Methods and results. In CVPRW, 2021.
[40] Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, and Jiayi Ma. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In ICCV, 2019.
[41] Shangchen Zhou, Jiawei Zhang, Jinshan Pan, Haozhe Xie, Wangmeng Zuo, and Jimmy Ren. Spatio-temporal filter adaptive network for video deblurring. In ICCV, 2019.
[42] Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. In CVPR, 2019.
[43] Juntang Zhuang, Junlin Yang, Lin Gu, and Nicha Dvornek. ShelfNet for fast semantic segmentation. In ICCVW, 2019.

A. Network Architecture

We use pretrained SPyNet [26] as our flow network. The number of residual blocks for the initial feature extraction is set to 5, and the number of residual blocks for each propagation branch is set to 7. The feature channel is set to 64.

The architecture of our second-order deformable alignment is highly similar to the first-order counterpart (Fig. 3 in the main paper). The only difference is that the pre-aligned features and optical flows from different timesteps are concatenated, and passed to the offset estimation module C^o and mask estimation module C^m. Their architectures are detailed in Table 5. We set the DCN kernel size to 3 and the number of deformable groups to 16. Codes will be released.

Table 5: Architectures of C^o and C^m. The two modules share the first six layers. They can be implemented as a stack of convolutions followed by a channel-splitting. The arguments in the convolution layer are input channels, output channels, and kernel size, respectively.

Layer   C^o                 C^m
1       conv(196, 64, 3)    (shared)
2       LeakyReLU(0.1)      (shared)
3       conv(64, 64, 3)     (shared)
4       LeakyReLU(0.1)      (shared)
5       conv(64, 64, 3)     (shared)
6       LeakyReLU(0.1)      (shared)
7       conv(64, 288, 3)    conv(64, 144, 3)
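Table 5 can be transcribed almost directly into a module. The 196 input channels correspond to c(g_i, f̄_{i-1}, f̄_{i-2}, s_{i→i-1}, s_{i→i-2}) = 3×64 + 2×2, and the 288/144 output channels to 2×16×9 offsets and 16×9 masks (16 deformable groups, 3×3 DCN kernel). A sketch, fusing layer 7 into one convolution followed by the channel split the caption describes (the class and attribute names are illustrative):

```python
import torch.nn as nn

class OffsetMaskEstimator(nn.Module):
    """C^o and C^m from Table 5: a shared six-layer trunk, then a split
    into 288 offset channels and 144 mask channels."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(196, 64, 3, padding=1), nn.LeakyReLU(0.1),  # layers 1-2
            nn.Conv2d(64, 64, 3, padding=1), nn.LeakyReLU(0.1),   # layers 3-4
            nn.Conv2d(64, 64, 3, padding=1), nn.LeakyReLU(0.1))   # layers 5-6
        self.head = nn.Conv2d(64, 288 + 144, 3, padding=1)        # layer 7, fused

    def forward(self, x):
        offsets, masks = self.head(self.trunk(x)).split([288, 144], dim=1)
        return offsets, masks                                     # C^o and C^m outputs
```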
B. Experimental Settings

Datasets. Two widely-used datasets are adopted for training: REDS [23] and Vimeo-90K [38]. For REDS, following BasicVSR [2], we use REDS4⁵ as our test set and REDSval4⁶ as our validation set. The remaining clips are used for training. We use Vid4 [21], UDM10 [40], and Vimeo-90K-T [38] as test sets along with Vimeo-90K.

⁵ Clips 000, 011, 015, 020 of REDS training set.
⁶ Clips 000, 001, 006, 017 of REDS validation set.

Degradations. All models are tested with 4× downsampling using two degradations: Bicubic (BI) and Blur Downsampling (BD). For BI, the MATLAB function imresize is used for downsampling. For BD, we blur the ground-truth by a Gaussian filter with σ=1.6, followed by a subsampling every four pixels.
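A hedged sketch of the BD degradation described above; the paper fixes only σ=1.6 and the 4× subsampling, so the kernel size here is an assumption:

```python
import torch
import torch.nn.functional as F

def bd_degrade(hr, sigma=1.6, scale=4, ksize=13):
    """Blur Downsampling: Gaussian-blur the ground truth (N, C, H, W), then
    keep every `scale`-th pixel. `ksize` is an assumed kernel size."""
    xs = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    g1d = torch.exp(-xs ** 2 / (2 * sigma ** 2))
    k2d = torch.outer(g1d, g1d)
    kernel = (k2d / k2d.sum()).expand(hr.size(1), 1, ksize, ksize)  # depthwise
    blurred = F.conv2d(F.pad(hr, [ksize // 2] * 4, mode="reflect"),
                       kernel, groups=hr.size(1))
    return blurred[..., ::scale, ::scale]   # subsample every `scale` pixels
```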
Training Settings. We adopt the Adam optimizer [17] and Cosine Annealing scheme [22]. When trained on REDS, the initial learning rates of the main network and the flow network are set to 1×10⁻⁴ and 2.5×10⁻⁵, respectively. The total number of iterations is 600K, and the weights of the flow network are fixed during the first 5,000 iterations. The batch size is 8 and the patch size of input LR frames is 64×64. We use the Charbonnier loss [4] since it better handles outliers and improves the performance over the conventional ℓ2-loss [18]. During training, 30 LR frames are used as inputs. Since Vimeo-90K contains only seven frames per sequence, networks trained solely on Vimeo-90K may not be able to capture long-term dependencies. Therefore, we initialize the model using the weights trained on REDS when training on Vimeo-90K. The number of finetune iterations is 300K.

Test Settings. We take the full video sequence as input to explore information from all video frames for restoration.

C. Qualitative Comparisons

In this section, we provide additional qualitative comparisons on REDS4 [23], UDM10 [40], Vimeo-90K [38], and Vid4 [21]. From the examples, we see that BasicVSR++ is able to restore the fine details, leading to plausible results. A video demo is also provided in the submitted zip file.
[Figure 11 crops (REDS4). Frame 002, Clip 011: Bicubic 25.93 dB, RBPN 30.88 dB, EDVR-M 31.28 dB, EDVR 32.11 dB, BasicVSR 32.25 dB, IconVSR 32.62 dB, BasicVSR++ (ours) 33.61 dB, GT. Frame 061, Clip 020: Bicubic 26.00 dB, RBPN 29.31 dB, EDVR-M 30.26 dB, EDVR 30.82 dB, BasicVSR 31.43 dB, IconVSR 31.65 dB, BasicVSR++ (ours) 32.25 dB, GT.]
Figure 11: Qualitative comparison on REDS4 [35].

[Figure 12 crops (UDM10). Frame 031, Clip auditorium: Bicubic 24.03 dB, EDVR 30.91 dB, BasicVSR 29.42 dB, IconVSR 31.06 dB, BasicVSR++ (ours) 31.93 dB, GT.]
Figure 12: Qualitative comparison on UDM10 [40].

[Figure 13 crops (Vimeo-90K-T). Sequence 0864, Clip 015: Bicubic 20.32 dB, RBPN 24.19 dB, EDVR-M 23.09 dB, EDVR 23.89 dB, BasicVSR 22.16 dB, IconVSR 23.78 dB, BasicVSR++ (ours) 25.34 dB, GT. Sequence 0723, Clip 085: Bicubic 30.08 dB, RBPN 33.34 dB, EDVR-M 31.93 dB, EDVR 30.98 dB, BasicVSR 32.84 dB, IconVSR 31.55 dB, BasicVSR++ (ours) 35.51 dB, GT.]
Figure 13: Qualitative comparison on Vimeo-90K-T [38].

[Figure 14 crops (Vid4). Frame 040, Clip calendar: Bicubic 19.10 dB, RBPN 21.94 dB, EDVR-M 22.01 dB, EDVR 22.12 dB, BasicVSR 21.78 dB, IconVSR 22.09 dB, BasicVSR++ (ours) 22.50 dB, GT.]
Figure 14: Qualitative comparison on Vid4 [21].
