[Figure 1 panels: (a) feature propagation in BasicVSR vs. second-order grid propagation in BasicVSR++; (b) optical-flow warping vs. flow-guided deformable alignment (optical flow as base DCN offsets, refined by a spatial residual); (c) PSNR vs. #Params (5M-20M), comparing BasicVSR++ (ours) with EDVR (CVPRW19), RBPN (CVPR19), DUF (CVPR18), FRVSR (CVPR18), RSDN (ECCV20), PFNL (ICCV19), RLSP (ICCVW19), BasicVSR (CVPR21), and IconVSR (CVPR21).]
Figure 1: Improvements over BasicVSR [2]. (a) Second-order grid propagation in BasicVSR++ allows a more effective propagation of
features. (b) Flow-guided deformable alignment in BasicVSR++ provides a means for more robust feature alignment across misaligned
frames. (c) BasicVSR++ outperforms existing state-of-the-art methods while maintaining efficiency.
Abstract

A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with the enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with a similar number of parameters. In addition to video super-resolution, BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement. In NTIRE 2021, BasicVSR++ obtains three champions and one runner-up in the Video Super-Resolution and Compressed Video Enhancement Challenges. Codes and models will be released to MMEditing¹.

∗ Corresponding author
¹ https://github.com/open-mmlab/mmediting

1. Introduction

Video super-resolution (VSR) is challenging in that one needs to gather complementary information across misaligned video frames for restoration. One prevalent approach is the sliding-window framework [9, 32, 35, 38], where each frame in the video is restored using the frames within a short temporal window. In contrast to the sliding-window framework, a recurrent framework attempts to exploit the long-term dependencies by propagating the latent features. In general, these methods [8, 10, 11, 12, 14, 27] allow a more compact model compared to those in the sliding-window framework. Nevertheless, the problems of transmitting long-term information and aligning features across frames in a recurrent model remain formidable.

A recent work by Chan et al. [2] studies these problems carefully. It summarizes the common VSR pipelines into four components, namely Propagation, Alignment, Aggregation, and Upsampling, and proposes BasicVSR. In BasicVSR, bidirectional propagation is adopted to exploit information from the entire input video for reconstruction. For alignment, optical flow is adopted for feature warping. BasicVSR serves as a succinct yet strong backbone where components can be easily added for performance gain. However, its rudimentary designs in propagation and
alignment limit the efficacy of information aggregation. As a result, the network often struggles to restore fine details, especially when dealing with occluded and complex regions. These shortcomings call for refined designs in propagation and alignment.

In this work, we redesign BasicVSR by devising second-order grid propagation and flow-guided deformable alignment, which allow information to be propagated and aggregated more effectively:

1) The proposed second-order grid propagation, as shown in Fig. 1(a), addresses two limitations in BasicVSR: i) we allow more aggressive bidirectional propagation arranged in a grid-like manner, and ii) we relax the first-order Markov assumption in BasicVSR and incorporate a second-order connection [28] into the network so that information can be aggregated from different spatiotemporal locations. Both modifications ameliorate information flow in the network and improve its robustness in occluded and fine regions.

2) BasicVSR shows the advantages of using optical flow for temporal alignment. However, optical flow is not robust to occlusion, and inaccurate flow estimation could jeopardize the restoration performance. Deformable alignment [32, 33, 35] has demonstrated its superiority in VSR, but it is difficult to train in practice [3]. To take advantage of deformable alignment while overcoming the training instability, we propose flow-guided deformable alignment, as shown in Fig. 1(b). In the proposed module, instead of learning the DCN offsets directly [6, 42], we reduce the burden of offset learning by using the optical flow field as base offsets, refined by a flow field residue. The latter can be learned more stably than the original DCN offsets.

The two aforementioned components are novel, and further discussion can be found in the related work section. Benefiting from the more effective designs, BasicVSR++ can adopt a more lightweight backbone than its counterparts. Consequently, BasicVSR++ surpasses existing state-of-the-art methods, including BasicVSR and IconVSR (the more elaborate BasicVSR variant), by a large margin while maintaining efficiency (Fig. 1(c)). In particular, when compared to its predecessor BasicVSR, a gain of 0.82 dB in PSNR on REDS4 [35] is obtained with a similar number of parameters. In addition, BasicVSR++ obtains three champions and one runner-up in the NTIRE 2021 Video Super-Resolution [29] and Compressed Video Enhancement [39] Challenges.

2. Related Work

Recurrent Networks. The recurrent framework is a popular structure adopted in various video processing tasks such as super-resolution [8, 10, 11, 12, 14, 27], deblurring [24, 41], and frame interpolation [36]. For instance, RSDN [12] adopts unidirectional propagation with a recurrent detail structural block and a hidden state adaptation module to enhance the robustness to appearance change and error accumulation. Chan et al. [2] propose BasicVSR. The work demonstrates the importance of bidirectional propagation over unidirectional propagation to better exploit features temporally. In addition, the study also shows the advantage of feature alignment in aligning highly relevant but misaligned features. We refer readers to [2] for detailed comparisons of these components against the more conventional ways of performing propagation and alignment. In our experiments, we focus on comparing with BasicVSR since it is the state-of-the-art method for VSR.

Grid Connections. Grid-like designs are seen in various vision tasks such as object detection [5, 30, 34], semantic segmentation [7, 30, 34, 43], and frame interpolation [25]. In general, these designs decompose a given image/feature into multiple resolutions, and grids are adopted across resolutions to capture both fine and coarse information. Unlike the aforementioned methods, BasicVSR++ does not adopt a multi-scale design. Instead, the grid structure is designed for propagation across time in a bidirectional fashion. We link different frames with a grid connection to repeatedly refine the features, improving expressiveness.

Higher-Order Propagation. Higher-order propagation has been studied to improve gradient flow [16, 20, 28]. These methods demonstrate improvements in different tasks including classification [16] and language modeling [28]. However, these methods do not consider temporal alignment, which is shown to be critical in the task of VSR [2]. To allow temporal alignment in second-order propagation, we incorporate alignment into our propagation scheme by extending our flow-guided deformable alignment to the second-order setting.

Deformable Alignment. Several works [32, 33, 35, 37] employ deformable alignment. TDAN [32] performs alignment at the feature level using deformable convolution. EDVR [35] further proposes a Pyramid Cascading Deformable (PCD) alignment with a multi-scale design. Recently, Chan et al. [3] analyze deformable alignment and show that the performance gain over flow-based alignment comes from the offset diversity. Motivated by [3], we adopt deformable alignment but with a reformulation to overcome the training instability [3]. Our flow-guided deformable alignment is different from the offset-fidelity loss [3]: the latter uses optical flow in a loss function during training, whereas we directly incorporate optical flow into our module as base offsets, allowing more explicit guidance during both training and inference.

3. Methodology

BasicVSR++ consists of two effective modifications for improving propagation and alignment.
[Figure 2 schematic: input frames x_i are processed by residual blocks to produce features g_i; the features f_i^j are refined through grid propagation with second-order connections and flow-guided deformable alignment, and the outputs are reconstructed by pixel-shuffle and bilinear upsampling (legend: elementwise addition; C: channel-wise concatenation).]
Figure 2: An Overview of BasicVSR++. BasicVSR++ consists of two modifications to improve propagation and alignment. For propaga-
tion, we introduce second-order propagation (blue solid lines) to refine features bidirectionally. In addition, second-order connection (red
dotted lines) is adopted to improve the robustness of propagation. Within each propagation branch, flow-guided deformable alignment is
proposed to increase the offset diversity while overcoming the offset overflow problem.
As shown in Fig. 2, given an input video, residual blocks are first applied to extract features from each frame. The features are then propagated under our second-order grid propagation scheme, where alignment is performed by our flow-guided deformable alignment. After propagation, the aggregated features are used to generate the output image through convolution and pixel-shuffling.

3.1. Second-Order Grid Propagation

Most existing methods adopt unidirectional propagation [12, 14, 27]. Several works [2, 10, 11] adopt bidirectional propagation to exploit the information available in the video sequence. In particular, IconVSR [2] consists of a coupled propagation scheme with sequentially-connected branches to facilitate information exchange.

Motivated by the effectiveness of bidirectional propagation, we devise a grid propagation scheme to enable repeated refinement through propagation. More specifically, the intermediate features are propagated backward and forward in time in an alternating manner. Through propagation, the information from different frames can be “revisited” and adopted for feature refinement. Compared to existing works that propagate features only once, grid propagation repeatedly extracts information from the entire sequence, improving feature expressiveness.

To further enhance the robustness of propagation, we relax the first-order Markov assumption in BasicVSR and adopt a second-order connection, realizing a second-order Markov chain. With this relaxation, information can be aggregated from different spatiotemporal locations, improving robustness and effectiveness in occluded and fine regions.

Integrating the above two components, we devise our second-order grid propagation as follows. Let x_i be the input image, g_i be the feature extracted from x_i by multiple residual blocks, and f_i^j be the feature computed at the i-th timestep in the j-th propagation branch. In this section, we describe the procedure for forward propagation; the procedure for backward propagation is defined similarly.

To compute the feature f_i^j, we first align f_{i-1}^j and f_{i-2}^j (following the second-order Markov chain) using our proposed flow-guided deformable alignment, which will be discussed in the next section:

\hat{f}_i^j = \mathcal{A}\big(g_i, f_{i-1}^j, f_{i-2}^j, s_{i \to i-1}, s_{i \to i-2}\big),   (1)

where s_{i→i-1} and s_{i→i-2} denote the optical flows from the i-th frame to the (i-1)-th and (i-2)-th frames, respectively.
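To make the propagation scheme concrete, the following is a minimal PyTorch sketch of one forward-propagation branch. The names propagate_branch, align, and resblocks are illustrative placeholders (align stands for the operator A in Eq. (1)), not the released implementation; a backward branch is obtained by reversing the frame order, and the grid simply chains such branches so that each branch refines the outputs of the previous one.

```python
import torch

def propagate_branch(feats_g, flows, align, resblocks):
    """One forward branch of second-order grid propagation (sketch).

    feats_g:   list of per-frame features g_i, each of shape (N, C, H, W)
    flows:     dict with flows[(i, k)] = optical flow s_{i->k}, shape (N, 2, H, W)
    align:     callable implementing Eq. (1): align(g_i, f_prev1, f_prev2, s1, s2)
    resblocks: callable refining the concatenated [g_i, aligned] features back to C channels
    """
    n, c, h, w = feats_g[0].shape
    zero_feat = feats_g[0].new_zeros(n, c, h, w)   # stand-in when f_{i-1} or f_{i-2} is absent
    zero_flow = feats_g[0].new_zeros(n, 2, h, w)
    out = []
    for i, g in enumerate(feats_g):
        f1 = out[i - 1] if i >= 1 else zero_feat
        f2 = out[i - 2] if i >= 2 else zero_feat
        s1 = flows[(i, i - 1)] if i >= 1 else zero_flow
        s2 = flows[(i, i - 2)] if i >= 2 else zero_flow
        aligned = align(g, f1, f2, s1, s2)                      # Eq. (1)
        out.append(resblocks(torch.cat([g, aligned], dim=1)))   # residual refinement
    return out
```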
3.2. Flow-Guided Deformable Alignment

[Schematic of flow-guided deformable alignment: the previous feature f_{i-1} is warped by the optical flow s_{i→i-1}, and the warped feature is concatenated with the current low-resolution feature g_i.]

To align the feature from the previous frame, we first warp f_{i-1} with s_{i→i-1}:

\bar{f}_{i-1} = \mathcal{W}\big(f_{i-1}, s_{i \to i-1}\big),   (3)

where \mathcal{W} denotes the spatial warping operation. The pre-aligned features are then used to compute the DCN offsets.
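The computation above maps naturally onto a modulated deformable convolution whose offsets are initialized from optical flow. Below is a minimal sketch (first-order case, a single frame pair) built on torchvision.ops.deform_conv2d; the module name FlowGuidedDeformAlign, the channel sizes, and the assumed (dy, dx) ordering of the base offsets are illustrative choices, not the authors' released code. The same idea extends to the second-order case by concatenating both pre-aligned features and both flows before predicting the offsets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def flow_warp(x, flow):
    """Bilinearly warp x (N, C, H, W) with a flow field (N, 2, H, W) in (dx, dy) order -- Eq. (3)."""
    n, _, h, w = x.shape
    gy, gx = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype), indexing="ij")
    coords = torch.stack((gx, gy), dim=0).unsqueeze(0) + flow     # absolute sampling coordinates
    gxn = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0                # normalize to [-1, 1] for grid_sample
    gyn = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((gxn, gyn), dim=-1),
                         mode="bilinear", padding_mode="border", align_corners=True)

class FlowGuidedDeformAlign(nn.Module):
    """Sketch: DCN offsets = optical flow (base) + learned residue, with a modulation mask."""

    def __init__(self, channels=64, deform_groups=8, kernel_size=3):
        super().__init__()
        self.dg, self.k = deform_groups, kernel_size
        # Predict the offset residue (2*dg*k*k channels) and mask (dg*k*k channels)
        # from [current feature g_i, pre-aligned feature, optical flow].
        self.conv_offset = nn.Conv2d(2 * channels + 2,
                                     3 * deform_groups * kernel_size ** 2, 3, padding=1)
        self.weight = nn.Parameter(0.01 * torch.randn(channels, channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, g_cur, f_prev, flow):
        pre_aligned = flow_warp(f_prev, flow)                      # Eq. (3)
        out = self.conv_offset(torch.cat([g_cur, pre_aligned, flow], dim=1))
        residue, mask = torch.split(
            out, [2 * self.dg * self.k ** 2, self.dg * self.k ** 2], dim=1)
        # Base offsets: the flow repeated for every kernel sampling point (assumed (dy, dx) layout).
        base = flow.flip(1).repeat(1, self.dg * self.k ** 2, 1, 1)
        return deform_conv2d(f_prev, base + residue, self.weight, self.bias,
                             padding=self.k // 2, mask=torch.sigmoid(mask))
```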
Table 1: Quantitative comparison (PSNR/SSIM) under BI and BD degradations. Columns: Params (M); Runtime (ms); BI degradation on REDS4 [23], Vimeo-90K-T [38], and Vid4 [21]; BD degradation on UDM10 [40], Vimeo-90K-T [38], and Vid4 [21].
Bicubic - - 26.14/0.7292 31.32/0.8684 23.78/0.6347 28.47/0.8253 31.30/0.8687 21.80/0.5246
VESPCN [1] - - - - 25.35/0.7557 - - -
SPMC [31] - - - - 25.88/0.7752 - - -
TOFlow [38] - - 27.98/0.7990 33.08/0.9054 25.89/0.7651 36.26/0.9438 34.62/0.9212 -
FRVSR [27] 5.1 137 - - - 37.09/0.9522 35.64/0.9319 26.69/0.8103
DUF [15] 5.8 974 28.63/0.8251 - - 38.48/0.9605 36.87/0.9447 27.38/0.8329
RBPN [9] 12.2 1507 30.09/0.8590 37.07/0.9435 27.12/0.8180 38.66/0.9596 37.20/0.9458 -
EDVR-M [35] 3.3 118 30.53/0.8699 37.09/0.9446 27.10/0.8186 39.40/0.9663 37.33/0.9484 27.45/0.8406
EDVR [35] 20.6 378 31.09/0.8800 37.61/0.9489 27.35/0.8264 39.89/0.9686 37.81/0.9523 27.85/0.8503
PFNL [40] 3.0 295 29.63/0.8502 36.14/0.9363 26.73/0.8029 38.74/0.9627 - 27.16/0.8355
MuCAN [19] - - 30.88/0.8750 37.32/0.9465 - - - -
TGA [13] 5.8 - - - - - 37.59/0.9516 27.63/0.8423
RLSP [8] 4.2 49 - - - 38.48/0.9606 36.49/0.9403 27.48/0.8388
RSDN [12] 6.2 94 - - - 39.35/0.9653 37.23/0.9471 27.92/0.8505
RRN [14] 3.4 45 - - - 38.96/0.9644 - 27.69/0.8488
BasicVSR [2] 6.3 63 31.42/0.8909 37.18/0.9450 27.24/0.8251 39.96/0.9694 37.53/0.9498 27.96/0.8553
IconVSR [2] 8.7 70 31.67/0.8948 37.47/0.9476 27.39/0.8279 40.03/0.9694 37.84/0.9524 28.04/0.8570
BasicVSR++ 7.3 77 32.39/0.9069 37.79/0.9500 27.79/0.8400 40.72/0.9722 38.21/0.9550 29.04/0.8753
Table 2: Performance of a lighter BasicVSR++. Our lighter model, BasicVSR++ (S), has a similar complexity to BasicVSR and IconVSR, but still shows considerable improvements. The PSNR and runtime are computed on REDS4.

             BasicVSR [2]   IconVSR [2]   BasicVSR++ (S)
Params (M)   6.3            8.7           6.4
Runtime (ms) 63             70            69
PSNR (dB)    31.42          31.67         32.24

4. Experiments

Two widely-used datasets are adopted for training: REDS [23] and Vimeo-90K [38]. For REDS, following BasicVSR [2], we use REDS4³ as our test set and REDSval4⁴ as our validation set. The remaining clips are used for training. We use Vid4 [21], UDM10 [40], and Vimeo-90K-T [38] as test sets along with Vimeo-90K. All models are tested with 4× downsampling using two degradations: Bicubic (BI) and Blur Downsampling (BD).

We adopt the Adam optimizer [17] and the cosine annealing scheme [22]. The initial learning rates of the main network and the flow network are set to 1×10⁻⁴ and 2.5×10⁻⁵, respectively. The total number of iterations is 600K, and the weights of the flow network are fixed during the first 5,000 iterations. The batch size is 8 and the patch size of input LR frames is 64×64. We use the Charbonnier loss [4] since it better handles outliers and improves the performance over the conventional ℓ2-loss [18]. We use a pre-trained SPyNet [26] as our flow network. Its parameters and runtime are included in the numbers reported for our method. The number of residual blocks for each branch is set to 7. The number of feature channels is 64. Detailed experimental settings and model architectures are provided in the supplementary material. A sketch of this training configuration is given below.
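For concreteness, the recipe above can be written as the short PyTorch sketch below; the helper names and the Charbonnier epsilon (1e-8) are illustrative assumptions rather than the authors' exact training code.

```python
import torch

def charbonnier_loss(pred, target, eps=1e-8):
    """Charbonnier loss [4]: a smooth, outlier-robust alternative to the L2 loss."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def build_training(main_net, flow_net, total_iters=600_000):
    """Adam with separate learning rates for the main and flow networks, plus cosine annealing."""
    optimizer = torch.optim.Adam([
        {"params": main_net.parameters(), "lr": 1e-4},     # main network
        {"params": flow_net.parameters(), "lr": 2.5e-5},   # pre-trained SPyNet, smaller LR
    ])
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iters)
    return optimizer, scheduler

def set_flow_trainable(flow_net, iteration, freeze_iters=5000):
    """Keep the flow network frozen for the first `freeze_iters` iterations."""
    for p in flow_net.parameters():
        p.requires_grad_(iteration >= freeze_iters)
```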
³ Clips 000, 011, 015, 020 of the REDS training set.
⁴ Clips 000, 001, 006, 017 of the REDS validation set.

4.1. Comparisons with State-of-the-Art Methods

We conduct comprehensive experiments by comparing with 16 models, as listed in Table 1. The quantitative results are summarized in Table 1, and the speed and performance comparison is provided in Fig. 1(c). Note that the parameters reported above include those of the optical flow network (if any), so the comparison is fair.

As shown in Table 1, BasicVSR++ achieves state-of-the-art performance on all datasets for both degradations. In particular, BasicVSR++ outperforms EDVR [35], a large-capacity sliding-window method, by up to 1.3 dB in PSNR while having 65% fewer parameters. When compared to the previous state of the art, IconVSR [2], BasicVSR++ possesses fewer parameters but achieves improvements of up to 1 dB. As shown in Table 2, even if we train a lighter version of BasicVSR++ (denoted as BasicVSR++ (S)) with network parameters and runtime comparable to BasicVSR and IconVSR, our model still shows an improvement of 0.82 dB over BasicVSR and 0.57 dB over IconVSR. Such gains are considered significant in VSR.

Some qualitative comparisons are shown in Fig. 11 to Fig. 14. BasicVSR++ successfully restores the fine details. In particular, BasicVSR++ is the only method that restores the wheel's spokes in Fig. 11, the stairs in Fig. 13, and the building structure in Fig. 14. More examples are provided in the supplementary material.
Figure 8: Analysis of flow-guided deformable alignment. (a-d) The DCN offsets are highly similar to the optical flow, but still show noticeable differences. (e-f) The reference and neighboring images. (g) The feature aligned by optical flow exhibits blurry edges. (h) The feature aligned by our proposed module is sharper and preserves more details, as indicated by the red arrows.
Table 3: Ablation studies of the components. Each component brings significant improvements in PSNR, verifying their effectiveness.

                             (A)     (B)     (C)     BasicVSR++
Flow-Guided Deform. Align.           ✓       ✓       ✓
Second-Order Propagation                     ✓       ✓
Grid Propagation                                     ✓
PSNR (dB)                    31.48   31.94   32.08   32.39

keep both the orders and iterations to two.

Second-Order Grid Propagation. We further provide some qualitative comparisons to understand the contributions of the proposed propagation scheme. As shown in the two examples of Fig. 7, the contribution of both the second-order propagation and grid propagation is more noticeable in regions that contain fine details and complex textures. In those regions, there is limited information from the current frame that can be employed for reconstruction. To improve the output quality of those regions, effective information aggregation from other video frames is necessary. With our second-order propagation scheme, the information can be transmitted via a robust and effective propagation. This complementary information essentially assists the restoration of the fine details. As shown in the examples, the network successfully restores the details with our components, whereas the counterparts without our components produce blurry outputs.

Flow-Guided Deformable Alignment. In Fig. 8(a-d), we compare the offsets with the optical flow computed by the flow estimation module in BasicVSR++. By learning only the residue to optical flow, the network produces offsets that are highly similar to the optical flow, but with observable differences. When compared to the baseline, which aggregates information from only one spatial location indicated by the motion (optical flow), our proposed module allows retrieving information from multiple surrounding locations, providing additional flexibility.

This flexibility leads to features with better quality, as shown in Fig. 8(g-h). When the warping is performed using optical flow, the aligned features contain blurry edges, owing to the interpolation operation in spatial warping. In contrast, by gathering more information from the neighbors,
Table 4: Comparison of alignment modules. Using optical flow
to guide deformable alignment successfully stabilizes training.
BasicVSR++ directly incorporates optical flow into the network,
outperforming the offset-fidelity loss [3].