Paper Reading | Video Frame Interpolation via Adaptive Separable Convolution

btee

Last modified 2022-12-22 16:12:29

Licensed under CC 4.0 BY-SA

Tags: paper reading, artificial intelligence

First published 2022-12-22 16:12:07

Article link: https://blog.csdn.net/bettii/article/details/128409813

Preface: an ICCV 2017 paper on kernel-based video frame interpolation, an improved version of AdaConv.
Paper link: [here]

Video Frame Interpolation via Adaptive Separable Convolution

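Introduction

Kernel-based methods cope better than optical-flow-based methods with occlusion, blur, and brightness changes. However, a kernel-based method estimates one kernel per output pixel, and each kernel has to be large in order to handle large displacements, so the memory required becomes very large.
As the paper puts it: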

The convolution kernels jointly account for the two separate steps of motion estimation and re-sampling involved in traditional frame interpolation methods. In order to handle large motion, large kernels are required. For example, Niklaus et al. employ a neural network to output two 41×41 kernels for each output pixel. To generate the kernels for all pixels in a 1080p video frame, the output kernels alone will require 26 GB of memory. The memory demand increases quadratically with the kernel size and thus limits the maximal motion to be handled.
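The 26 GB figure can be checked with a rough back-of-the-envelope calculation. The sketch below assumes a 1920×1080 frame and 4-byte (float32) kernel values; the value size is an assumption, since the quoted passage does not state it. The helper name `kernel_memory_bytes` is made up for illustration.

```python
def kernel_memory_bytes(width, height, values_per_pixel, bytes_per_value=4):
    """Memory needed to store per-pixel kernel values for one output frame.

    bytes_per_value=4 assumes float32 storage (an assumption, not stated
    in the quoted passage).
    """
    return width * height * values_per_pixel * bytes_per_value


W, H, N = 1920, 1080, 41  # 1080p frame, 41x41 kernels as in the quote
GIB = 2 ** 30

# Two full 2D kernels per output pixel (one per input frame): 2 * N * N values.
full_2d = kernel_memory_bytes(W, H, 2 * N * N)

# Four 1D kernels per output pixel (vertical + horizontal, per frame): 4 * N values.
separable = kernel_memory_bytes(W, H, 4 * N)

print(f"2D kernels:        {full_2d / GIB:.2f} GiB")   # roughly 26 GiB
print(f"separable kernels: {separable / GIB:.2f} GiB")  # roughly 1.27 GiB
print(f"reduction factor:  {full_2d / separable:.1f}x")
```

Under these assumptions the 2D kernels come out at about 26 GiB, which matches the quote, while the same frame with 1D kernels needs a little over 1 GiB. The reduction factor is N/2, so it grows with the kernel size.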

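Network architecture

The main idea of the paper is to split each 2D kernel into two 1D kernels; the 2D kernel can then be recovered as the product (an outer product) of the two 1D kernels.
Each of the two input frames (previous and next) needs its own 2D kernel, so after the split four 1D kernels are required per output pixel.
The relevant passage from the paper: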

Our method addresses this problem by estimating a pair of 1D kernels that approximate a 2D kernel. That is, we estimate $\langle k_{1,v}, k_{1,h} \rangle$ and $\langle k_{2,v}, k_{2,h} \rangle$ to approximate $K_1$ as $k_{1,v} * k_{1,h}$ and $K_2$ as $k_{2,v} * k_{2,h}$. Thus, our method reduces the number of kernel parameters from $n^2$ to $2n$ for each kernel. This enables the synthesis of a high-resolution video frame in one pass and the incorporation of perceptual loss to further improve the visual quality of the interpolation results, as detailed in the following subsections.
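To make the separable idea concrete, here is a minimal pure-Python sketch for a single output pixel. It assumes the product $k_v * k_h$ is read as the outer product of a vertical and a horizontal 1D kernel, and that the output pixel is the sum of the two kernel-weighted patches, one from each input frame. All function names and the toy values are made up for illustration; this is not the authors' implementation, which predicts the kernels with a neural network.

```python
import math


def outer(kv, kh):
    """Build the 2D kernel K[i][j] = kv[i] * kh[j] from two 1D kernels."""
    return [[a * b for b in kh] for a in kv]


def apply_2d(kernel, patch):
    """Output value as the sum of the element-wise product of kernel and patch."""
    return sum(k * p
               for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))


def apply_separable(kv, kh, patch):
    """Same output value, computed without ever building the n x n kernel."""
    row_sums = [sum(h * p for h, p in zip(kh, row)) for row in patch]  # horizontal pass
    return sum(v * r for v, r in zip(kv, row_sums))                    # vertical pass


def interpolate_pixel(p1, p2, k1v, k1h, k2v, k2h):
    """Interpolated pixel: weighted patch from frame 1 plus weighted patch from frame 2."""
    return apply_separable(k1v, k1h, p1) + apply_separable(k2v, k2h, p2)


# Toy 3x3 example (hypothetical values; the paper uses much larger kernels).
kv = [0.25, 0.5, 0.25]
kh = [0.2, 0.6, 0.2]
patch = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

full = apply_2d(outer(kv, kh), patch)
sep = apply_separable(kv, kh, patch)
print(math.isclose(full, sep))  # both routes give the same pixel value

n = len(kv)
print(f"parameters per kernel: 2D = {n * n}, separable = {2 * n}")
```

The two routes agree because the sum over `kv[i] * kh[j] * patch[i][j]` factors into a horizontal pass followed by a vertical pass. Only `2n` values per kernel have to be stored instead of `n * n`, which is the parameter reduction the quoted passage describes.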