
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 32, NO. 6, JUNE 2022

Deep Affine Motion Compensation Network for Inter Prediction in VVC

Dengchao Jin, Jianjun Lei, Senior Member, IEEE, Bo Peng, Member, IEEE, Wanqing Li, Senior Member, IEEE, Nam Ling, Life Fellow, IEEE, and Qingming Huang, Fellow, IEEE

Abstract— In video coding, it is a challenge to deal with scenes containing complex motions, such as rotation and zooming. Although affine motion compensation (AMC) is employed in Versatile Video Coding (VVC), it is still difficult to handle non-translational motions due to the adopted hand-crafted block-based motion compensation. In this paper, we propose a deep affine motion compensation network (DAMC-Net) for inter prediction in video coding to effectively improve the prediction accuracy. To the best of our knowledge, our work is the first attempt to perform deformable motion compensation based on a CNN in VVC. Specifically, a deformable motion-compensated prediction (DMCP) module is proposed to compensate the current encoding block in a learnable way by estimating accurate motion fields. Meanwhile, the spatial neighboring information and the temporal reference block, as well as the initial motion field, are fully exploited. By effectively fusing the multi-channel feature maps from DMCP, an attention-based fusion and reconstruction (AFR) module is designed to reconstruct the output block. The proposed DAMC-Net is integrated into VVC, and the experimental results demonstrate that the proposed method considerably enhances the coding performance.

Index Terms— Video coding, VVC, affine motion compensation, deep neural network, deformable motion compensation.

Manuscript received April 30, 2021; revised July 27, 2021; accepted August 10, 2021. Date of publication August 24, 2021; date of current version June 6, 2022. The work of Jianjun Lei and Bo Peng was supported in part by the National Key R&D Program of China under Grant 2018YFE0203900, in part by the National Natural Science Foundation of China under Grant 61931014 and Grant 61722112, and in part by the Natural Science Foundation of Tianjin under Grant 18JCJQJC45800. The work of Qingming Huang was supported in part by the National Natural Science Foundation of China under Grant 61620106009 and Grant U1636214. This article was recommended by Associate Editor G. Correa. (Corresponding author: Jianjun Lei.)
Dengchao Jin, Jianjun Lei, and Bo Peng are with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: [email protected]; [email protected]; [email protected]).
Wanqing Li is with the Advanced Multimedia Research Laboratory, University of Wollongong, Wollongong, NSW 2522, Australia (e-mail: [email protected]).
Nam Ling is with the Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA 95053 USA (e-mail: [email protected]).
Qingming Huang is with the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSVT.2021.3107135.
Digital Object Identifier 10.1109/TCSVT.2021.3107135

I. INTRODUCTION

WITH the prevalence of high-definition (HD) and ultra-high-definition (UHD) videos, the demand for high-efficiency video compression techniques has increased dramatically. The ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG) have developed a series of video compression coding standards [1], [2]. Different from image coding, video coding generally focuses on removing temporal redundancy by inter prediction with motion compensation to effectively boost coding performance. In the process of motion compensation, the pixels of each block are first predicted from the most similar block in the reference frames, and then the residual between the predicted pixels and the real pixels is encoded into the bitstream. Therefore, improving the prediction accuracy of motion compensation is highly critical for boosting compression efficiency.

In the latest Versatile Video Coding (VVC) standard, the existing translational motion compensation (TMC) and advanced affine motion compensation (AMC) [3] are jointly exploited for eliminating temporal redundancy. TMC predicts pixels on the assumption that the movement between video frames is translational. Therefore, non-translational motions in natural videos result in a large residual under TMC. To address this issue, AMC is integrated into VVC to improve the ability to deal with complex motions. Although AMC has significantly improved coding performance, several limitations remain. First, the subblock-wise motion field is derived from fixed control points by hand-crafted algorithms, thus resulting in blocking artifacts between sub-blocks and inaccurate prediction for some high-order motions, such as bilinear and perspective motions. Second, existing AMC algorithms pay more attention to the correlation in the temporal domain, while the spatial neighboring information in the current frame is not effectively utilized.

In the past years, deep learning-based methods have achieved promising results in several image and video processing tasks, such as classification, super-resolution, and attention prediction [4]–[8]. Inspired by the success of deep learning, recent research has been devoted to developing learning-based tools for traditional video coding schemes [9]–[25] and learning-based end-to-end compression schemes [26]–[31]. Specifically, several studies [9]–[12] have attempted to substitute or enhance TMC with convolutional neural networks (CNNs) to improve the coding performance. However, there is no report yet on CNN-based AMC to effectively deal with complex motions.

This paper proposes a deep affine motion compensation network (DAMC-Net) to boost the performance of AMC. The main idea of the proposed method is to compensate the current encoding block by estimating accurate motion fields
with learning-based methods. In addition, the multi-domain information from the current reconstructed frame and the temporal reference frame, as well as the initial motion field, is fully exploited to provide informative patches.
Different from the existing AMC in VVC, the motion fields in DAMC-Net are learnable. Thus, the proposed method is capable of dealing with most complex motions in natural videos. The major contributions of this paper are summarized as follows.

• Aiming to boost the performance of inter prediction in video coding, a DAMC-Net is proposed to improve the prediction accuracy of AMC. To the best of our knowledge, the proposed DAMC-Net is the first attempt to perform a deformable prediction task based on CNNs in VVC.
• A deformable motion-compensated prediction (DMCP) module is designed to compensate the current encoding block by estimating accurate motion fields with multi-domain information as references.
• The proposed DAMC-Net is integrated into VVC, and experimental results in both affine inter-mode and affine merge-mode demonstrate that the proposed method significantly increases the selection rate of affine modes and subsequently achieves considerable coding performance improvement.

The rest of this paper is organized as follows. Section II reviews the related work. Section III introduces the proposed method in detail. Experimental results are shown in Section IV. Finally, Section V concludes the paper.

II. RELATED WORK

A. Affine Motion Compensation in VVC

For the current encoding block, motion compensation generally obtains the most similar block among the reference frames as its prediction signal. By integrating TMC into video coding standards, such as High Efficiency Video Coding (HEVC), relatively accurate motion fields are estimated for most blocks. However, it is difficult to further improve the coding performance of TMC due to its limited ability to model complex motions. With an increasing demand for video compression efficiency, more sophisticated models are needed to handle complex motions. To this end, the Joint Video Exploration Team (JVET) integrated both the 4-parameter [3] and 6-parameter affine motion models of AMC into VVC as an inter-prediction tool [32], [33].

Fig. 1. Difference between (a) AMC and (b) TMC.

The difference between AMC and TMC is shown in Fig. 1. For each 4 × 4 sub-block in the current block, AMC utilizes multiple Control Point Motion Vectors (CPMVs) to derive a specific Motion Vector (MV), while TMC utilizes a common MV of the current block to represent the MVs of all sub-blocks. Therefore, AMC has the ability to characterize complex motions more effectively than TMC. In VVC, there are two kinds of affine motion models for AMC, i.e., the 4-parameter affine model and the 6-parameter affine model. Specifically, fewer bits are required to signal the CPMVs of a 4-parameter affine model, while more complex motions can be represented by a 6-parameter affine model. Besides, affine inter-mode and affine merge-mode are applied as two inter prediction modes in VVC. For the affine inter-mode, the CPMVs are determined in the encoder and signalled explicitly to the decoder. For the affine merge-mode, the CPMVs are derived implicitly from neighboring blocks.

The above traditional affine motion compensation generally predicts the current block by a parameterized affine model, which can hardly deal with irregular motions effectively. In this paper, the proposed DAMC-Net is a pixel-wise motion model, which is much more flexible than a 4-parameter or a 6-parameter affine model, hence its robustness for scenes with complex motions.
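To make the derivation of the sub-block motion field from CPMVs concrete, the following sketch evaluates the standard 4-parameter affine model at each 4 × 4 sub-block center. It is a simplified floating-point NumPy illustration under our own naming, not the fixed-point VVC reference implementation.

import numpy as np

def affine_subblock_mvs(cpmv0, cpmv1, width, height, sb=4):
    """Derive one motion vector per 4x4 sub-block from the two CPMVs of a
    4-parameter affine model (simplified, floating-point sketch).

    cpmv0: (mv0x, mv0y) at the top-left corner of the CU.
    cpmv1: (mv1x, mv1y) at the top-right corner of the CU.
    """
    mv0x, mv0y = cpmv0
    mv1x, mv1y = cpmv1
    a = (mv1x - mv0x) / width        # scaling/rotation terms shared by x and y
    b = (mv1y - mv0y) / width
    # Sub-block center coordinates relative to the CU's top-left corner.
    xs = np.arange(sb // 2, width, sb, dtype=np.float64)
    ys = np.arange(sb // 2, height, sb, dtype=np.float64)
    x, y = np.meshgrid(xs, ys)
    mvx = a * x - b * y + mv0x       # 4-parameter affine motion model
    mvy = b * x + a * y + mv0y
    return np.stack([mvx, mvy], axis=-1)   # shape (H/sb, W/sb, 2)

# Example: a 16x16 CU whose right edge moves further than its left edge.
field = affine_subblock_mvs(cpmv0=(1.0, 0.0), cpmv1=(3.0, 0.5), width=16, height=16)
print(field.shape)   # (4, 4, 2)

A 6-parameter model adds an independent CPMV at the bottom-left corner so that the horizontal and vertical directions can scale or shear differently.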
B. Deep Learning for Inter Prediction

Inspired by the success of deep learning, many researchers have employed CNNs to improve the performance of video coding, and have achieved superior coding efficiency in filtering, intra prediction, and inter prediction.

Among the learning-based tools for inter prediction in traditional coding schemes, several works have been proposed to refine or substitute the modules of uni-directional and bi-directional prediction in HEVC. In [9] and [12], CNN-based methods were proposed to refine the uni-directional prediction of TMC in HEVC. To improve the coding performance of bi-directional prediction, Zhao et al. [10] proposed a fully-convolutional neural network to learn a mapping between bi-directional compensated blocks and final prediction signals. Mao and Yu [11] took spatial pixels and temporal display orders as additional information to improve the accuracy of bi-directional prediction. Yan et al. [13] proposed a fractional-pixel reference generation network, FRCNN, to perform fractional-pixel motion compensation. Inspired by the works on video prediction, there were also several works focusing on directly extrapolating or interpolating the prediction of the current frame as an additional reference. For instance, Lin et al. [14] proposed a video coding oriented Laplacian pyramid of generative adversarial networks (VC-LAPGAN) to predict the current frame. Huo et al. [15] combined block-based motion-compensated prediction and frame extrapolation to generate an additional reference frame, which achieves considerable performance improvement in both HEVC and VVC. Zhao et al. [16] proposed a novel network to generate a high-quality reference and devised a CTU-level coding tool to achieve a trade-off between performance and complexity.


Fig. 2. Overall architecture of the proposed method.

Xia et al. [17] proposed a multi-scale network to generate an additional reference frame from coarse to fine. Liu et al. [18] proposed a multi-scale quality attentive factorized kernel convolutional neural network (MQ-FKCNN) to synthesize an additional reference frame. Choi and Bajic [19] utilized both decoded frames and temporal indices to generate a reference frame for the video coding scheme. They also proposed an affine transformation-based scheme [20], in which spatially-varying filters and affine parameters are computed to generate the warped samples for synthesizing the reference frame.

However, to the best of our knowledge, no work has been reported on learning-based AMC. Moreover, the existing learning-based tools for TMC pay little attention to estimating the complex motion field for compensating the current block, while estimating an accurate motion field is essential for motion-compensated prediction.

III. THE PROPOSED METHOD

In this section, the proposed method is presented in detail. First, the architecture of the proposed method is systematically introduced. Second, the deformable motion-compensated prediction module, which plays an important role in the proposed network, is illustrated. Third, the attention-based fusion and reconstruction module is illustrated. Finally, the details of integrating DAMC-Net into VVC are described.

A. Architecture of DAMC-Net

Traditional AMC compensates the current block merely by a parameterized affine model, which has limited capability to model complex motions. To solve this problem, a learning-based model, DAMC-Net, is designed to compensate the current block by explicitly estimating pixel-wise motion fields rather than implicitly deriving the subblock-wise motion field. Specifically, a spatial pixel-wise motion field is estimated to refine the prediction block for alleviating the spatial blocking artifacts, and a temporal pixel-wise motion field between frames is estimated to compensate the current block for alleviating temporal misalignment.

Fig. 2 shows the overall architecture of the proposed DAMC-Net. As shown in the figure, multi-domain information is fully leveraged in the proposed DAMC-Net. In order to improve the prediction accuracy of AMC, the spatial neighboring pixels of the current block, as well as its AMC prediction, are combined as the first input (I_C) to explore spatial correlations. Besides, to obtain source pixels in the temporal reference frame of the current block that are as accurate as possible, the most similar block, together with its neighboring pixels in the reference frame, is constructed based on the CPMVs and used as the second input (I_R). More importantly, since the initial motion field (I_MF) constructed by the CPMVs contains motion information, I_MF is utilized as the third input. Taking I_C, I_MF, and I_R as the inputs, the proposed DAMC-Net is optimized to jointly utilize the spatial neighboring and temporal correlative information to improve the prediction accuracy.

In the DMCP module, features F_C, F_MF, and F_R are first extracted from I_C, I_MF, and I_R, respectively. Taking these features as inputs, the motion estimation unit (MEU) is designed to estimate motion fields. Based on the estimated motion fields, deformable convolution is used to compensate F_C and F_R. The compensated features F_Tar^C and F_Tar^R are concatenated with F_C as well as F_R to construct the aggregated feature F_Agg. Finally, the output block, O_DAMC, is reconstructed from F_Agg by an AFR module.
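To make the data flow of Fig. 2 concrete, a structural sketch of this forward pass is given below. It is written in PyTorch purely for illustration (the paper's implementation is in TensorFlow [43]); the layer widths, patch sizes, and internal structures are placeholder assumptions, the multi-scale feature extractors and the Res + CBAM-based AFR are reduced to plain convolutions, and the class and attribute names are ours.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DAMCSketch(nn.Module):
    """Structural sketch of the DAMC-Net data flow (placeholder widths)."""
    def __init__(self, feat=64, k=3):
        super().__init__()
        def enc(cin):   # stand-in for the multi-scale feature extractors
            return nn.Sequential(nn.Conv2d(cin, feat, 3, padding=1), nn.ReLU())
        self.enc_c, self.enc_mf, self.enc_r = enc(1), enc(2), enc(1)
        # MEU stand-in: one offset head per texture branch, fed by the
        # concatenation of the three feature maps.
        self.offset_c = nn.Conv2d(3 * feat, 2 * k * k, 3, padding=1)
        self.offset_r = nn.Conv2d(3 * feat, 2 * k * k, 3, padding=1)
        self.dconv_c = DeformConv2d(feat, feat, k, padding=k // 2)
        self.dconv_r = DeformConv2d(feat, feat, k, padding=k // 2)
        # AFR stand-in: the paper stacks Res + CBAM units; a plain conv head here.
        self.afr = nn.Sequential(nn.Conv2d(4 * feat, feat, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, i_c, i_mf, i_r):
        f_c, f_mf, f_r = self.enc_c(i_c), self.enc_mf(i_mf), self.enc_r(i_r)
        joint = torch.cat([f_c, f_mf, f_r], dim=1)
        delta_c, delta_r = self.offset_c(joint), self.offset_r(joint)  # motion fields
        f_tar_c = self.dconv_c(f_c, delta_c)   # deformable compensation of F_C
        f_tar_r = self.dconv_r(f_r, delta_r)   # deformable compensation of F_R
        f_agg = torch.cat([f_c, f_tar_c, f_r, f_tar_r], dim=1)
        return self.afr(f_agg)                 # O_DAMC, the compensated block

# Example call with a 16x16 luma patch, a 2-channel motion field, and a reference patch.
net = DAMCSketch()
o = net(torch.rand(1, 1, 16, 16), torch.rand(1, 2, 16, 16), torch.rand(1, 1, 16, 16))
print(o.shape)   # torch.Size([1, 1, 16, 16])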
B. Deformable Motion-Compensated Prediction (DMCP)

Due to the limitation of deriving the subblock-wise motion field, the prediction of AMC suffers from misalignment at the pixel level. In order to improve the granularity of AMC, the DMCP module is designed to estimate pixel-wise motion fields for compensating the current block.


In DMCP, to extract deep features with abundant information, the features F_C, F_MF, and F_R are extracted from I_C, I_MF, and I_R by multi-scale convolution units [25], respectively. Then, the MEU is designed to estimate accurate motion fields. As shown in Fig. 2, F_C, F_MF, and F_R are first concatenated, and separate convolution operations then generate the offsets for each texture branch in the MEU. It should be noticed that not only the texture information I_C and I_R is exploited in the MEU, but the initial motion field I_MF is also jointly utilized to estimate accurate motion fields. Compared with a network that learns motion fields from scratch, the DMCP estimates accurate motion fields from a coarse input, which helps to reduce the training difficulty of the network and ensure the quality of the learned motion fields.

Let δ_C denote the motion field from F_C to F_Tar^C, which captures the affine motion between I_C and O_DAMC, and let δ_R denote the motion field from F_R to F_Tar^R, which captures the affine motion between I_R and O_DAMC. Since the motion between I_C and O_DAMC is smaller than that between I_R and O_DAMC, the MEU actually estimates a fine motion field from I_C to O_DAMC. The calculation of the motion fields δ_C and δ_R for the two texture branches can be expressed as:

δ_C = F_θ1(F_C, F_MF, F_R),
δ_R = F_θ2(F_C, F_MF, F_R)          (1)

where θ1 and θ2 are parameters learned by the network, and F(·) represents the operation of the motion estimation unit.
Similar to the function of the affine motion compensation module in VVC, the features F_C and F_R of the two texture branches are deformed separately to generate compensated features. Inspired by [34], the motion compensation is operated by deformable convolution, which adaptively deforms the kernel sampling under the control of a motion field. Therefore, the compensated features F_Tar^C and F_Tar^R for the two texture branches can be computed as follows:

F_Tar^C = DConv(F_C, δ_C),
F_Tar^R = DConv(F_R, δ_R)          (2)

where DConv(·) denotes the deformable convolution [34]. Since DMCP compensates the feature maps rather than the pixels of the target image, it effectively exploits non-local context.
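The deformable convolution in (2) generalizes the idea of warping a feature map with a dense motion field: instead of one displacement per pixel, every kernel sampling position receives its own learned offset [34]. For intuition, the simpler per-pixel warping is sketched below with bilinear sampling; it is an illustrative analogue under our own naming, not the DMCP implementation itself.

import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Bilinearly warp a feature map with a dense per-pixel motion field.

    feat: (N, C, H, W) features, e.g. those extracted from the reference patch.
    flow: (N, 2, H, W) displacements in pixels, flow[:, 0] = dx, flow[:, 1] = dy.
    """
    n, _, h, w = feat.shape
    # Base sampling grid holding each pixel's own coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feat.dtype),
                            torch.arange(w, dtype=feat.dtype), indexing="ij")
    x_new = xs.unsqueeze(0) + flow[:, 0]      # where each output pixel samples from
    y_new = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1] in (x, y) order.
    grid = torch.stack([2.0 * x_new / (w - 1) - 1.0,
                        2.0 * y_new / (h - 1) - 1.0], dim=-1)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Example: resample a random 16x16 feature map with a uniform one-pixel displacement.
feat = torch.rand(1, 8, 16, 16)
flow = torch.zeros(1, 2, 16, 16)
flow[:, 0] = 1.0
warped = warp_features(feat, flow)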
context. after the best affine merge candidate is searched. Meanwhile,
only the CUs with the affine merge-mode need to encode and
C. Attention-Based Fusion and Reconstruction (AFR) decode the flag. Since the DAMC-Net is nested in the process
Taking the outputs of the DMCP module as input, the AFR of CU optimal mode decision rather than post-processing after
module fuses the multi-channel information and reconstructs encoding frames, the selection rate of CUs with affine modes
the final prediction signal. In order to obtain feature repre- increases significantly.
sentation with abundant information and improve the quality
of the final prediction signal, the AFR module obtains the IV. E XPERIMENTS
aggregated feature FAgg by fusing the non-deformed fea- A. Experimental Settings
tures FC and FR with the compensated features FTCar and 1) Training Data Preparation: To evaluate the proposed
FTRar . Due to multiple sources of information are included DAMC-Net, a training dataset is first collected with 106 videos
in FAgg , an attention mechanism is employed to emphasize at different resolutions from [37], [38] and 8 videos at the
useful features and suppress the others. Considering that the resolution of 3840 × 2160 from [39], that is 114 raw video
residual block can extract deep features effectively, the Res + sequences with rich and complex motions in total. Considering
CBAM structure [35], being composed of CBAM and residual the complexity of VVC, the 4K videos are down-sampled
block, is adopted. Furthermore, a down-sampling layer and to 1280 × 720. Then, VTM-6.2 is used to compress the


TABLE I
BD-RATE RESULTS OF THE PROPOSED DAMC-NET FOR AFFINE INTER-MODE

Fig. 3. Visual comparison between VTM-6.2 and the proposed Inter & Merge DAMC-Net. Top: The 3-rd frame of BQsquare under QP 27. Bottom: The
30-th frame of PartyScene under QP 32. (a) BQsquare original image, (b) Inter & Merge AMC (5216 bits, 35.24 dB), (c) Proposed (4440 bits, 35.42 dB),
(d) PartyScene original image, (e) Inter & Merge AMC (22408 bits, 29.82 dB), (f) Proposed (21984 bits, 29.91 dB).

Then, VTM-6.2 is used to compress the video sequences configured with Low Delay P (LDP) under four quantization parameters (QPs) {22, 27, 32, 37}. Due to the high similarity between adjacent frames, the video frames are sampled at a regular interval of 3 to generate the training samples.


TABLE II
BD-RATE RESULTS OF THE PROPOSED DAMC-NET FOR BOTH AFFINE INTER-MODE AND AFFINE MERGE-MODE

In the process of compression, the I_C, I_MF, and I_R of the CUs coded with affine modes in the selected frames, together with the corresponding ground-truth blocks in the raw video frames, are utilized to construct the training samples. Consequently, 76 sub-datasets in total are obtained, which correspond to 4 QPs and 19 CU sizes (8 × 8, 8 × 16, 8 × 32, 8 × 64, 16 × 8, 16 × 16, 16 × 32, 16 × 64, 32 × 8, 32 × 16, 32 × 32, 32 × 64, 64 × 8, 64 × 16, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128).

2) Encoding Configurations: DAMC-Net is integrated into the VVC reference software VTM (version 6.2). Experiments are performed under the JVET common test conditions (CTC) [40]. Since a single reference frame is utilized in the proposed DAMC-Net, the LDP configuration and Classes B∼E are tested. In the experiments, the testing QPs are set to {22, 27, 32, 37}, and the widely employed BD-rate [41], [42] is used as the objective metric to evaluate the coding performance. A CPU + GPU cluster is used as the test environment, where the VVC coding is run on the CPU and the DAMC-Net runs on the GPU. The CPU is an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, and the GPU is an NVIDIA GeForce GTX 1080Ti.

3) Training Strategy: The proposed DAMC-Net is implemented on TensorFlow [43] and trained on an NVIDIA GeForce GTX 1080Ti GPU. To cover all the CU sizes with affine modes, 12 models for the affine inter-mode and 19 models for the affine merge-mode are trained for each QP. In the training phase, the base model, with the affine inter-mode, a QP of 22, and a CU size of 16 × 16, is trained first. Specifically, the network is optimized using Adam [44] with a batch size of 128, and the learning rate is initially set to 0.0001 for the first 2,000,000 steps and decayed to 0.00001 for the last 1,000,000 steps. Then, the other models are refined based on the base model with a learning rate of 0.00005 for 100,000 steps.
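The staged learning-rate schedule above can be summarized in a few lines. The sketch below is an illustrative PyTorch skeleton rather than the authors' TensorFlow [43] training script; only the step thresholds and rates are taken from the text.

import torch

def learning_rate(step, finetune=False):
    """Staged learning rate: base-model training versus refinement from it."""
    if finetune:
        return 5e-5          # 100,000 refinement steps per size/QP model
    return 1e-4 if step < 2_000_000 else 1e-5

model = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for DAMC-Net
opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # batch size 128 in the paper

for step in range(3):                            # training loop skeleton
    for g in opt.param_groups:
        g["lr"] = learning_rate(step)
    # ... forward pass, L2 loss of Eq. (3), loss.backward(), opt.step() ...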
B. Comparison Results and Analyses

To validate the effectiveness of the proposed DAMC-Net, the performance of the scheme with DAMC-Net for the affine inter-mode (Inter DAMC-Net) is first compared with the scheme of VTM-6.2 with the affine inter-mode (Inter AMC). Meanwhile, VTM-6.2 without the affine inter-mode is set as the baseline to compute the BD-rate. The coding performance on the Y, Cb, and Cr components of the different methods is reported in Table I. As shown in the table, on the Y component, the "Inter DAMC-Net" achieves 4.11%, 1.59%, 3.43%, and 3.02% BD-rate reduction on average for Classes B, C, D, and E, respectively. Particularly, the "Inter DAMC-Net" achieves up to 6.13% BD-rate reduction on BQsquare, while "Inter AMC" only obtains 3.43% BD-rate reduction.
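For reference, a compact sketch of the Bjøntegaard delta rate computation [41], [42] that underlies these tables is given below. It uses the common cubic fit of log-rate versus PSNR over the overlapping quality range and is an illustrative re-implementation, not the script used for the reported numbers.

import numpy as np

def bd_rate(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Average bitrate difference (%) of the test codec against the anchor."""
    lr_a, lr_t = np.log10(rates_anchor), np.log10(rates_test)
    # Fit log10(rate) as a cubic polynomial of PSNR for both RD curves.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the overlapping PSNR interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_diff - 1) * 100        # negative values mean bitrate savings

# Example with four made-up QP points per codec (rates in kbps, PSNR in dB).
print(bd_rate([1000, 600, 350, 200], [38.0, 36.5, 34.8, 32.9],
              [950, 560, 330, 190], [38.1, 36.6, 34.9, 33.0]))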


TABLE III
SELECTION RATES OF THE PROPOSED DAMC MODE

TABLE IV
SELECTION RATES OF THE DAMC MODE BASED ON THE AREA

Besides, to further validate the advantage of the DAMC-Net, experiments on the scheme with DAMC-Net for both the affine inter-mode and the affine merge-mode (Inter & Merge DAMC-Net) are conducted. Since the proposed DAMC mode is set as an optional mode for all the CUs with affine modes, VTM-6.2 without both the affine inter-mode and the affine merge-mode is set as the baseline for a fair comparison. The comparison results are shown in Table II. It can be seen that the proposed "Inter & Merge DAMC-Net" achieves significantly larger bit savings than the "Inter & Merge AMC". Specifically, a 4.32% BD-rate reduction on average is achieved by the proposed "Inter & Merge DAMC-Net" on the Y component. Compared with the scheme of "Inter & Merge AMC", the proposed method further achieves about 50% more coding gain on average and even doubles the coding gain on several sequences, such as BQTerrace in Class B and PartyScene in Class C, which strongly demonstrates the effectiveness of the proposed DAMC-Net. Especially on sequences with complex motions, such as Cactus, the proposed method achieves more coding gain than the average, which demonstrates that the proposed DAMC-Net is effective in dealing with the complex motions in these scenes. In addition, in order to intuitively verify the effectiveness of the proposed method, a visual comparison of the decoded images obtained by the different methods is shown in Fig. 3. It can be observed that the proposed DAMC-Net obtains decoded images with higher quality while achieving more BD-rate reduction. Moreover, the pixels on the boundaries of the moving objects obtained by the proposed method are closer to the raw videos, owing to the explicit motion fields estimated for motion compensation.

C. Mode Selection Results

To further analyze the contribution of the proposed method, the mode selection results of the DAMC mode for the affine inter-mode on all the test sequences with QP 22 are reported in Table III. Let N_AMC^O denote the number of CUs with the affine inter-mode in the original VTM, N_AMC^P denote the number of CUs with the affine inter-mode in the proposed scheme with "Inter DAMC-Net", and N_DAMC denote the number of CUs with the DAMC mode in "Inter DAMC-Net". As shown in Table III, the selection rate HR of the DAMC mode in "Inter DAMC-Net" and the increasing rate IR of the affine inter-mode are defined as follows:

HR = N_DAMC / (N_DAMC + N_AMC^P)          (4)
IR = (N_DAMC + N_AMC^P) / N_AMC^O          (5)

It can be observed from Table III that the proposed DAMC mode is selected in most cases, and a larger IR is obtained with the proposed DAMC-Net.
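The rates in (4) and (5) are simple ratios over CU counts gathered per sequence from the encoder; a small sketch with our own helper name and made-up example counts follows.

def selection_rates(n_damc, n_amc_proposed, n_amc_original):
    """Selection rate HR of the DAMC mode and increasing rate IR of affine inter-mode.

    n_damc:          CUs coded with the DAMC mode in "Inter DAMC-Net".
    n_amc_proposed:  CUs coded with the plain affine inter-mode in "Inter DAMC-Net".
    n_amc_original:  CUs coded with the affine inter-mode in the original VTM.
    """
    hr = n_damc / (n_damc + n_amc_proposed)            # Eq. (4)
    ir = (n_damc + n_amc_proposed) / n_amc_original    # Eq. (5)
    return hr, ir

hr, ir = selection_rates(n_damc=740, n_amc_proposed=260, n_amc_original=800)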


TABLE V
BD-RATE RESULTS OF THE DAMC-NET WITHOUT DMCP MODULE

Fig. 4. Mode selection results for the 21-st frame of Cactus under QP 22.
(a) Inter AMC. (b) Proposed.

This is mainly because the affine inter-mode is more likely to be selected, as the proposed DAMC-Net effectively eliminates the negative effects of the hand-crafted algorithms and improves the performance of AMC. Besides, higher HR and IR are obtained on the sequences with complex motions, such as Cactus and PartyScene, which further demonstrates the effectiveness of the DAMC-Net over traditional AMC in dealing with complex motions. In addition, it can also be seen that the high-definition sequences in Class B have the highest HR and IR among all the test sequences, which indicates that the proposed method is particularly suitable for high-definition videos.

Meanwhile, the mode selection results based on the influenced area on all the test sequences with QP 22 are also reported in Table IV. It can be observed that the proposed DAMC mode is more likely to be selected than the traditional AMC in most cases. Especially, the higher selection rates on the sequences with complex motions, MarketPlace and PartyScene, also demonstrate the effectiveness of the proposed DAMC-Net in dealing with complex motions.

Fig. 4 (a) shows the mode selection result of VTM-6.2 with the affine inter-mode. The highlighted blocks are the CUs selected with the affine inter-mode. Fig. 4 (b) shows the mode selection result of the proposed scheme with "Inter DAMC-Net". The red blocks denote the CUs with the DAMC mode, while the blue blocks denote the CUs with the affine inter-mode. It is obvious that the DAMC mode is more likely to be selected in the CUs with complex motions, and the selection rate of affine modes increases significantly.

D. Ablation Study

The DMCP module is designed to compensate the current encoding block by estimating more accurate motion fields. In order to illustrate the contribution of the DMCP module, the DMCP module is removed from DAMC-Net, and only F_C is taken as the input of the AFR. The results of the proposed method without the DMCP module under the "Inter DAMC-Net" scheme are shown in Table V. Compared with Table I, the BD-rate reduction on average decreases from 4.11%, 1.59%, 3.43%, and 3.02% to 3.81%, 1.35%, 2.81%, and 2.64% on Classes B, C, D, and E, respectively. This is mainly because the network without the DMCP module only refines the pixels of the current block instead of compensating the block according to the accurate motion fields estimated by the DMCP module, which weakens the ability of the network to deal with complex motions. However, the coding performance of the network without the DMCP module only drops from 8.88% to 8.80% on Cactus. A possible reason is that the affine inter-mode already behaves well on Cactus, and it is difficult for the DMCP module to further improve the coding performance of the affine inter-mode on Cactus.

The initial motion field I_MF is utilized to estimate accurate motion fields in the DMCP module. To evaluate its effectiveness, I_MF is removed from the DMCP module.


TABLE VI
COMPUTATIONAL COMPLEXITY OF THE PROPOSED METHOD

TABLE VII
BD-RATE RESULTS OF THE PROPOSED METHOD ON AFFINE TEST SEQUENCES

TABLE VIII
BD-RATE RESULTS OF THE PROPOSED METHOD FOR VTM-12.1

The network without I_MF achieves a 2.80% BD-rate reduction on average, which is less than that of the proposed DAMC-Net. The experimental results prove that the initial motion field is helpful in improving the quality of the learned motion fields. Moreover, in order to verify the benefit of the attention in the AFR module, the CBAM module is removed from the DAMC-Net. The coding performance of the network without attention decreases from 3.11% to 2.88% on average, which proves that the attention in the proposed DAMC-Net is useful for reconstructing the final output block.

E. Discussion

1) Complexity Analysis: It is widely known that learning-based tools achieve superior coding efficiency at the cost of high computational complexity [15], [18]. To address this issue, in this paper, the network inference is run on the GPU and the codec is performed on the CPU. The encoding and decoding times of the proposed method in comparison with VTM-6.2 are shown in Table VI. Since the proposed DAMC-Net is integrated into the decision process of the optimal mode, the encoding time increases to 242.71% and 1890.25% for "Inter DAMC-Net" and "Inter & Merge DAMC-Net", respectively. The decoding time increases to 779.16% and 3,396.81%. Since the high complexity mainly comes from the forward operation in the network, the fluctuation of the decoding times is primarily determined by the number and size of the CUs selected with the proposed DAMC mode.

Furthermore, the storage consumption of the CNN models and the run-time GPU memory are also analyzed. There are 12 models for the proposed "Inter DAMC-Net" and 19 models for the proposed "Inter & Merge DAMC-Net". Specifically, the model size of the DAMC-Net is 4.36 MB. Hence, the total model size equals 52.52 MB and 82.84 MB for each QP, respectively. As for GPU memory usage, 1,762 MB and 1,968 MB of run-time GPU memory are needed, respectively. In the future, we will explore more lightweight networks to better achieve this task.

2) Coding Performance on Affine Test Sequences: The coding performance on several affine test sequences [3] is reported to validate the efficacy of the proposed method in dealing with complex motions. The results are shown in Table VII. Obviously, the proposed method achieves better coding performance than the traditional AMC. Moreover, the proposed "Inter & Merge DAMC-Net" achieves a 7.97% BD-rate reduction on average, which is more than the average coding gain on the CTC test sequences. This confirms that the proposed method is particularly suitable for sequences with complex motions.

3) Coding Performance for VTM-12.1: To further evaluate the effectiveness of the proposed method on the newer reference software of VVC, VTM-12.1 is used as the reference codec and the proposed DAMC-Net is incorporated into VTM-12.1 accordingly. The results are provided in Table VIII. Overall, the proposed "Inter & Merge DAMC-Net" achieves a 4.95% BD-rate reduction on the Y component.


Furthermore, the overall R-D performance of the proposed method increases from 4.32% to 4.95% when compared with the results on VTM-6.2. Therefore, the proposed method is proved effective for the new reference software.

V. CONCLUSION

In this paper, a DAMC-Net for inter prediction is proposed to effectively boost the performance of affine motion compensation in VVC. In order to compensate the current encoding block, a DMCP module is designed to estimate accurate motion fields from the spatial neighboring information, the temporal reference block, and the initial motion field. Then, an attention-based fusion and reconstruction module is designed to fuse the multi-channel features from DMCP and reconstruct the final prediction signal. The proposed DAMC-Net is integrated into VVC as an optional mode for CUs with affine modes. Experimental results demonstrate that the proposed DAMC-Net can considerably enhance coding performance.

REFERENCES

[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] S. Li, C. Zhu, and M.-T. Sun, "Hole filling with multiple reference views in DIBR view synthesis," IEEE Trans. Multimedia, vol. 20, no. 8, pp. 1948–1959, Aug. 2018.
[3] L. Li et al., "An efficient four-parameter affine motion model for video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 8, pp. 1934–1948, Aug. 2018.
[4] J. Xie, N. He, L. Fang, and P. Ghamisi, "Multiscale densely-connected fusion networks for hyperspectral images classification," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 1, pp. 246–259, Jan. 2021.
[5] J. Lei et al., "Deep stereoscopic image super-resolution via interaction module," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 8, pp. 3051–3061, Aug. 2021.
[6] L. Wang et al., "Learning parallax attention for stereo image super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 12242–12251.
[7] Y. Fang, C. Zhang, H. Huang, and J. Lei, "Visual attention prediction for stereoscopic video by multi-module fully convolutional network," IEEE Trans. Image Process., vol. 28, no. 11, pp. 5253–5265, Nov. 2019.
[8] W. Bao, W.-S. Lai, X. Zhang, Z. Gao, and M.-H. Yang, "MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 3, pp. 933–948, Mar. 2021.
[9] S. Huo, D. Liu, F. Wu, and H. Li, "Convolutional neural network-based motion compensation refinement for video coding," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–4.
[10] Z. Zhao, S. Wang, S. Wang, X. Zhang, S. Ma, and J. Yang, "Enhanced bi-prediction with convolutional neural network for high-efficiency video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 11, pp. 3291–3301, Nov. 2019.
[11] J. Mao and L. Yu, "Convolutional neural network based bi-prediction utilizing spatial and temporal information in video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 1856–1870, Jul. 2020.
[12] Y. Wang, X. Fan, C. Jia, D. Zhao, and W. Gao, "Neural network based inter prediction for HEVC," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2018, pp. 1–4.
[13] N. Yan, D. Liu, H. Li, B. Li, L. Li, and F. Wu, "Convolutional neural network-based fractional-pixel motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 3, pp. 840–853, Mar. 2019.
[14] J. Lin, D. Liu, H. Li, and F. Wu, "Generative adversarial network-based frame extrapolation for video coding," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2018, pp. 1–4.
[15] S. Huo, D. Liu, B. Li, S. Ma, F. Wu, and W. Gao, "Deep network-based frame extrapolation with reference frame alignment," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 3, pp. 1178–1192, Mar. 2021.
[16] L. Zhao, S. Wang, X. Zhang, S. Wang, S. Ma, and W. Gao, "Enhanced motion-compensated video coding with deep virtual reference frame generation," IEEE Trans. Image Process., vol. 28, no. 10, pp. 4832–4844, Oct. 2019.
[17] S. Xia, W. Yang, Y. Hu, and J. Liu, "Deep inter prediction via pixel-wise motion oriented reference generation," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 1710–1714.
[18] J. Liu, S. Xia, and W. Yang, "Deep reference generation with multi-domain hierarchical constraints for inter prediction," IEEE Trans. Multimedia, vol. 22, no. 10, pp. 2497–2510, Oct. 2020.
[19] H. Choi and I. V. Bajić, "Deep frame prediction for video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 1843–1855, Jul. 2020.
[20] H. Choi and I. V. Bajic, "Affine transformation-based deep frame prediction," IEEE Trans. Image Process., vol. 30, pp. 3321–3334, 2021.
[21] J. Lin, D. Liu, H. Yang, H. Li, and F. Wu, "Convolutional neural network-based block up-sampling for HEVC," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 12, pp. 3701–3715, Dec. 2019.
[22] Y. Li et al., "Convolutional neural network-based block up-sampling for intra frame coding," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 9, pp. 2316–2330, Sep. 2018.
[23] J. Deng, L. Wang, S. Pu, and C. Zhuo, "Spatio-temporal deformable convolution for compressed video quality enhancement," in Proc. AAAI, Apr. 2020, pp. 10696–10703.
[24] Z. Pan, X. Yi, Y. Zhang, B. Jeon, and S. Kwong, "Efficient in-loop filtering based on enhanced deep convolutional neural networks for HEVC," IEEE Trans. Image Process., vol. 29, pp. 5352–5366, 2020.
[25] Z. Guan, Q. Xing, M. Xu, R. Yang, T. Liu, and Z. Wang, "MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 3, pp. 949–963, Mar. 2021.
[26] S. Ma, X. Zhang, C. Jia, Z. Zhao, S. Wang, and S. Wang, "Image and video compression with neural networks: A review," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1683–1698, Jun. 2020.
[27] J. Lin, D. Liu, H. Li, and F. Wu, "M-LVC: Multiple frames prediction for learned video compression," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3543–3551.
[28] E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, and G. Toderici, "Scale-space flow for end-to-end optimized video compression," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8500–8509.
[29] R. Yang, F. Mentzer, L. Van Gool, and R. Timofte, "Learning for video compression with hierarchical quality and recurrent enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 6627–6636.
[30] Z. Chen, T. He, X. Jin, and F. Wu, "Learning for video compression," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 2, pp. 566–576, Feb. 2020.
[31] G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, "DVC: An end-to-end deep video compression framework," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10998–11007.
[32] K. Zhang, Y.-W. Chen, L. Zhang, W.-J. Chien, and M. Karczewicz, "An improved framework of affine motion compensation in video coding," IEEE Trans. Image Process., vol. 28, no. 3, pp. 1456–1469, Mar. 2019.
[33] H. Huang, J. W. Woods, Y. Zhao, and H. Bai, "Control-point representation and differential coding affine-motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 10, pp. 1651–1660, Oct. 2013.
[34] J. Dai et al., "Deformable convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 764–773.
[35] S. Woo, J. Park, J. Lee, and I. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Jul. 2018, pp. 3–19.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[37] Xiph.org. (2017). Xiph.org Video Test Media. [Online]. Available: https://media.xiph.org/video/derf
[38] VQEG. (2017). VQEG Video Datasets Organizations. [Online]. Available: https://www.its.bldrdoc.gov/vqeg/video-datasetsand-organizations.aspx/
[39] L. Song, X. Tang, W. Zhang, X. Yang, and P. Xia, "The SJTU 4K video sequence dataset," in Proc. 5th Int. Workshop Qual. Multimedia Exper. (QoMEX), Jul. 2013, pp. 34–35.
[40] K. Suehring and X. Li, JVET Common Test Conditions and Software Reference Configurations, document JVET-G1010, Aug. 2017.


[41] J. Lei, J. Duan, W. Feng, N. Ling, and C. Hou, "Fast mode decision based on grayscale similarity and inter-view correlation for depth map coding in 3D-HEVC," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 3, pp. 706–718, Mar. 2018.
[42] G. Bjontegaard, Calculation of Average PSNR Differences Between RD Curves, document VCEG-M33, Apr. 2001.
[43] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," 2016, arXiv:1603.04467. [Online]. Available: http://arxiv.org/abs/1603.04467
[44] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980

Dengchao Jin received the B.S. degree in mechatronic engineering from Northwestern Polytechnical University, Xi'an, China, in 2019. He is currently pursuing the Ph.D. degree with the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. His research interests include video coding and deep learning.

Jianjun Lei (Senior Member, IEEE) received the Ph.D. degree in signal and information processing from Beijing University of Posts and Telecommunications, Beijing, China, in 2007. He was a Visiting Researcher with the Department of Electrical Engineering, University of Washington, Seattle, WA, USA, from August 2012 to August 2013. He is currently a Professor with Tianjin University, Tianjin, China. His research interests include 3-D video processing, virtual reality, and artificial intelligence.

Bo Peng (Member, IEEE) received the Ph.D. degree in information and communication engineering from Tianjin University, Tianjin, China, in 2020. She was a Visiting Research Scholar with the School of Computing, National University of Singapore, Singapore, from March 2019 to April 2020. She is currently an Assistant Professor with the School of Electrical and Information Engineering, Tianjin University. Her research interests include computer vision, image processing, and vision understanding.

Wanqing Li (Senior Member, IEEE) received the Ph.D. degree in electronic engineering from The University of Western Australia. He was an Associate Professor with Zhejiang University from 1991 to 1992, a Senior Researcher and later a Principal Researcher with Motorola Research Laboratory from 1998 to 2003, and a Visiting Researcher with Microsoft Research, USA, in 2008, 2010, and 2013. He is currently an Associate Professor and the Director of the Advanced Multimedia Research Laboratory (AMRL), University of Wollongong, Australia. His research areas include machine learning, 3D computer vision, 3D multimedia signal processing, and medical image analysis. He also serves as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and IEEE TRANSACTIONS ON MULTIMEDIA.

Nam Ling (Life Fellow, IEEE) received the B.Eng. degree from the National University of Singapore, Singapore, in 1981, and the M.S. and Ph.D. degrees from the University of Louisiana at Lafayette, Lafayette, LA, USA, in 1985 and 1989, respectively. From 2002 to 2010, he was an Associate Dean with the School of Engineering, Santa Clara University, Santa Clara, CA, USA. He was Sanfilippo Family Chair Professor, and currently Wilmot J. Nicholson Family Chair Professor and the Chair with the Department of Computer Science and Engineering, Santa Clara University. He is/was also a Consulting Professor with the National University of Singapore; a Guest Professor with Tianjin University, Tianjin, China, and Shanghai Jiao Tong University, Shanghai, China; Cuiying Chair Professor with Lanzhou University, Lanzhou, China; a Chair Professor and Minjiang Scholar with Fuzhou University, Fuzhou, China; a Distinguished Professor with Xi'an University of Posts and Telecommunications, Xi'an, China; a Guest Professor with Zhongyuan University of Technology, Zhengzhou, China; and an Outstanding Overseas Scholar with Shanghai University of Electric Power, Shanghai. He has authored or coauthored over 220 publications and seven adopted standard contributions. He has been granted nearly 20 U.S. patents so far. He is an IEEE Fellow due to his contributions to video coding algorithms and architectures. He is also an IET Fellow. He was named as an IEEE Distinguished Lecturer twice and also an APSIPA Distinguished Lecturer. He was a recipient of the IEEE ICCE Best Paper Award (First Place) and the Umedia Best/Excellent Paper Award three times, six awards from Santa Clara University, four at the University level (Outstanding Achievement, Recent Achievement in Scholarship, President's Recognition, and Sustained Excellence in Scholarship), and two at the School/College level (Researcher of the Year and Teaching Excellence). He was a Keynote Speaker of IEEE APCCAS, VCVP (twice), JCPC, IEEE ICAST, IEEE ICIEA, IET FC Umedia, IEEE Umedia, IEEE ICCIT, and Workshop at XUPT (twice); and a Distinguished Speaker of IEEE ICIEA. He served as the General Chair/Co-Chair for IEEE Hot Chips, VCVP (twice), IEEE ICME, Umedia (seven times), and IEEE SiPS. He was an Honorary Co-Chair of IEEE Umedia 2017. He served as the Technical Program Co-Chair for IEEE ISCAS, APSIPA ASC, IEEE APCCAS, IEEE SiPS (twice), DCV, and IEEE VCIP. He was the Technical Committee Chair of IEEE CASCOM TC and IEEE TCMM. He served as a Guest Editor or an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, IEEE ACCESS, JSPS (Springer), and MSSP (Springer).

Qingming Huang (Fellow, IEEE) received the bachelor's degree in computer science and the Ph.D. degree in computer engineering from Harbin Institute of Technology, China, in 1988 and 1994, respectively. He is currently a Professor with the University of Chinese Academy of Sciences and an Adjunct Research Professor with the Institute of Computing Technology, Chinese Academy of Sciences. He has published more than 400 academic articles in prestigious international journals, including the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON MULTIMEDIA, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and top-level conferences, such as the ACM Multimedia, ICCV, CVPR, IJCAI, and VLDB. His research areas include multimedia video analysis, image processing, computer vision, and pattern recognition. He served as the General Chair, the Program Chair, the Track Chair, and a TPC Member for various conferences, including ACM Multimedia, CVPR, ICCV, ICME, PCM, and PSIVT. He is also an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and Acta Automatica Sinica, and a Reviewer of various international journals, including the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and the IEEE TRANSACTIONS ON IMAGE PROCESSING.

