
US 20210136407A1

IN
( 19) United States
( 12 ) AONO
Patentet Application
al .
Publication ((4310)) Pub
Pub.. Date
No .: :US 2021/0136407 A1
May 6, 2021
( 54 ) VIDEO CODING APPARATUS AND VIDEO (52) U.S. CI.
DECODING APPARATUS CPC H04N 19/563 (2014.11 ) ; H04N 19/57
( 2014.11 )
( 71 ) Applicant: SHARP KABUSHIKI KAISHA , Sakai
City, Osaka ( JP )
(72) Inventors : TOMOKO AONO , Sakai City, Osaka (57 ) ABSTRACT
( JP) ; TOMOHIRO IKAI, Sakai City,
Osaka ( JP ) ; TAKESHI CHUJOH ,
Sakai City, Osaka (JP ) A slice or a tile can be decoded in a single picture , without
(21 ) Appl. No .: 16 /757,236 reference to information outside of a target slice or outside
of a target tile . However, there are problems in that in order
( 22 ) PCT Filed : Oct. 15 , 2018 to decode some regions of a video in a sequence, the entire
video needs to be reconstructed, and in that the slice and the
( 86 ) PCT No .: PCT / JP2018 /038362 tile coexist in the single picture, and the slices include an
$ 371 ( c ) ( 1 ) , independent slice and dependent slice , causing the coding
( 2) Date : Apr. 17, 2020 structure to be complex. In the present invention, a flag
indicating whether a shape of a slice is rectangular or not is
( 30) Foreign Application Priority Data decoded , and in a case that the flag indicates that the shape
of the slice is rectangular, a position and a size of the slice
Oct. 20 , 2017 (JP) 2017-203697 that is rectangular are not changed in a period of time of
referring to a same SPS . The slice that is rectangular is
Publication Classification decoded independently without reference to information of
( 51 ) Int . Ci . another slice . As described above, introducing the slice that
H04N 19/563 ( 2006.01 ) is rectangular instead of the tile can simplify the coding
H04N 19/57 (2006.01 ) structure that is complex.

[Representative drawing (front page), identical in content to FIG. 2: (a) coding video sequence (VPS, SPS#0, SPS#1, PPS#0, PPS#1, SEI, PICT#0, PICT#1); (b) coding picture (SLICE 0 to SLICE NS-1); (c) coding slice (slice header, slice data); (d) coding slice data (coding tree units CTU); (e) coding tree unit (quad tree of coding nodes CN and coding units CU); (f) coding unit (CUH, PU#0 to PU#3, TU#0 to TU#3).]
[FIG. 1 (Sheet 1 of 50): schematic of the image transmission system 1: a video coding apparatus 11, a video decoding apparatus 31 and a video display apparatus 41 connected by the coding stream Te and the decoded image Td.]
[FIG. 2 (Sheet 2): hierarchy structure of the coding stream data: (a) coding video sequence (VPS, SPS#0, SPS#1, PPS#0, PPS#1, SEI, PICT#0, PICT#1); (b) coding picture (SLICE 0 to SLICE NS-1); (c) coding slice (slice header, slice data); (d) coding slice data (coding tree units CTU); (e) coding tree unit (quad tree of coding nodes CN and coding units CU); (f) coding unit (CUH, PU#0 to PU#3, TU#0 to TU#3).]
[FIG. 3 (Sheet 3): reference pictures and reference picture lists: (a) pictures I0, P1, B2, B3, B4 on a time (POC) axis with currPic = B3; (b) reference picture lists of currPic: L0 list RefPicListL0 = {I0, B2, P1} indexed by refIdxL0 and L1 list RefPicListL1 = {B2, P1, I0} indexed by refIdxL1.]
[FIG. 4 (Sheet 4): general slices and rectangular slices: (a) general slices Slice0 to Slice3; (b) rectangular slices RSlice0 to RSlice3; (c) independent and dependent slices with their beginning blocks; (d) reference blocks RS[i][j][y] and the left-end (beginning) block of each CTU column.]
[FIG. 5 (Sheet 5): example shapes of rectangular slices: partitions of a picture into different numbers and arrangements of numbered rectangular slices.]
[FIG. 6 (Sheet 6): rectangular slices: (a) a picture of size wPict x hPict partitioned in CTU units into rectangular slices, each of size wRS x hRS with top-left position (xRSs, yRSs); (b) rectangular slices RSlice(0) to RSlice(N-1); (c) a rectangular slice sequence RSlice(n, tk) over times t0 to tk within a CVS (Coded Video Sequence), with an I picture/I slice at time t0.]
[FIG. 7 (Sheet 7): syntax tables related to rectangular slice information: (a) seq_parameter_set_rbsp() carrying, among others, pic_width_in_luma_samples, pic_height_in_luma_samples and rectangular_slice_flag u(1); (b) pic_parameter_set_rbsp() calling rectangular_slice_info() when rectangular_slice_flag is set; (c) rectangular_slice_info() with num_rslice_columns_minus1, num_rslice_rows_minus1, uniform_spacing_flag, column_width_minus1[i], row_height_minus1[i] and loop_filter_across_rslices_enabled_flag.]
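As an aside to FIG. 7(c), the element names listed there suggest a parsing flow along the following lines. This is a minimal sketch, not the normative process: the stub entropy reader, the assumption that widths and heights are expressed in CTU units, and the helper names are illustrative and not taken from the disclosure.

    class StubReader:
        # Stand-in for an entropy decoder: returns pre-decoded values in order
        # (helper for this sketch only, not part of the patent).
        def __init__(self, values):
            self.values = list(values)
        def ue(self):      # ue(v): unsigned Exp-Golomb-coded value
            return self.values.pop(0)
        def u(self, n):    # u(n): n-bit fixed-length value
            return self.values.pop(0)

    def parse_rectangular_slice_info(r):
        # Follows the element order of FIG. 7(c); names come from the figure.
        info = {}
        info["num_cols"] = r.ue() + 1            # num_rslice_columns_minus1
        info["num_rows"] = r.ue() + 1            # num_rslice_rows_minus1
        info["uniform_spacing"] = bool(r.u(1))   # uniform_spacing_flag
        if not info["uniform_spacing"]:
            # Explicit widths/heights (assumed to be in CTU units); the last column/row is implied.
            info["col_widths"] = [r.ue() + 1 for _ in range(info["num_cols"] - 1)]
            info["row_heights"] = [r.ue() + 1 for _ in range(info["num_rows"] - 1)]
        info["loop_filter_across_rslices"] = bool(r.u(1))  # loop_filter_across_rslices_enabled_flag
        return info

    # Example: a 3x2 grid of rectangular slices with non-uniform spacing.
    print(parse_rectangular_slice_info(StubReader([2, 1, 0, 4, 4, 6, 1])))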
[FIG. 8 (Sheet 8): syntax of the general slice segment header: first_slice_segment_in_pic_flag, dependent_slice_segment_flag, slice_segment_address, slice_type, slice_pic_order_cnt_lsb, short_term_ref_pic_set_sps_flag, collocated_from_l0_flag, collocated_ref_idx, num_entry_point_offsets, offset_len_minus1 and entry_point_offset_minus1[i] (markers SYN01 to SYN05).]
[FIG. 9 (Sheet 9): syntax related to I slice insertion: (a) seq_parameter_set_rbsp() calling rectangular_slice_info() and islice_info() when rectangular_slice_flag is set; (b) an islice_info() variant signalling num_islice_picture and a per-slice islice_flag; (c) an islice_info() variant signalling islice_period and max_tid; (d) example i_slice_flag[i][j] patterns for 10 rectangular slices: POC = 0: 1000100010, POC = 8: 0100010001, POC = 16: 0010001000, POC = 24: 0001000100.]
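The example pattern in FIG. 9(d) (10 rectangular slices, an I slice group inserted every 8 pictures, cycling over 4 groups) can be reproduced by a simple staggered schedule. The sketch below is only one schedule consistent with that example; the parameter names islice_period and num_groups, and the mapping itself, are assumptions rather than the normative derivation from islice_info().

    def is_intra_refresh_slice(poc, slice_idx, islice_period=8, num_groups=4):
        # Staggered I-slice schedule consistent with FIG. 9(d): at every
        # islice_period-th picture, one of num_groups slice groups is
        # intra-refreshed, cycling through the groups.
        if poc % islice_period != 0:
            return False
        return slice_idx % num_groups == (poc // islice_period) % num_groups

    # Reproduces the example rows of FIG. 9(d) for 10 rectangular slices.
    for poc in (0, 8, 16, 24):
        row = "".join("1" if is_intra_refresh_slice(poc, j) else "0" for j in range(10))
        print(poc, row)   # 0 1000100010, 8 0100010001, 16 0010001000, 24 0001000100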
[FIG. 10 (Sheet 10): temporal reference of rectangular slices: (a) an I picture Pict(t0) with rectangular slices RSlice(0, t0) to RSlice(N-1, t0); (b) a P picture Pict(t1) and (c) a P picture Pict(t2), in which blocks (CU1 to CU4, BLK1 to BLK3) of RSlice(n, tk) refer only to the collocated rectangular slice of the reference picture.]
[FIG. 11 (Sheet 11): syntax of the rectangular slice header: (a) slice_segment_header_rectangular() with dependent_slice_segment_flag, slice_type, slice_pic_order_cnt_lsb, short_term_ref_pic_set_sps_flag, collocated_from_l0_flag, collocated_ref_idx, slice_segment_address and entropy coding sync entry points (offset_len_minus1, entry_point_offset_minus1[i]) (markers SYN11 to SYN15); (b) slice_segment_layer_rbsp() selecting the general slice header when slice_id == 0xFFFF and the rectangular slice header otherwise.]
[FIG. 12 (Sheet 12): temporal hierarchy structures: (a) to (f) example POC orderings and temporal layer assignments for different GOP configurations.]
[FIG. 13 (Sheet 13): insertion interval of I slices: (a) 4 rectangular slices, maxTid = 2, PSlice = 8 with (b)/(c) I slice positions at POC = 0 and POC = 4; (d) 6 rectangular slices, maxTid = 1, PSlice = 16 with (e) to (i) I slice positions at POC = 0, 2, 4, 6, 8 and 10.]
[FIG. 14 (Sheet 14): another insertion interval of I slices: (a) 10 rectangular slices, maxTid = 3, PSlice = 32 with (b) to (e) I slice positions at POC = 0, 8, 16 and 24.]
[FIG. 15 (Sheet 15): block diagrams: (a) the video decoding apparatus 31 with a header information decoder 2001, slice decoders 2002a to 2002n and a slice combining unit 2003, which produce the decoded image Td from the coding stream Te using rectangular slice information and control information; (b) the video coding apparatus 11 with a partitioning processing unit 2010, a header information generation unit 2011, slice coders 2012a to 2012n and a coding stream generation unit 2013, which produce the coding stream Te from the input video.]
[FIG. 16 (Sheet 16): flowcharts of I slice insertion: (a) coder side: S1601 rectangular slice?, S1602 determine the rectangular slice to insert an I slice, S1603 code information related to I slice insertion, S1604 code each (rectangular) slice; (b) decoder side: S1611 rectangular slice?, S1612 decode information related to I slice insertion, S1613 derive the rectangular slice being the target of I slice insertion, S1614 decode each (rectangular) slice.]
[FIG. 17 (Sheet 17): syntax tables for the NAL unit and NAL unit header: (a) nal_unit(NumBytesInNalUnit) with rbsp_byte and emulation_prevention_three_byte handling; (b) nal_unit_header() with forbidden_zero_bit, nal_unit_type u(6), nuh_layer_id u(6) and nuh_temporal_id_plus1 u(3); (c) a nal_unit() variant adding nal_unit_header_extension_flag and nal_unit_header_ext(); (d) a nal_unit_header() variant signalling slice_id when nal_unit_type indicates RSV_VCL31; (e) nal_unit_header_extension() signalling slice_id.]
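FIG. 17(c) to 17(e) indicate that the NAL unit header is extended so that the slice identifier slice_id can be read without parsing the slice payload. The sketch below shows one possible reading order under that assumption; the BitReader helper, the trigger for the extension, and the fixed length chosen for slice_id are assumptions, and the RSV_VCL31 condition shown in the figure is not reproduced here.

    class BitReader:
        # Minimal MSB-first bit reader over a bytes object (helper, not from the patent).
        def __init__(self, data: bytes):
            self.data, self.pos = data, 0
        def u(self, n: int) -> int:
            val = 0
            for _ in range(n):
                byte = self.data[self.pos // 8]
                val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
                self.pos += 1
            return val

    def read_nal_unit_header(r, slice_id_bits=16):
        # Element names follow FIG. 17; the presence condition and length of
        # slice_id are assumptions for illustration only.
        hdr = {}
        hdr["forbidden_zero_bit"] = r.u(1)        # shall be 0
        hdr["nal_unit_type"] = r.u(6)
        hdr["nuh_layer_id"] = r.u(6)
        hdr["nuh_temporal_id_plus1"] = r.u(3)
        hdr["nal_unit_header_extension_flag"] = r.u(1)
        if hdr["nal_unit_header_extension_flag"]:
            hdr["slice_id"] = r.u(slice_id_bits)  # SliceId used to extract a rectangular slice
        return hdr

    # Example: bytes 0x7E 0x01 0x00 parse to nal_unit_type 63, temporal id plus 1 = 1, no extension.
    print(read_nal_unit_header(BitReader(bytes([0x7E, 0x01, 0x00]))))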
[FIG. 18 (Sheet 18): configuration of the slice decoder 2002: entropy decoder 301, prediction parameter decoder 302 (inter prediction parameter decoder 303, intra prediction parameter decoder 304), addition unit 305, reference picture and prediction parameter memories, prediction image generation unit 308 (inter prediction image generation unit 309, intra prediction image generation unit), inverse quantization and inverse transform processing unit, and loop filter 312, producing the decoded slice image from the coding slice Tes.]
[FIG. 19 (Sheet 19): intra prediction modes: 0 Planar, 1 DC, 2 to 66 Angular (directional) prediction, with 18 horizontal (HOR), 50 vertical (VER), 34 upper-left diagonal (DIA) and 66 upper-right diagonal (VDIA); for chrominance pixels, Planar, VER, HOR, DC, VDIR, LM prediction (linear prediction from luminance pixel values) and DM prediction (reuse of the luminance intra prediction mode) are available.]
[FIG. 20 (Sheet 20): rectangular slice boundaries and the positional relationship between a target block and a reference block: (a) a target block in a rectangular slice of size wRS x hRS at (xRSs, yRSs) in the target picture Pcur; (b) padding outside the slice; (c)/(d) the collocated block C and its lower-right block BR relative to the target block and CTU; (e)/(f) the collocated rectangular slice and the lower-right block BR of the collocated block.]
[FIG. 21 (Sheet 21): the prediction target block Pred[x][y] with (top) the unfiltered reference images r[x][-1], r[-1][y], r[-1][-1] and (bottom) the filtered reference images s[x][-1], s[-1][y], s[-1][-1] and the temporary prediction image q[x][y].]
[FIG. 22 (Sheet 22): configuration of the intra prediction image generation unit 310: prediction target block configuration unit 3101, unfiltered reference image configuration unit 3102, filtered reference image configuration unit 3103, predictor 3104 (Planar predictor 31041, DC predictor 31042, Angular predictor 31043, LM predictor 31044) and prediction image correction unit 3108, together with a filter intensity/coefficient table 3131 in the table memory 313, the reference picture memory 306 and the prediction parameter decoder 302.]
[FIG. 23 (Sheet 23): CCLM prediction process: (a)/(b) downsampling of the decoded luminance image uL[x][y] and the unfiltered reference images r[x][-1], r[-1][y] to duL[x][y], drL[x][-1], drL[-1][y]; (c)/(d) the chroma prediction target blocks qCb[x][y], qCr[x][y] with their unfiltered reference images rCb[x][-1], rCb[-1][y], rCr[x][-1], rCr[-1][y].]
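FIG. 23 and FIG. 24 concern CCLM (cross-component linear model) prediction, in which a chroma block is predicted from co-located downsampled luma samples through a linear model. The sketch below shows only that generic linear mapping; the derivation of a, b and the shift from the neighbouring unfiltered reference images, and the MMLM variant, are omitted, and the numeric values are illustrative assumptions rather than the derivation used in this disclosure.

    import numpy as np

    def cclm_predict(rec_luma_ds, a, b, shift=6, bit_depth=10):
        # Generic CCLM mapping: predC = (a * recL' >> shift) + b, clipped to the
        # valid sample range. rec_luma_ds is the downsampled decoded luma block.
        pred = (a * rec_luma_ds.astype(np.int64) >> shift) + b
        return np.clip(pred, 0, (1 << bit_depth) - 1)

    # Example: a flat 4x4 downsampled luma block mapped to a chroma prediction.
    luma = np.full((4, 4), 512, dtype=np.int64)
    print(cclm_predict(luma, a=37, b=-80))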
[FIG. 24 (Sheet 24): configuration of the LM predictor 31044: (a) a CCLM predictor 4101 and an MMLM predictor 4102; (b) the same with a switching unit 4103.]
[FIG. 25 (Sheet 25): boundary filter: (a) the corrected prediction Pred[x][y] is a weighted combination of the unfiltered reference samples r[x][-1], r[-1][y], r[-1][-1] and the temporary prediction image q[x][y], using weights c1v >> k[y], c1h >> k[x], c2v, c2h, a normalization weight b[x][y] and a rounding offset, right-shifted by rshift; (b) b[x][y] is defined so that the weights sum to 1 << rshift; (c) k[x] = floor(x / dx), k[y] = floor(y / dy); (d) dx = 1 for W <= 16 and dx = 2 for W > 16.]
[FIG. 26 (Sheet 26): reference pixels of the boundary filter at a rectangular slice boundary: (a) at a left-side slice boundary only the upper unfiltered reference images r[-1][-1], r[0][-1], ..., r[W-1][-1] are available for the temporary prediction image q[x][y]; (b) at an upper-side slice boundary only the left unfiltered reference images r[-1][-1], r[-1][0], ..., r[-1][H-1] are available.]
[FIG. 27 (Sheet 27): boundary filter variants when one side is unavailable: (a)/(b) forms using only the upper reference r[x][-1] together with r[0][-1] or r[W-1][-1]; (c)/(d) forms using only the left reference r[-1][y] together with r[-1][0] or r[-1][H-1]; in each case b[x][y] again normalizes the weights to 1 << rshift.]
[FIG. 28 (Sheet 28): configuration of the inter prediction parameter decoder 303: inter prediction parameter decoding control unit 3031, merge prediction parameter derivation unit 3036, subblock prediction parameter derivation unit 3037 (spatial-temporal subblock predictor 30371, affine predictor 30372, matching predictor 30373, OBMC predictor 30374), LIC predictor 3039, AMVP prediction parameter derivation unit 3032 (affine predictor 30321) and addition unit 3035, connected to the entropy decoder 301, the prediction parameter memory 307 and the inter prediction image generation unit 309.]
[FIG. 29 (Sheet 29): configuration of the merge prediction parameter derivation unit 3036: a merge candidate derivation unit, a merge candidate storage unit and a merge candidate selection unit (30361 to 30363), producing predFlagLX, refIdxLX and mvLX from mergeCandList[] and merge_idx.]
[FIG. 30 (Sheet 30): ATMVP process: (a) a target block in PCur with an initial motion vector IMV pointing into the reference picture IRef of RefPicListX; (b) subblock motion vectors spMvLX[k][l] and the reference motion vector spRefMvLX of the collocated region BRef indicated by BMV for the current PU.]
[FIG. 31 (Sheet 31): order of the prediction vector (merge) candidate list: spatial merge candidates (blocks L, A, AR, BL, AL), spatial-temporal merge candidates (ATMVP, STMVP), temporal merge candidate (TMVP), joint merge candidate and zero merge candidate.]
[FIG. 32 (Sheet 32): flowchart of the ATMVP process: S2301 search adjacent blocks; S2302 available adjacent blocks?; S2303 set IMV and IRef; S2304 search BMV and BRef; S2305 does the block pointed to by BMV lie inside the collocated rectangular slice?; S2306 acquire spRefMvLX and spRef of each subblock; S2307 scale spRefMvLX; S2308 is spMvLX of all subblocks inside the rectangular slice?; S2309 processing 1; S2310 store ATMVP in the merge candidate list or S2311 do not store it.]
[FIG. 33 (Sheet 33): STMVP process: (a) a target block with subblocks A to D and its upper-side and left-side adjacent blocks in the target picture PCur; (b) the collocated block and the lower-right positions A'br to D'br in the reference picture; (c) to (g) derivation of spMvLX[A] to spMvLX[D] from the adjacent motion vectors mva, mvb, mvc and mvd.]
[FIG. 34 (Sheet 34): flowchart of the STMVP process: S2601 subblock partitioning; S2602 search upper-side, left-side and temporal-direction adjacent blocks; S2603 available adjacent blocks?; S2604 scale the MVs of the adjacent blocks; S2605 compute the MV of the target subblock; S2606 is the reference block pointed to by spMvLX inside the rectangular slice?; S2607 processing 2; S2608 last subblock? (S2609 next subblock); S2610 store STMVP in the merge candidate list or S2611 do not store it.]
[FIG. 39 (Sheet 39): flowcharts of the motion vector derivation process in matching mode: (a) S3201 initial vector candidate configuration, S3202 block level initial vector search, S3203 does the initial vector point outside the rectangular slice?, S3204 processing 5, S3205 block level local search, then per subblock S3206 subblock level initial vector search and S3207 subblock level local search; (b) a template matching variant with, in addition, S3211 is Temp_Cur inside the rectangular slice?, S3212 processing 6 and S3213/S3214 template acquisition.]
[FIG. 40 (Sheet 40): search range of a target block: a block of size W x H at (xPosX, yPosX) inside a rectangular slice of size wRS x hRS at (xRSs, yRSs), with margins D1x, D2x, D1y and D2y to the slice boundary.]
[FIG. 41 (Sheet 41): target subblocks and adjacent blocks of OBMC prediction: (a) adjacent blocks of subblocks SCU1[3][0] and SCU2[0][2] at block boundaries; (b) the target subblock SCU[4][2] of OBMC processing and its adjacent block; (c) adjacent intra and inter blocks and subblocks SCU3[0][0], SCU4[0][0] and SCU4[3][0] of blocks CU3 and CU4 at a rectangular slice boundary.]
[FIG. 42 (Sheet 42): flowcharts of the OBMC prediction parameter derivation: (a) S3401 are there adjacent blocks?, S3402 is the adjacent block intra predicted?, S3403 are the MVs of the target subblock and the adjacent block identical?, S3404 obmc_flag[i] = 0, S3405 does the MV point inside the rectangular slice?, S3406 processing 3, S3407 obmc_flag[i] = 1; (b) S3411 PredC[][] generation, S3413 obmc_flag[i] == 1?, S3414 PredRN[][] generation, S3415 weighted average, S3416 PredLX[][] = PredC[][].]
[FIG. 43 (Sheet 43): bilateral template matching (BTM): (a) templates taken from Ref0 and Ref1 around Cur_block in CurPic; (b) BTM processing: S3501 template acquisition, S3502/S3503 searches of the L0 motion vector at two accuracies, S3504/S3505 the same for L1; (c) refined vectors mvL0' and mvL1' derived from mvL0, mvL1 and Cur_Temp.]
[FIG. 44 (Sheet 44): configuration of the AMVP prediction parameter derivation unit 3032: a vector candidate derivation unit 3033, a vector candidate storage unit and a vector candidate selection unit 3034, producing mvpListLX[] and the prediction vector selected by mvp_LX_idx, together with predFlagLX, refIdxLX and mvLX.]
[FIG. 45 (Sheet 45): pixels used for derivation of the LIC prediction parameter: (a) the reference template Ref_Temp around the block pointed to by mvLX in Ref0 and (b) the current template Cur_Temp around Cur_Block in Cur_Pic; the marked pixels are used for the linear prediction.]
[FIG. 46 (Sheet 46): configuration of the inter prediction image generation unit 309: a motion compensation unit 3091 and a weight predictor 3094, fed by the reference picture memory 306 and the inter prediction parameter decoder 303 (predFlagLX, refIdxLX, mvLX) and outputting to the addition unit and the loop filter 312.]
[FIG. 47 (Sheet 47): configuration of the slice coder 2012: subtraction unit, transform and quantization unit, entropy coder (outputting the coding slice Tes), inverse quantization and inverse transform processing unit, filter processing unit, reference picture memory, prediction image generation unit, prediction parameter memory, prediction parameter coder (inter prediction parameter coder 112, intra prediction parameter coder 113) and coding parameter determination unit 110.]
[FIG. 48 (Sheet 48): configuration of the inter prediction parameter coder 112: inter prediction parameter coding control unit 1121 (merge index derivation unit 11211, vector candidate index derivation unit 11212), AMVP prediction parameter derivation unit 1122 (affine predictor 11221), subtraction unit 1123, subblock prediction parameter derivation unit 1125 (spatial-temporal subblock predictor 11251, affine predictor 11252, matching predictor 11253, OBMC predictor 11254), BTM predictor 1126 and LIC predictor 1127, interfacing with the coding parameter determination unit 110, the prediction parameter memory 108, the prediction image generation unit 101 and the entropy coder 104.]
[FIG. 49 (Sheet 49): (a) a transmitting apparatus PROD_A equipped with the video coding apparatus: camera PROD_A4, recording medium PROD_A5, input terminal PROD_A6, image processing unit PROD_A7, coder PROD_A1, modulation unit PROD_A2 and transmitter PROD_A3; (b) a receiving apparatus PROD_B equipped with the video decoding apparatus: receiver PROD_B1, demodulation unit PROD_B2, decoder PROD_B3, display PROD_B4, recording medium PROD_B5 and terminal PROD_B6.]
[FIG. 50 (Sheet 50): (a) a recording apparatus PROD_C equipped with the video coding apparatus: camera PROD_C3, input terminal PROD_C4, receiver PROD_C5, image processing unit PROD_C6, coder PROD_C1, writing unit PROD_C2 and recording medium PROD_M; (b) a regeneration apparatus PROD_D equipped with the video decoding apparatus: recording medium PROD_M, reading unit PROD_D1, decoder PROD_D2, display PROD_D3, output terminal PROD_D4 and transmitter PROD_D5.]
VIDEO CODING APPARATUS AND VIDEO DECODING APPARATUS

TECHNICAL FIELD

[0001] The embodiments of the present invention relate to a video decoding apparatus and a video coding apparatus.

BACKGROUND ART

[0002] A video coding apparatus (image coding apparatus), which generates coded data by coding a video, and a video decoding apparatus (image decoding apparatus), which generates decoded images by decoding the coded data, are used to transmit or record a video efficiently.

[0003] For example, specific video coding schemes include the schemes proposed in H.264/AVC and High-Efficiency Video Coding (HEVC).

[0004] In such a video coding scheme, the images (pictures) constituting a video are managed by a hierarchy structure including slices obtained by partitioning images, Coding Tree Units (CTUs) obtained by partitioning slices, coding units (also sometimes referred to as Coding Units (CUs)) obtained by partitioning coding tree units, Prediction Units (PUs) which are blocks obtained by partitioning coding units, and Transform Units (TUs), and are coded/decoded for each CU.

[0005] In such a video coding scheme, usually, prediction images are generated based on local decoded images obtained by coding/decoding input images, and prediction residuals (also sometimes referred to as "difference images" or "residual images") obtained by subtracting the prediction images from the input images (original images) are coded. Generation methods of prediction images include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction) (NPL 1).

[0006] In recent years, with the evolution of processors such as multi-core CPUs and GPUs, configurations and algorithms that lend themselves to parallel processing have been employed in video coding and decoding. As an example of a configuration that is easy to parallelize, the picture partitioning units of the slice (Slice) and the tile (Tile) have been introduced. A slice is a set of multiple continuous CTUs, with no constraint on shape. A tile, unlike a slice, is a rectangular region into which a picture is partitioned. In both cases, within a single picture, a slice or a tile is decoded without reference to information (a prediction mode, an MV, a pixel value) outside of the slice or outside of the tile. Therefore, a slice or a tile can be decoded independently in a single picture (NPL 2). However, in a case that a slice or a tile refers to a different, already decoded picture (a reference picture) for an inter prediction, the information (a prediction mode, an MV, a pixel value) to which a target slice or a target tile refers on the reference picture is not always information at the same position as the target slice or the target tile, so the entire video is required to be regenerated even in a case of regenerating only some regions of the video (one slice or tile, or a limited number of slices or tiles).

[0007] In addition, in recent years, the resolution of videos has increased, as represented by 4K, 8K and VR, and by videos that cover the 360 degree omnidirectional orientation such as 360 degree video. In a case of viewing these images on a smartphone or a Head Mounted Display (HMD), a portion of the high resolution video is cut out and displayed on the display. In a smartphone or an HMD, the capacity of the battery is not large, so a mechanism is expected that allows the video to be viewed with minimal decoding processing, with only the regions necessary for display being extracted.

CITATION LIST

Non Patent Literature

[0008] NPL 1: "Algorithm Description of Joint Exploration Test Model 6", JVET-F1001, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 31 March-April 2017

[0009] NPL 2: ITU-T H.265 (April/2015), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services, Coding of moving video, High efficiency video coding

SUMMARY OF INVENTION

Technical Problem

[0010] Meanwhile, a slice and a tile coexist in a single picture, and there is a case that the slice is further partitioned into tiles and a CTU is included in one of the tiles, or a case that the tile is further partitioned into slices and a CTU is included in one of the slices. The slices further include an independent slice and a dependent slice, causing the coding structure to be complex.

[0011] The slice and the tile have a common advantage and disadvantage, except that they differ in shape. For example, decoding can be performed in parallel without reference to information outside of a target slice or outside of a target tile in the single picture, but there is a problem in that the entire video needs to be reconstructed to decode some regions of the video (one slice or tile, or a limited number of slices or tiles) as a sequence.

[0012] There is also a problem in that the code amount of the intra pictures required for random access is very large.

[0013] There is also a problem in that only the tile requested by an application or the like cannot be extracted with reference only to a NAL unit header.

[0014] Therefore, the present invention has been made in view of the above problems, and an object thereof is to introduce a rectangular slice, which brings the slice and the tile together, to simplify the coding structure. This reduces unnecessary information related to slice boundaries and the like.

[0015] The present invention provides a mechanism for ensuring independent decoding of a rectangular slice, or a set of rectangular slices, in the spatial direction and the temporal direction while suppressing a decrease in coding efficiency.

[0016] The present invention reduces the maximum code amount per picture by configuring the intra picture insertion timing or period differently for each slice sequence of slices that can be decoded independently. By signalling the insertion period as coded data, random access is facilitated.

[0017] The present invention facilitates extraction of the bitstream of independent slices by providing an extended region in a NAL unit header and signalling a slice identifier SliceId.
Solution to Problem

[0018] A video coding apparatus according to an aspect of the present invention includes, in coding of slices into which a picture is partitioned: a first coder unit configured to code a sequence parameter set including information related to multiple pictures; a second coder unit configured to code information indicating a position and a size of a slice on the picture; a third coder unit configured to code the picture in slice units; and a fourth coder unit configured to code a NAL header unit, wherein the first coder unit codes a flag indicating whether a shape of a slice is rectangular or not, a position and a size of rectangular slices with a same slice ID are not changed in a period of time in which each picture refers to a same sequence parameter set in a case that the flag indicates that the shape of a slice is rectangular, and the rectangular slices are coded independently without reference to information of other slices within a picture and without reference to information of other rectangular slices among pictures.

[0019] A video decoding apparatus according to an aspect of the present invention includes, in decoding of slices into which a picture is partitioned: a first decoder unit configured to decode a sequence parameter set including information related to multiple pictures; a second decoder unit configured to decode information indicating a position and a size of a slice on the picture; a third decoder unit configured to decode the picture in slice units; and a fourth decoder unit configured to decode a NAL header unit, wherein the first decoder unit decodes a flag indicating whether a shape of a slice is rectangular or not, a position and a size of rectangular slices with a same slice ID are not changed in a period of time in which each picture refers to a same sequence parameter set in a case that the flag indicates that the shape of a slice is rectangular, and the rectangular slices are decoded without reference to information of other slices within a picture and without reference to information of other rectangular slices among pictures.

Advantageous Effects of Invention

[0020] According to an aspect of the invention, a scheme is introduced that simplifies the hierarchy structure of coded data and also ensures independence of coding and decoding of each rectangular slice for each individual tool. Accordingly, each rectangular slice can be independently coded and decoded while suppressing a decrease in the coding efficiency. By controlling the intra insertion timing, the maximum code amount per picture can be reduced and the processing load can be suppressed. As a result, the region required for display or the like can be selected and decoded, so that the amount of processing can be greatly reduced.

BRIEF DESCRIPTION OF DRAWINGS

[0021] FIG. 1 is a schematic diagram illustrating a configuration of an image transmission system according to the present embodiment.

[0022] FIG. 2 is a diagram illustrating a hierarchy structure of data of a coding stream according to the present embodiment.

[0023] FIG. 3 is a conceptual diagram illustrating an example of reference pictures and reference picture lists.

[0024] FIG. 4 is a diagram illustrating general slices and rectangular slices.

[0025] FIG. 5 is a diagram illustrating shapes of rectangular slices.

[0026] FIG. 6 is a diagram illustrating a rectangular slice.

[0027] FIG. 7 is a syntax table related to rectangular slice information and the like.

[0028] FIG. 8 is a diagram illustrating a syntax of a general slice header.

[0029] FIG. 9 is a syntax table related to insertion of an I slice.

[0030] FIG. 10 is a diagram illustrating reference of rectangular slices in the temporal direction.

[0031] FIG. 11 is a diagram illustrating a syntax of a rectangular slice header.

[0032] FIG. 12 is a diagram illustrating a temporal hierarchy structure.

[0033] FIG. 13 is a diagram illustrating an insertion interval of an I slice.

[0034] FIG. 14 is another diagram illustrating an insertion interval of an I slice.

[0035] FIG. 15 is a block diagram illustrating configurations of a video coding apparatus and a video decoding apparatus according to the present invention.

[0036] FIG. 16 is a flowchart illustrating operations related to an insertion of an I slice.

[0037] FIG. 17 is a syntax table related to a NAL unit and a NAL unit header.

[0038] FIG. 18 is a diagram illustrating a configuration of a slice decoder according to the present embodiment.

[0039] FIG. 19 is a diagram illustrating intra prediction modes.

[0040] FIG. 20 is a diagram illustrating rectangular slice boundaries and a positional relationship between a target block and a reference block.

[0041] FIG. 21 is a diagram illustrating a prediction target block and an unfiltered/filtered reference image.

[0042] FIG. 22 is a block diagram illustrating a configuration of an intra prediction image generation unit.

[0043] FIG. 23 is a diagram illustrating a CCLM prediction process.

[0044] FIG. 24 is a block diagram illustrating a configuration of an LM predictor.

[0045] FIG. 25 is a diagram illustrating a boundary filter.

[0046] FIG. 26 is a diagram illustrating reference pixels of a boundary filter at a rectangular slice boundary.

[0047] FIG. 27 is another diagram illustrating a boundary filter.

[0048] FIG. 28 is a diagram illustrating a configuration of an inter prediction parameter decoder according to the present embodiment.

[0049] FIG. 29 is a diagram illustrating a configuration of a merge prediction parameter derivation unit according to the present embodiment.

[0050] FIG. 30 is a diagram illustrating an ATMVP process.

[0051] FIG. 31 is a diagram illustrating a prediction vector candidate list (merge candidate list).

[0052] FIG. 32 is a flowchart illustrating operations of the ATMVP process.

[0053] FIG. 33 is a diagram illustrating an STMVP process.

[0054] FIG. 34 is a flowchart illustrating operations of the STMVP process.
[0055] FIG. 35 is a diagram illustrating an example of positions of blocks referred to for derivation of a motion vector of a control point in an affine prediction.

[0056] FIG. 36 is a diagram illustrating a motion vector spMvLX[xi][yi] for each of the subblocks constituting a PU, which is a target for predicting a motion vector.

[0057] FIG. 37 is a flowchart illustrating operations of the affine prediction.

[0058] FIG. 38 is a diagram for describing Bilateral matching and Template matching. (a) is a diagram for describing Bilateral matching. (b) and (c) are diagrams for describing Template matching.

[0059] FIG. 39 is a flowchart illustrating operations of a motion vector derivation process in a matching mode.

[0060] FIG. 40 is a diagram illustrating a search range of a target block.

[0061] FIG. 41 is a diagram illustrating an example of a target subblock and an adjacent block of OBMC prediction.

[0062] FIG. 42 is a flowchart illustrating a parameter derivation process of OBMC prediction.

[0063] FIG. 43 is a diagram illustrating a bilateral template matching process.

[0064] FIG. 44 is a diagram illustrating a configuration of an AMVP prediction parameter derivation unit according to the present embodiment.

[0065] FIG. 45 is a diagram illustrating an example of pixels used for derivation of a prediction parameter of LIC prediction.

[0066] FIG. 46 is a diagram illustrating a configuration of an inter prediction image generation unit according to the present embodiment.

[0067] FIG. 47 is a block diagram illustrating a configuration of a slice coder according to the present embodiment.

[0068] FIG. 48 is a schematic diagram illustrating a configuration of an inter prediction parameter coder according to the present embodiment.

[0069] FIG. 49 is a diagram illustrating configurations of a transmitting apparatus equipped with a video coding apparatus and a receiving apparatus equipped with a video decoding apparatus according to the present embodiment. (a) illustrates the transmitting apparatus equipped with the video coding apparatus, and (b) illustrates the receiving apparatus equipped with the video decoding apparatus.

[0070] FIG. 50 is a diagram illustrating configurations of a recording apparatus equipped with the video coding apparatus and a regeneration apparatus equipped with the video decoding apparatus according to the present embodiment. (a) illustrates the recording apparatus equipped with the video coding apparatus, and (b) illustrates the regeneration apparatus equipped with the video decoding apparatus.

DESCRIPTION OF EMBODIMENTS

First Embodiment

[0071] Hereinafter, embodiments of the present invention are described with reference to the drawings.

[0072] FIG. 1 is a schematic diagram illustrating a configuration of an image transmission system 1 according to the present embodiment.

[0073] The image transmission system 1 is a system configured to transmit a coding stream of a coding target image that has been coded, decode the transmitted codes, and display an image. The image transmission system 1 includes a video coding apparatus (image coding apparatus) 11, a network 21, a video decoding apparatus (image decoding apparatus) 31, and a video display apparatus (image display apparatus) 41.

[0074] The video coding apparatus 11 codes an input image T and outputs the coded input image T to the network 21.

[0075] The network 21 transmits a coding stream Te generated by the video coding apparatus 11 to the video decoding apparatus 31. The network 21 is the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or a combination thereof. The network 21 is not necessarily a bidirectional communication network, but may be a unidirectional communication network configured to transmit broadcast waves such as digital terrestrial television broadcasting and satellite broadcasting. The network 21 may be substituted by a storage medium that records the coding stream Te, such as a Digital Versatile Disc (DVD) or a Blu-ray Disc (BD: trade name).

[0076] The video decoding apparatus 31 decodes each of the coding streams Te transmitted by the network 21, and generates one or multiple decoded images Td.

[0077] The video display apparatus 41 displays all or part of the one or multiple decoded images Td generated by the video decoding apparatus 31. For example, the video display apparatus 41 includes a display device such as a liquid crystal display or an organic Electro-Luminescence (EL) display. Configurations of the display include stationary, mobile, and HMD.

Operator

[0078] The operators used herein are described below.

[0079] >> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, and |= is an OR assignment operator.

[0080] x ? y : z is a ternary operator that takes y in a case that x is true (other than 0), and takes z in a case that x is false (0).

[0081] Clip3(a, b, c) is a function to clip c to a value equal to or greater than a and equal to or less than b; it returns a in a case that c is less than a (c < a), returns b in a case that c is greater than b (c > b), and returns c otherwise (provided that a is equal to or less than b (a <= b)).

[0082] abs(a) is a function to return the absolute value of a.

[0083] Int(a) is a function to return the integer value of a.

[0084] floor(a) is a function to return the maximum integer equal to or less than a.

[0085] a/d represents division of a by d (the part after the decimal point is dropped).

[0086] a % b is the remainder of dividing a by b.
Structure of Coding Stream Te

[0087] Prior to the detailed description of the video coding apparatus 11 and the video decoding apparatus 31 according to the present embodiment, the data structure of the coding stream Te generated by the video coding apparatus 11 and decoded by the video decoding apparatus 31 will be described.

[0088] FIG. 2 is a diagram illustrating the hierarchy structure of data in the coding stream Te.
The coding stream Te illustratively includes a sequence and the multiple pictures constituting the sequence. (a) to (f) of FIG. 2 are diagrams indicating a coding video sequence prescribing a sequence SEQ, a coding picture prescribing a picture PICT, a coding slice prescribing a slice S, a coding slice data prescribing slice data, a coding tree unit included in the coding slice data, and Coding Units (CUs) included in the coding tree unit, respectively.

Coding Video Sequence

[0089] In the coding video sequence, a set of data referred to by the video decoding apparatus 31 to decode a sequence SEQ of a processing target is prescribed. As illustrated in (a) of FIG. 2, the sequence SEQ includes a Video Parameter Set VPS, a Sequence Parameter Set SPS, a Picture Parameter Set PPS, a picture PICT, and Supplemental Enhancement Information SEI. Here, the numbers after # indicate the numbers of the parameter sets or the pictures.

[0090] In the video parameter set VPS, for a video including multiple layers, a set of coding parameters common to multiple videos and a set of coding parameters associated with the multiple layers and with the individual layers included in the video are prescribed.

[0091] In the sequence parameter set SPS, a set of coding parameters referred to by the video decoding apparatus 31 to decode a target sequence is prescribed. For example, the width and the height of a picture are prescribed. Note that multiple SPSs may exist. In that case, one of the multiple SPSs is selected from the PPS.

[0092] In the picture parameter set PPS, a set of coding parameters referred to by the video decoding apparatus 31 to decode each picture in a target sequence is prescribed. For example, a reference value (pic_init_qp_minus26) of the quantization step size used for decoding of a picture and a flag (weighted_pred_flag) indicating application of a weighted prediction are included. Note that multiple PPSs may exist. In that case, one of the multiple PPSs is selected from each slice header in a target sequence.

Coding Picture

[0093] In the coding picture, a set of data referred to by the video decoding apparatus 31 to decode the picture PICT of a processing target is prescribed. As illustrated in (b) of FIG. 2, the picture PICT includes slices S0 to SNS-1 (NS is the total number of slices included in the picture PICT). Slices include rectangular slices having a rectangular shape and general slices with no constraint on shape, and only one of the two types exists in one coding sequence. Details will be described below.

[0094] Note that in a case that it is not necessary to distinguish the slices S0 to SNS-1, the subscripts of the reference signs may be omitted in the description below. The same applies to other data that is included in the coding stream Te described below and described with an added subscript.

Coding Slice

[0095] In the coding slice, a set of data referred to by the video decoding apparatus 31 to decode the slice S of a processing target is prescribed. As illustrated in (c) of FIG. 2, the slice S includes a slice header SH and a slice data SDATA.

[0096] The slice header SH includes a coding parameter group referred to by the video decoding apparatus 31 to determine a decoding method of a target slice. Slice type specification information (slice_type) to specify a slice type is one example of a coding parameter included in the slice header SH.

[0097] Examples of slice types that can be specified by the slice type specification information include (1) an I (intra) slice using only an intra prediction in coding, (2) a P slice using a unidirectional prediction or an intra prediction in coding, and (3) a B slice using a unidirectional prediction, a bidirectional prediction, or an intra prediction in coding, and the like. Note that an inter prediction is not limited to a uni-prediction or a bi-prediction, and a greater number of reference pictures may be used to generate a prediction image. Hereinafter, a slice referred to as a P or B slice is a slice that includes a block that may employ an inter prediction.

[0098] Note that the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the coding video sequence.

Coding Slice Data

[0099] In the coding slice data, a set of data referred to by the video decoding apparatus 31 to decode the slice data SDATA of a processing target is prescribed. As illustrated in (d) of FIG. 2, the slice data SDATA includes Coding Tree Units (CTUs, CTU blocks). A CTU is a block of a fixed size (for example, 64x64) constituting a slice, and may also be referred to as a Largest Coding Unit (LCU).

Coding Tree Unit

[0100] In (e) of FIG. 2, a set of data referred to by the video decoding apparatus 31 to decode a coding tree unit of a processing target is prescribed. A coding tree unit is partitioned by recursive quad tree partitioning (QT partitioning) or binary tree partitioning (BT partitioning) into Coding Units (CUs), each of which is a basic unit of coding processing. The tree structure obtained by the recursive quad tree or binary tree partitioning is referred to as a Coding Tree (CT), and the nodes of the tree structure are referred to as Coding Nodes (CNs). The intermediate nodes of a quad tree or a binary tree are coding nodes, and the coding tree unit itself is also prescribed as the highest coding node.

Coding Unit

[0101] As illustrated in (f) of FIG. 2, a set of data referred to by the video decoding apparatus 31 to decode a coding unit of a processing target is prescribed. Specifically, the coding unit includes a prediction tree, a transform tree, and a CU header CUH. In the CU header, a prediction mode, a partitioning method (a PU partitioning mode), and the like are prescribed.

[0102] In the prediction tree, a prediction parameter (a reference picture index, a motion vector, and the like) of each prediction unit (PU) obtained by partitioning the coding unit into one or multiple parts is prescribed. In another expression, the prediction units are one or multiple non-overlapping regions constituting the coding unit. The prediction tree includes the one or multiple prediction units obtained by the above-mentioned partitioning. Note that, in the following, a unit of prediction into which the prediction unit is further partitioned is referred to as a "subblock". A subblock includes multiple pixels. In a case that the sizes of the prediction unit and the subblock are the same, there is one subblock in the prediction unit. In a case that the prediction unit is larger than the size of the subblock, the prediction unit is partitioned into subblocks.
US 2021/0136407 A1 May 6 , 2021
5

the prediction unit is partitioned into four subblocks formed by horizontal partitioning into two and vertical partitioning into two.
[0103] The prediction processing may be performed for each of these prediction units (subblocks).
[0104] Generally speaking, there are two types of predictions in the prediction tree, including a case of an intra prediction and a case of an inter prediction. The intra prediction is a prediction in the same picture, and the inter prediction refers to a prediction processing performed between mutually different pictures (for example, between display times).
[0105] In a case of an intra prediction, the partitioning method includes 2Nx2N (the same size as the coding unit) and NxN.
[0106] In a case of an inter prediction, the partitioning method is coded by a PU partitioning mode (part_mode) of the coded data.
[0107] In the transform tree, the coding unit is partitioned into one or multiple transform units TUs, and a position and a size of each transform unit are prescribed. In another expression, the transform units are one or multiple non-overlapping regions constituting the coding unit. The transform tree includes one or multiple transform units obtained by the above-mentioned partitioning.
[0108] Partitioning in the transform tree includes those that allocate a region that is the same size as the coding unit as a transform unit, and those by recursive quad tree partitioning similar to the above-mentioned partitioning of CUs.
[0109] A transform processing is performed for each of these transform units.

Prediction Parameter

[0110] A prediction image of Prediction Units (PUs) is derived by a prediction parameter attached to the PUs. The prediction parameter includes a prediction parameter of an intra prediction or a prediction parameter of an inter prediction. The prediction parameter of an inter prediction (inter prediction parameters) will be described below. The inter prediction parameter is constituted by prediction list utilization flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and motion vectors mvL0 and mvL1. The prediction list utilization flags predFlagL0 and predFlagL1 are flags to indicate whether or not the reference picture lists referred to as the L0 list and the L1 list respectively are used, and a corresponding reference picture list is used in a case that the value is 1. Note that, in a case that the present specification mentions "a flag indicating whether or not XX", a flag being other than 0 (for example, 1) assumes a case of XX, and a flag being 0 assumes a case of not XX, and 1 is treated as true and 0 is treated as false in a logical negation, a logical product, and the like (hereinafter, the same is applied). However, other values can be used for true values and false values in real apparatuses and methods.
[0111] For example, syntax elements to derive an inter prediction parameter included in a coded data include a PU partitioning mode part_mode, a merge flag merge_flag, a merge index merge_idx, an inter prediction indicator inter_pred_idc, a reference picture index ref_idx_lX (refIdxLX), a prediction vector index mvp_lX_idx, and a difference vector mvdLX.

Reference Picture List

[0112] A reference picture list is a list constituted by reference pictures stored in a reference picture memory 306. FIG. 3 is a conceptual diagram illustrating an example of reference pictures and reference picture lists. In FIG. 3(a), a rectangle indicates a picture, an arrow indicates a reference relationship of pictures, a horizontal axis indicates time, each of I, P, and B in the rectangle indicates an intra picture, a uni-prediction picture, and a bi-prediction picture, and the number in the rectangle indicates a decoding order. As illustrated, the decoding order of the pictures is I0, P1, B2, B3, and B4, and the display order is I0, B3, B2, B4, and P1. FIG. 3(b) illustrates an example of reference picture lists. The reference picture list is a list to represent candidates of a reference picture, and one picture (slice) may include one or more reference picture lists. In the illustrated example, a target picture B3 includes two reference picture lists, i.e., an L0 list RefPicList0 and an L1 list RefPicList1. In a case that a target picture is B3, the reference pictures are I0, P1, and B2, and the reference picture lists include these pictures as elements. For an individual prediction unit, which picture in a reference picture list RefPicListX (X = 0 or 1) is actually referred to is specified with a reference picture index refIdxLX. The diagram indicates an example where reference pictures P1 and B2 are referred to by refIdxL0 and refIdxL1. Note that LX is a description method used in a case of not distinguishing the L0 prediction and the L1 prediction, and hereinafter parameters for the L0 list and parameters for the L1 list are distinguished by replacing LX with L0 or L1.

Merge Prediction and AMVP Prediction

[0113] Decoding (coding) methods of a prediction parameter include a merge prediction (merge) mode and an Adaptive Motion Vector Prediction (AMVP) mode, and the merge flag merge_flag is a flag to identify these. The merge mode is a mode to derive the prediction parameter from a prediction parameter of a neighbor PU that has already been processed, without including a prediction list utilization flag predFlagLX (or an inter prediction indicator inter_pred_idc), a reference picture index refIdxLX, and a motion vector mvLX in the coded data. The AMVP mode is a mode to include an inter prediction indicator inter_pred_idc, a reference picture index refIdxLX, and a motion vector mvLX in the coded data. Note that the motion vector mvLX is coded as a prediction vector index mvp_lX_idx identifying a prediction vector mvpLX and a difference vector mvdLX.
[0114] The inter prediction indicator inter_pred_idc is a value indicating the types and the number of reference pictures, and takes any value of PRED_L0, PRED_L1, and PRED_BI. PRED_L0 and PRED_L1 indicate to use reference pictures managed in the reference picture list of the L0 list and the L1 list respectively, and indicate to use one reference picture (uni-prediction). PRED_BI indicates to use two reference pictures (bi-prediction BiPred), and to use reference pictures managed in the L0 list and the L1 list. The prediction vector index mvp_lX_idx is an index indicating a prediction vector, and the reference picture index refIdxLX is an index indicating a reference picture managed in a reference picture list.
[0115] The merge index merge_idx is an index to indicate which prediction parameter to use as a prediction parameter of a decoding target PU among prediction parameter candidates (merge candidates) that have been derived from PUs of which the processing has been completed.
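As an illustration of the signalling described in [0113] to [0115], the following C sketch shows how a decoder side might turn the coded elements back into an inter prediction parameter. It is only a sketch: the candidate lists merge_cand[] and mvp_cand[] are assumed to have already been derived from neighbouring, already-processed PUs (their derivation is not shown), the numeric values chosen for PRED_L0/PRED_L1/PRED_BI are illustrative, and the additive relation mvLX = mvpLX + mvdLX is assumed.

    /* Sketch only: candidate derivation and exact constant values are assumptions. */
    #include <stdint.h>

    enum { PRED_L0 = 0, PRED_L1 = 1, PRED_BI = 2 };   /* inter_pred_idc values (assumed) */

    typedef struct {
        int16_t mvLX[2];     /* motion vector mvLX            */
        int     refIdxLX;    /* reference picture index       */
        int     predFlagLX;  /* prediction list utilization   */
    } InterParam;

    void reconstruct_inter_param(int list,                        /* X = 0 or 1 */
                                 int merge_flag, int merge_idx,
                                 const InterParam *merge_cand,    /* merge candidates   */
                                 int inter_pred_idc, int ref_idx_lX,
                                 int mvp_lX_idx, const int16_t mvdLX[2],
                                 const int16_t (*mvp_cand)[2],    /* AMVP prediction vectors */
                                 InterParam *out)
    {
        if (merge_flag) {
            /* merge mode: only merge_idx is coded; the whole parameter set is copied */
            *out = merge_cand[merge_idx];
        } else {
            /* AMVP mode: inter_pred_idc, refIdxLX, mvp_lX_idx and mvdLX are coded,
             * and mvLX is reconstructed as mvpLX + mvdLX */
            out->predFlagLX = (inter_pred_idc == PRED_BI) ||
                              (inter_pred_idc == (list ? PRED_L1 : PRED_L0));
            out->refIdxLX   = ref_idx_lX;
            out->mvLX[0]    = (int16_t)(mvp_cand[mvp_lX_idx][0] + mvdLX[0]);
            out->mvLX[1]    = (int16_t)(mvp_cand[mvp_lX_idx][1] + mvdLX[1]);
        }
    }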
and in the vertical direction. The numbers in the rectangular slices are SliceIds. In the following, the rectangular slices will be described in detail.
[0128] FIG. 6(a) is a diagram illustrating an example of partitioning a picture into N rectangular slices (rectangles of solid lines; the diagram is an example of N = 9). The rectangular slices are further partitioned into multiple CTUs (rectangles of dashed lines). The upper left coordinate of the central rectangular slice of FIG. 6(a) is denoted as (xRSs, yRSs), with wRS as the width and hRS as the height. The width and the height of the picture are denoted as wPict and hPict. Note that information related to the number of partitions and the size of the rectangular slice is referred to as rectangular slice information, and the details will be described later.
[0129] FIG. 6(b) is a diagram illustrating the coding or decoding order of CTUs in a case of the picture partitioned into rectangular slices. The numbers in () set forth in each rectangular slice are SliceIds (the identifiers of the rectangular slices in the picture), which are assigned in the raster scan order from the upper left to the lower right for the rectangular slices in the picture, and the rectangular slices are processed in the order of SliceId. In other words, the coding or decoding process is performed in the ascending order of SliceId. The CTUs are processed in the raster scan order from the upper left to the lower right in each rectangular slice, and after processing in one rectangular slice is finished, the CTUs in the next rectangular slice are processed.
[0130] In a general slice, the CTUs are processed in the raster scan order from the upper left to the lower right of the picture, so that the processing order of CTUs is different in a rectangular slice and a general slice.
[0131] FIG. 6(c) is a diagram illustrating continuous rectangular slices in the temporal direction. As illustrated in FIG. 6(c), the video sequence is comprised of multiple continuous pictures in the temporal direction. The rectangular slice sequence is comprised of rectangular slices at one or more times that are continuous in the temporal direction. Note that a Coded Video Sequence (CVS) in the diagram is a group of pictures from a picture that refers to a certain SPS to the picture immediately prior to a picture that refers to a different SPS.
[0132] FIG. 7 and FIG. 9 are examples of the syntax related to the rectangular slices.
[0133] The rectangular slice information may be represented by num_rslice_columns_minus1, num_rslice_rows_minus1, uniform_spacing_flag, column_width_minus1[], and row_height_minus1[], for example, as illustrated in FIG. 7(c), and is signalled with rectangular_slice_info() of a PPS, for example, as illustrated in FIG. 7(b). Alternatively, as illustrated in FIG. 9(a), rectangular_slice_info() may be signalled by a SPS. Here, num_rslice_columns_minus1 and num_rslice_rows_minus1 are values obtained by subtracting 1 from the number of rectangular slices in the horizontal and vertical directions in the picture, respectively. uniform_spacing_flag is a flag for indicating whether or not the picture is evenly partitioned into rectangular slices. In a case that the value of uniform_spacing_flag is 1, the width and the height of each rectangular slice of the picture are configured to be the same and may be derived from the number of rectangular slices in the horizontal and vertical directions in the picture.
wRS = wPict / (num_rslice_columns_minus1 + 1)
hRS = hPict / (num_rslice_rows_minus1 + 1)    (Equation RSLICE-1)
[0134] In a case that the value of uniform_spacing_flag is 0, the width and the height of each rectangular slice of the picture may not be configured to be the same, and the width column_width_minus1[i] in a CTU unit and the height row_height_minus1[i] in a CTU unit of each rectangular slice are coded for each rectangular slice.
wRS = (column_width_minus1[i] + 1) << CtbLog2SizeY
hRS = (row_height_minus1[i] + 1) << CtbLog2SizeY    (Equation RSLICE-2)

Rectangular Slice Boundary Limitation

[0135] A rectangular slice is signalled by setting the value of rectangular_slice_flag of seq_parameter_set_rbsp() illustrated in FIG. 7(a) to 1. In this case, the rectangular slice information does not change throughout the CVS; that is, in a case that the value of rectangular_slice_flag is 1, the values of num_rslice_columns_minus1, num_rslice_rows_minus1, uniform_spacing_flag, column_width_minus1[], row_height_minus1[], and loop_filter_across_rslices_enabled_flag (on or off of the loop filter at the rectangular slice boundary) signalled with a PPS are the same throughout the CVS. In other words, in the case that the value of rectangular_slice_flag is 1, in a CVS, for rectangular slices with the same SliceId, the rectangular slice position (the upper left coordinate, the width, and the height of the rectangular slice) on a picture is not changed even in pictures where the display orders (Picture Order Count (POC)) are different. In a case that the value of rectangular_slice_flag is 0, that is, in a case of a general slice, the rectangular slice information is not signalled (FIG. 7(b) and FIG. 9(a)).
[0136] FIG. 7(a) is a syntax table that extracts a part of the sequence parameter set SPS. The rectangular slice flag rectangular_slice_flag is a flag for indicating whether or not it is a rectangular slice as described above, as well as for indicating whether or not the sequence to which the rectangular slice belongs can be independently coded or decoded in the temporal direction in addition to the spatial direction. In a case that the value of rectangular_slice_flag is 1, it means that the rectangular slice sequence can be coded or decoded independently. In this case, the following constraints may be imposed on the coding or decoding of the rectangular slice and the syntax of the coded data.
[0137] (Constraint 1) The rectangular slice does not refer to information of a rectangular slice with a different SliceId.
[0138] (Constraint 2) The number of rectangular slices in the horizontal and vertical directions, the width of the rectangular slices, and the height of the rectangular slices in the pictures signalled by a PPS are the same throughout the CVS. Within the CVS, the rectangular slices with the same SliceId do not change the rectangular slice position (the upper left coordinate, the width, and the height) of the rectangular slice on the pictures, even in pictures with different display orders (POC).
[0139] The above (Constraint 1), "the rectangular slice does not refer to information of a rectangular slice with a different SliceId", will be described in detail.
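As a small sketch of the size derivation in [0133] and [0134] ((Equation RSLICE-1) and (Equation RSLICE-2)), the following C function selects between the two cases. It is only an illustration: the rounding of the uniform case when wPict or hPict is not an exact multiple of the grid is not specified above and is simply truncated here, and the function and parameter names are not part of the described syntax.

    /* Sketch only: rounding behaviour of the uniform case is an assumption. */
    void derive_rslice_size(int uniform_spacing_flag,
                            int num_rslice_columns_minus1,
                            int num_rslice_rows_minus1,
                            int wPict, int hPict,
                            const int *column_width_minus1, /* per column, in CTU units */
                            const int *row_height_minus1,   /* per row, in CTU units    */
                            int CtbLog2SizeY,
                            int col, int row,               /* grid position of the slice */
                            int *wRS, int *hRS)
    {
        if (uniform_spacing_flag) {
            /* (Equation RSLICE-1): all rectangular slices share one size */
            *wRS = wPict / (num_rslice_columns_minus1 + 1);
            *hRS = hPict / (num_rslice_rows_minus1 + 1);
        } else {
            /* (Equation RSLICE-2): per-slice sizes coded in CTU units */
            *wRS = (column_width_minus1[col] + 1) << CtbLog2SizeY;
            *hRS = (row_height_minus1[row] + 1) << CtbLog2SizeY;
        }
    }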
[0140] FIG. 10 is a diagram illustrating a reference to a rectangular slice in a temporal direction (between different pictures). FIG. 10(a) is an example of partitioning an intra picture Pict(t0) at time t0 into N rectangular slices. FIG. 10(b) is an example of partitioning an inter picture Pict(t1) at time t1 = t0 + 1 into N rectangular slices. Pict(t1) refers to Pict(t0). FIG. 10(c) is an example of partitioning an inter picture Pict(t2) at time t2 = t0 + 2 into N rectangular slices. Pict(t2) refers to Pict(t1). In the diagram, RSlice(n, t) represents a rectangular slice with SliceId = n (n = 0 ... N-1) at time t. From (Constraint 2) described above, at any time, the upper left coordinate, the width, and the height of the rectangular slices with SliceId = n are the same.
[0141] In FIG. 10(b), CU1, CU2, and CU3 in the rectangular slice RSlice(n, t1) refer to blocks BLK1, BLK2, and BLK3 of FIG. 10(a). RSlice(n, t1) represents a rectangular slice with SliceId = n at time t1. In this case, BLK1 and BLK3 are blocks that are included in rectangular slices different from the rectangular slice RSlice(n, t0), and thus referring to these requires not only RSlice(n, t0) but also decoding the entire Pict(t0) at time t0. That is, decoding the rectangular slice sequence corresponding to SliceId = n at times t0 and t1 is not enough to decode the rectangular slice RSlice(n, t1), and in addition to SliceId = n, decoding of rectangular slice sequences other than SliceId = n is also necessary. Thus, in order to independently decode a rectangular slice sequence, reference pixels in a reference picture referred to in motion compensation image derivation of CUs in the rectangular slice are required to be included in a collocated rectangular slice (a rectangular slice at the same position on the reference picture).
[0142] In FIG. 10(c), CU4 adjacent to the boundary of the right end of the rectangular slice RSlice(n, t2) refers to a lower right block CU4BR of CU4' (the block indicated by the dashed line) in the picture at time t1 illustrated in FIG. 10(b) as a prediction vector candidate in the temporal direction, and the motion vector of CU4BR is stored as a prediction vector candidate in a prediction vector candidate list (a merge candidate list). However, for a CU on the right end of the rectangular slice, CU4BR is located outside of the collocated rectangular slice, so that referring to CU4BR requires decoding of not only RSlice(n, t1) but also at least RSlice(n+1, t1) at time t1. That is, the rectangular slice RSlice(n, t2) cannot be decoded by simply decoding the rectangular slice sequence of SliceId = n. Thus, in order to independently decode the rectangular slice sequence, a block on a reference picture referred to as a prediction vector candidate in the temporal direction needs to be included in a collocated rectangular slice. A specific implementation method of the above-described constraints will be described in the following video decoding apparatus and video coding apparatus.
[0143] In a case that the value of rectangular_slice_flag is 0, it means that the slice is not a rectangular slice, and it may not be possible to decode the slice independently in the temporal direction.

Configuration of Slice Header

[0144] FIG. 8 and FIG. 11(a) are examples of the syntax related to a slice header. The syntax of a slice header of a general slice is FIG. 8, and the syntax of a slice header of a rectangular slice is FIG. 11(a). The differences in the syntax in FIG. 8 and FIG. 11(a) will be described.
[0145] In the general slice illustrated in FIG. 8, the flag first_slice_segment_in_pic_flag for indicating whether or not it is the first slice of the picture is first decoded at the beginning of the slice header. In a case that it is not the first slice of the picture, dependent_slice_segment_flag for indicating whether or not the current slice is a dependent slice is decoded (SYN01). In the case that it is not the first slice of the picture, the CTU address slice_segment_address at the beginning of the slice is decoded (SYN04). In a general slice, the POC is reset in an Instantaneous Decoder Refresh (IDR) picture, so that the information slice_pic_order_cnt_lsb for deriving the POC is not signalled in the IDR picture (SYN02).
[0146] On the other hand, in the rectangular slice illustrated in FIG. 11(a), the syntax slice_id for indicating the SliceId is signalled in the NAL unit header, so the slice position information is not signalled but derived from the SliceId and the rectangular slice information. For example, in a case of uniform_spacing_flag = 1, the coordinate (xRSs, yRSs) of the first CTU of the slice is derived by the following equation.
SliceId = slice_id
(xRSs, yRSs) = ((SliceId % (num_rslice_columns_minus1 + 1)) * wRS, (SliceId / (num_rslice_columns_minus1 + 1)) * hRS)    (Equation RSLICE-3)
Then, dependent_slice_segment_flag for indicating whether or not the current slice header is that of a dependent slice is decoded (SYN11). In a rectangular slice, SliceId is assigned in a rectangular slice unit, so that an independent slice and a dependent slice included in one rectangular slice have the same SliceId. The coordinate of the first CTU of an independent slice (the vertical line block in FIG. 4(c)) is (xRSs, yRSs) derived in (Equation RSLICE-3), while the information related to the coordinate of the first CTU of a dependent slice (the horizontal line block in FIG. 4(c)) is derived by decoding slice_segment_address (SYN14). In a rectangular slice, the POC is not always reset with an Instantaneous Decoder Refresh (IDR) picture, so that the information slice_pic_order_cnt_lsb for deriving the POC is always signalled (SYN12).
[0147] Independent slices and dependent slices in a case that one picture is partitioned into four rectangular slices are illustrated in FIG. 4(c). In each rectangular slice, an independent slice is a region of a rectangular pattern, followed by zero or more dependent slices after the independent slice. In a slice header of a dependent slice, only a part of the syntax of the slice header is signalled, so that the header size is smaller than that of an independent slice. Compared to a general slice, a rectangular slice is limited in shape to a rectangle, so that code amount control per slice is difficult. A slice coder 2012 codes a rectangular slice by partitioning one rectangular slice into two or more NAL units by inserting a dependent slice header prior to exceeding a prescribed code amount. In a transmission scheme with a limited data amount, such as a packet adaptive scheme for use in network transmission, a dependent slice is used to allow flexible code amount control in accordance with an application while suppressing the overhead of the slice header.
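The slice position derivation of [0146] (Equation RSLICE-3) can be written out as the following C sketch. It is only an illustration; the function name and parameters are not part of the described syntax, and the uniform_spacing_flag = 1 case is assumed as in the text above.

    /* Sketch of (Equation RSLICE-3): upper left coordinate of the rectangular
     * slice derived from SliceId and the uniform grid size. */
    void derive_rslice_position(int slice_id,
                                int num_rslice_columns_minus1,
                                int wRS, int hRS,
                                int *xRSs, int *yRSs)
    {
        int SliceId = slice_id;                      /* taken from the NAL unit header */
        int cols    = num_rslice_columns_minus1 + 1; /* rectangular slices per row     */

        *xRSs = (SliceId % cols) * wRS;              /* column index * slice width  */
        *yRSs = (SliceId / cols) * hRS;              /* row index    * slice height */
    }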
[0148] By using Wavefront Parallel Processing (WPP) in addition to parallel processing for each rectangular slice, the degree of parallel processing can be further increased. FIG. 4(d) is a diagram illustrating WPP. WPP is a process in a CTU column unit in a slice, and the beginning address of the left end CTU of each slice on the coding stream is signalled in the slice header, other than for the first column of the slice. A slice decoder 2002 derives the beginning address of each CTU column with reference to entry_point_offset_minus1 of the slice header described in FIG. 8 or FIG. 11(a) (adding 1 to entry_point_offset_minus1). Returning to FIG. 4(d), for the rectangular slice of SliceId = sid, the CTU at the position (x, y) is represented by RS[sid][x][y]. The CTU (RS[0][0][1]) at position (0, 1) with SliceId = 0 sets the CABAC context of the oft-th CTU (RS[0][oft][0]) from the left of the CTU column one above as its CABAC context. In the example of FIG. 4(d), oft is equal to 2, so that the slice decoder 2002 sets the CABAC context of RS[0][2][0] as the CABAC context of RS[0][0][1]. In FIG. 4(d), a block with horizontal lines is a left end block of each rectangular slice, and a block with diagonal lines is a block that refers to the CABAC context from the left end block. The slice decoder 2002 may perform a decoding process in parallel in a unit of CTU rows from the beginning address of each CTU column on the coding stream. This further allows parallel decoding in a unit of CTU rows in addition to parallel decoding in a unit of rectangular slices.
[0149] Note that in a rectangular slice, the number of CTU columns for each slice is known (for example, row_height_minus1[]), so that notification of num_entry_point_offset (SYN05) illustrated in FIG. 8 is not necessary in FIG. 11(a) (SYN15).
[0150] As described above, by introducing a rectangular slice instead of a tile and switching between a general slice and a rectangular slice in a unit of CVS, a complex coding structure such as further partitioning a slice into tiles or further partitioning a tile into slices can be simplified.

Intra Slice Control and Notification Thereof

[0151] In order to allow random access, conventionally, an intra (Intra Random Access Point (IRAP)) picture is inserted that ensures independent decoding in a picture unit. Specifically, the prediction is reset with the IRAP picture, and playback of pictures from the middle of the sequence, or special playback such as fast forward, and the like is performed. However, the code amount is concentrated in the IRAP pictures, so that there is a problem in that the amount of processing of each picture is imbalanced and the processing is delayed.
[0152] A temporal independent slice is independent not only in the spatial direction but also in the temporal direction, so by not inserting an IRAP picture in which all slices are intra slices but instead inserting I slices distributed over multiple pictures for each rectangular slice sequence, imbalance in the amount of processing or delay due to the code amount being concentrated in a single picture can be avoided. The following describes the method of inserting an I slice in a rectangular slice sequence and its notification method.
[0153] FIG. 12 is a diagram illustrating a temporal hierarchy structure. FIGS. 12(a) to (d) are cases that the insertion interval of I slices is 16, FIG. 12(e) is a case that the insertion interval of I slices is 8, and FIG. 12(f) is a case that the insertion interval of I slices is 32. The squares in the figures indicate the pictures and the numbers in the squares indicate the decoding order of the pictures. The numerical values on the upper side of the squares indicate the POC (the display order of the pictures). FIGS. 12(a), (e), and (f) are cases that the temporal hierarchy identifier Tid (TemporalID) is 0, FIG. 12(b) is a case that the temporal hierarchy identifier Tid (TemporalID) is 0 or 1, FIG. 12(c) is a case that the temporal hierarchy identifier Tid (TemporalID) is 0, 1, or 2, and FIG. 12(d) is a case that the temporal hierarchy identifier Tid (TemporalID) is 0, 1, 2, or 3. The temporal hierarchy identifier is derived from the syntax nuh_temporal_id_plus1 signalled by nal_unit_header. The arrows in the figures indicate the reference directions of the pictures. For example, the picture of POC=3 in FIG. 12(b) uses the pictures of POC=2 and POC=4 for prediction. Accordingly, in FIG. 12(b), the decoding order and the output order of the pictures are different from each other. In FIGS. 12(c) and (d), the decoding order and the output order of the pictures are different from each other as well. In a case that the maximum Tid (maxTid) is 0, i.e., the decoding order and the output order of the pictures are the same, the insertion positions of the I slices are optional in the rectangular slice sequence. However, in a case that the decoding order and the output order of the pictures are different from each other, the insertion positions of the I slices are limited to the pictures of Tid=0. This is because, in a case that an I slice is inserted into a picture other than those, a problem may occur in which the coding stream of an I slice has not been received at the time of decoding a picture that utilizes the I slice for prediction.
[0154] FIGS. 13 and 14 are diagrams illustrating the insertion positions of I slices in rectangular slices. The numerical values in FIGS. 13(a) and (d) and FIG. 14(a) indicate SliceIds, and "I" in FIGS. 13(b), (c), and (e) to (j), and FIGS. 14(b) to 14(e) indicates I slices. FIG. 13(a) is a case that one picture is partitioned into four rectangular slices, and is a case that the insertion period (PIslice) of an I slice in each rectangular slice is 8, with maxTid=2. maxTid=2 denotes the coding structure of FIG. 12(c). In POC=0 (FIG. 13(b)) and POC=4 (FIG. 13(c)) with Tid=0, SliceId=0 and 2 and SliceId=1 and 3, respectively, are coded with I slices. That is, as illustrated in FIG. 13(a), in the case of four rectangular slices, maxTid=2, and PIslice=8, the IRAP picture, which is a conventional key frame, is partitioned into substantially two, and half of a picture is coded as I slices at a time. Therefore, since an I slice having a large code amount is partitioned into two pictures, it is possible to avoid concentrating the code amount in one picture. A rectangular slice sequence does not refer to a rectangular slice sequence with a different SliceId, and thus random access can be performed at the time when all rectangular slices have been coded with I slices (POC=4 in FIG. 12(c)), beginning from POC=0.
[0155] FIG. 13(d) is a case that one picture is partitioned into six rectangular slices, and is a case of maxTid=1 and PIslice=16. maxTid=1 denotes the coding structure of FIG. 12(b). In POC=0, 2, 4, 6, 8, and 10 with Tid=0 (FIGS. 13(e) to (j)), each of SliceId=0, 1, 2, 3, 4, and 5 is coded with I slices. That is, as illustrated in FIG. 13(d), in a case of six rectangular slices, maxTid=1, and PIslice=16, the IRAP picture, which is a conventional key frame, is partitioned into substantially six, and 1/6 of a picture is coded as an I slice at a time. Therefore, since an I slice having a large code amount is partitioned into six pictures, it is possible to avoid concentrating the code amount in one picture. A rectangular slice sequence does not refer to a rectangular slice sequence with a different SliceId, and thus random access can be performed at the time when all rectangular slices have been coded with I slices (POC=10 in FIG. 12(b)), beginning from POC=0.
[0156] FIG. 14(a) is a case that one picture is partitioned into 10 rectangular slices, and is a case of maxTid=3 and PIslice=32. maxTid=3 denotes the coding structure of FIG. 12(d). In POC=0, 8, 16, and 24 with Tid=0 (FIGS. 14(b) to (e)), each of SliceId=0, 4, and 8 (FIG. 14(b)), SliceId=1, 5, and 9 (FIG. 14(c)), SliceId=2 and 6 (FIG. 14(d)), and SliceId=3 and 7 (FIG. 14(e)) is coded with I slices.
That is, as illustrated in FIG. 14(a), in a case of 10 rectangular slices, maxTid=3, and PIslice=32, the IRAP picture, which is a conventional key frame, is partitioned into substantially four, and approximately 1/4 of a picture is coded as an I slice at a time. Therefore, since an I slice having a large code amount is partitioned into approximately four pictures, it is possible to avoid concentrating the code amount in one picture. A rectangular slice sequence does not refer to a rectangular slice sequence with a different SliceId, and thus random access can be performed at the time when all rectangular slices have been coded with I slices (POC=24), beginning from POC=0.
[0157] FIG. 13 and FIG. 14 are examples of combinations of the number of rectangular slices, the maximum value maxTid of Tid, and the insertion period PIslice of I slices, and the POC for inserting I slices can be expressed, for example, by the following equation.
TID2 = 2^maxTid    (Equation POC-1)
POC(SliceId) = (SliceId * TID2) % PIslice    (Equation POC-2)
Here, POC(SliceId) is the POC for coding the rectangular slice of SliceId with an I slice. "2^a" indicates a power of 2 (2 to the power of a).
[0158] As another example, the POC for inserting I slices can be expressed as the following equation.
THPI = floor(PIslice / TID2)    (Equation POC-3)
POC(SliceId) = (SliceId * TID2) % PIslice    (THPI >= 2)
POC(SliceId) = (SliceId * TID2 * THPI) % PIslice    (other than above)
[0159] In (Equation POC-3), in a case that the period of inserting I slices is long, the I slices are inserted in a more distributed manner than (Equation POC-2), so the concentration of the code amount in a particular picture can be further reduced. However, the I slices are gradually decoded, so it takes time to gather the entire picture. In a case of shortening the time involved in random access, maxTid may be made smaller, and the insertion interval of I slices may be shortened.
[0160] The insertion interval of the I slices described above is signalled, for example, in a sequence parameter set SPS. FIGS. 9(b) and (c) are examples of the syntax related to I slices.
[0161] In FIG. 9(b), in a case of rectangular_slice_flag = 1, information islice() related to I slice insertion is signalled. Specific examples of islice() are illustrated in FIGS. 9(b) and (c). In FIG. 9(b), within the insertion period of one I slice, the number of pictures num_islice_picture including I slices and information islice_flag for indicating which slices are I slices in each picture including the I slices are signalled. Here, NumRSlice is the number of rectangular slices in the picture, and is derived by the following equation from num_rslice_columns_minus1 and num_rslice_rows_minus1 of rectangular_slice_info illustrated in FIG. 7(c).
NumRSlice = (num_rslice_columns_minus1 + 1) * (num_rslice_rows_minus1 + 1)    (Equation POC-4)
[0162] In the case of FIG. 14(a), the pictures including the I slices are POC=0, 8, 16, and 24, which are pictures of Tid=0, so num_islice_picture is 4. In a case that i=0, 1, 2, and 3 correspond to POC=0, 8, 16, and 24, respectively, islice_flag[i][] is determined as illustrated in FIG. 9(d). Here, islice_flag[i][j] = 1 indicates that the rectangular slice of SliceId=j in the i-th picture of Tid=0 is an I slice, and islice_flag[i][j] = 0 indicates that the rectangular slice of SliceId=j in the i-th picture of Tid=0 is not an I slice. In FIG. 14(b), for the 0-th picture (POC=0) of Tid=0, rectangular slices of SliceId=0, 4, and 8 are I slices, and the other rectangular slices are not I slices, so that islice_flag[0][] is {1, 0, 0, 0, 1, 0, 0, 0, 1, 0} as illustrated in FIG. 9(d).
[0163] In FIG. 9(c), the insertion period (PIslice) islice_period of I slices in each rectangular slice and the maximum value of Tid, max_tid, are signalled in islice_info(). By substituting them into (Equation POC-1) to (Equation POC-3), the positions of the I slices in each rectangular slice are derived.
[0164] In a case of utilizing rectangular slices, information related to the I slice insertion cannot be changed in the CVS. In a case of changing the timing of the I slice insertion for scene changes or other reasons, the CVS needs to be terminated and information islice() related to I slice insertion needs to be signalled by a new SPS.

Configuration of Video Decoding Apparatus

[0165] FIG. 15(a) illustrates the video decoding apparatus (image decoding apparatus) 31 according to the present invention. The video decoding apparatus 31 includes a header information decoder 2001, slice decoders 2002a to 2002n, and a slice combining unit 2003. FIG. 16(b) is a flowchart of the video decoding apparatus 31.
[0166] The header information decoder 2001 decodes header information (SPS/PPS or the like) from a coding stream Te input from the outside and coded in units of a network abstraction layer (NAL) unit. Here, the NAL unit and a NAL unit header will be described with reference to FIG. 17.

Extension of NAL Unit Header

[0167] FIGS. 17(a) and (b) are the syntax indicating a NAL unit and a NAL unit header of a general slice. The NAL unit includes a NAL unit header and subsequent coded data in a unit of bytes (such as a parameter set, coded data of the slice layer or lower, and the like). The NAL unit header notifies the identifier nal_unit_type for indicating the type of NAL unit, nuh_layer_id for indicating the layer to which the NAL unit belongs, and nuh_temporal_id_plus1 for indicating the temporal hierarchy identifier Tid. Tid described above is derived by the following equation.
Tid = nuh_temporal_id_plus1 - 1
[0168] For a rectangular slice, the syntax of the NAL unit of FIG. 17(a) and the NAL unit header of FIG. 17(d), for example, are used. The difference from a general slice is that slice_id is signalled in the NAL unit header in a rectangular slice. In a case that video coded data of the slice layer or lower is transmitted in the NAL unit (nal_unit_type <= RSV_VCL31), the data of the NAL unit includes a slice header and notifies the syntax slice_id for indicating the SliceId. The NAL unit header is desirably fixed in length, so slice_id is fixed length coded with v bits. Note that in a case that slice_id is not signalled, 0xFFFF is set to slice_id.
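As a small sketch of the NAL unit header handling in [0167] and [0168], the following C fragment derives Tid as nuh_temporal_id_plus1 - 1 and defaults slice_id to 0xFFFF when it is not signalled. The struct layout and field widths are illustrative only and are not the actual syntax of FIG. 17.

    /* Sketch only: struct layout is an assumption, not the FIG. 17 syntax. */
    #include <stdint.h>

    typedef struct {
        int      nal_unit_type;
        int      nuh_layer_id;
        int      nuh_temporal_id_plus1;
        int      slice_id_present;   /* 1 if slice_id was coded (rectangular slice) */
        uint16_t slice_id;           /* fixed length coded                          */
    } NalUnitHeader;

    #define SLICE_ID_NONE 0xFFFF     /* "not a rectangular slice" sentinel */

    void derive_nal_header_values(const NalUnitHeader *h, int *Tid, uint16_t *SliceId)
    {
        *Tid     = h->nuh_temporal_id_plus1 - 1;   /* temporal hierarchy identifier */
        *SliceId = h->slice_id_present ? h->slice_id
                                       : (uint16_t)SLICE_ID_NONE;
    }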
[0169] As another example, the syntax of the NAL unit of FIG. 17(c), the NAL unit header of FIG. 17(b), and the extended NAL unit header of FIG. 17(e) are used to signal slice_id. In FIG. 17(c), the extended NAL unit header is signalled in a case that nal_unit_header_extension_flag is true, but instead of nal_unit_header_extension_flag, the extended NAL unit header may be signalled in a case that the NAL unit includes video coded data of the slice layer or lower (nal_unit_type is RSV_VCL31 or less). For the extended NAL unit header of FIG. 17(e), slice_id is signalled in a case that the NAL unit includes video coded data of the slice layer or lower (nal_unit_type is RSV_VCL31 or less). In a case that slice_id is not signalled, slice_id is set to 0xFFFF for indicating that it is not a rectangular slice. The slice_id notification by the NAL unit header and rectangular_slice_flag signalled by the SPS need to be linked. That is, in a case that slice_id is signalled, rectangular_slice_flag is 1.
[0170] Position information for a target slice is derived from the combination of slice_id and the rectangular slice information signalled by the SPS or the PPS. Since nal_unit_type for indicating the type of NAL unit (whether or not the current slice is an IRAP) is also signalled in the NAL unit header, the video decoding apparatus can know the information required for random access and the like in advance at the time of decoding the NAL unit header and a relatively higher parameter set.
[0171] In a case that the decoding target is a rectangular slice (S1611), the header information decoder 2001 derives the rectangular slices (SliceId) required for display from the control information input from the outside, indicating the image region to be displayed on a display or the like. The header information decoder 2001 also decodes the information related to the I slice insertion from the SPS/PPS (S1612), and derives a rectangular slice for inserting an I slice (S1613). The header information decoder 2001 extracts the coding rectangular slices TeS required for display from the coding stream Te and transmits the coding rectangular slices TeS to the slice decoders 2002a to 2002n. The header information decoder 2001 also decodes the SPS/PPS and transmits the rectangular slice information (the information related to partitioning of the rectangular slices) and the like to the rectangular slice combining unit 2003. By signalling slice_id in the NAL unit header or its extended portion, the derivation of the rectangular slices needed for display can be simplified.
[0172] The slice decoders 2002a to 2002n decode each coded slice from the coded rectangular slices TeS and the I slice insertion position (S1614), and transmit the decoded slice to the slice combining unit 2003. In a case that the coding stream TeS is comprised of general slices, there is no control information or rectangular slice information, and the entire picture is decoded. As illustrated in FIG. 1(b), for a general slice, with slice_id = 0xFFFF at the time of decoding the NAL unit header, the slice header is decoded according to the syntax of FIG. 8. For a rectangular slice, with slice_id != 0xFFFF, the slice header is decoded according to the syntax of FIG. 11(a).
[0173] Here, in a case of rectangular_slice_flag = 1, the slice decoders 2002a to 2002n perform decoding processing on the rectangular slice sequence as one independent video sequence, and thus do not refer to prediction information between rectangular slice sequences, temporally or spatially, in a case of performing the decoding processing. That is, the slice decoders 2002a to 2002n do not refer to a rectangular slice of another rectangular slice sequence (with a different SliceId) in a case of decoding a rectangular slice in a picture. There are no such constraints in the case of rectangular_slice_flag = 0, i.e., in the case of a general slice.
[0174] Thus, in the case of rectangular_slice_flag = 1, the slice decoders 2002a to 2002n decode each of the rectangular slices, so that decoding processing can be performed in parallel on multiple rectangular slices, or only one rectangular slice may be decoded independently. As a result, by the slice decoders 2002a to 2002n, the decoding processing can be performed efficiently, such as performing only the minimum necessary decoding processing to decode the images required for display.
[0175] In the case of rectangular_slice_flag = 1, the slice combining unit 2003 refers to the rectangular slice information transmitted from the header information decoder 2001, the SliceId of the rectangular slice to be decoded, and the rectangular slice decoded by the slice decoders 2002a to 2002n, to generate and output decoded images Td required for display. There are no such constraints in the case of rectangular_slice_flag = 0, i.e., in the case of a general slice, and the entire picture is displayed.

Configuration of Slice Decoder

[0176] The configuration of the slice decoders 2002a to 2002n will be described. As an example below, the configuration of the slice decoder 2002a will be described with reference to FIG. 18. FIG. 18 is a block diagram illustrating the configuration of the slice decoder 2002, which is one of the slice decoders 2002a to 2002n. The slice decoder 2002 includes an entropy decoder 301, a prediction parameter decoder (a prediction image decoding apparatus) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generation unit (a prediction image generation apparatus) 308, an inverse quantization and inverse transform processing unit 311, and an addition unit 312. Note that there is a configuration in which the loop filter 305 is not included in the slice decoder 2002, in accordance with the slice coder 2012 described below.
[0177] The prediction parameter decoder 302 includes an inter prediction parameter decoder 303 and an intra prediction parameter decoder 304. The prediction image generation unit 308 includes an inter prediction image generation unit 309 and an intra prediction image generation unit 310.
[0178] Examples in which CTUs, CUs, PUs, and TUs are used as the units of processing are described below, but the present invention is not limited to these examples, and may be processed in CU units instead of TU or PU units. Alternatively, the CTUs, CUs, PUs, and TUs may be interpreted as blocks, and the present invention may be processed in block units.
[0179] The entropy decoder 301 performs entropy decoding on the coding stream TeS input from the outside, and separates and decodes individual codes (syntax elements). The separated codes include a prediction parameter to generate a prediction image and residual information to generate a difference image and the like.
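The text in [0171] does not prescribe how the header information decoder 2001 maps a requested display region to SliceIds; the following C sketch shows one plausible way to do it for the uniform rectangular slice grid of (Equation RSLICE-1) and (Equation RSLICE-3). All names and the intersection test are assumptions made for illustration only.

    /* Sketch only: one possible region-to-SliceId derivation, not the described method. */
    void slice_ids_for_region(int cols, int rows,        /* rectangular slice grid      */
                              int wRS, int hRS,          /* uniform slice size          */
                              int x0, int y0, int w, int h,  /* requested display region */
                              int *ids, int *num_ids)    /* output list of SliceIds     */
    {
        *num_ids = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int xRSs = c * wRS, yRSs = r * hRS;
                /* keep the SliceId if its rectangle intersects the display region */
                if (xRSs < x0 + w && xRSs + wRS > x0 &&
                    yRSs < y0 + h && yRSs + hRS > y0)
                    ids[(*num_ids)++] = r * cols + c;
            }
        }
    }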
[0180] The entropy decoder 301 outputs a part of the separated codes to the prediction parameter decoder 302. For example, the part of the separated codes includes a prediction mode predMode, a PU partitioning mode part_mode, a merge flag merge_flag, a merge index merge_idx, an inter prediction indicator inter_pred_idc, a reference picture index ref_idx_lX, a prediction vector index mvp_lX_idx, and a difference vector mvdLX. The control of which code to decode is performed based on an indication of the prediction parameter decoder 302. The entropy decoder 301 outputs a quantization transform coefficient to the inverse quantization and inverse transform processing unit 311. This quantization transform coefficient is a coefficient obtained by performing a frequency transform such as a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Karhunen-Loève Transform (KLT), and the like on a residual signal and quantizing it in the coding processing.
[0181] The inter prediction parameter decoder 303 decodes an inter prediction parameter with reference to a prediction parameter stored in the prediction parameter memory 307, based on a code input from the entropy decoder 301. The inter prediction parameter decoder 303 also outputs the decoded inter prediction parameter to the prediction image generation unit 308, and also stores the decoded inter prediction parameter in the prediction parameter memory 307. Details of the inter prediction parameter decoder 303 will be described later.
[0182] The intra prediction parameter decoder 304 decodes an intra prediction parameter with reference to a prediction parameter stored in the prediction parameter memory 307, based on a code input from the entropy decoder 301. The intra prediction parameter decoder 304 outputs the decoded intra prediction parameter to the prediction image generation unit 308, and also stores the decoded intra prediction parameter in the prediction parameter memory 307.
[0183] The intra prediction parameter decoder 304 decodes a luminance prediction mode IntraPredModeY as a prediction parameter of luminance, and decodes a chrominance prediction mode IntraPredModeC as a prediction parameter of chrominance. The intra prediction parameter decoder 304 decodes the flag for indicating whether or not the chrominance prediction is an LM prediction, and in a case that the flag indicates an LM prediction, the intra prediction parameter decoder 304 decodes information related to an LM prediction (information for indicating whether or not it is a CCLM prediction, or information for specifying a downsampling method). Here, the LM prediction will be described. The LM prediction is a prediction scheme using a correlation between a luminance component and a color component, and is a scheme for generating a prediction image of a chrominance image (Cb, Cr) by using a linear model, based on a decoded luminance image. LM predictions include a Cross-Component Linear Model (CCLM) prediction and a Multiple Model CCLM (MMLM) prediction. The CCLM prediction is a prediction scheme using one linear model for predicting chrominance from luminance for one block. The MMLM prediction is a prediction scheme using two or more linear models for predicting chrominance from luminance for one block. In a case that the chrominance format is 4:2:0, the luminance image is downsampled to the same size as the chrominance image to create a linear model. In a case that the flag indicates that it is a prediction different from an LM prediction, either a Planar prediction, a DC prediction, an Angular prediction, or a DM prediction is decoded as IntraPredModeC. FIG. 19 is a diagram illustrating intra prediction modes. The directions of the straight lines corresponding to 2 to 66 in FIG. 19 represent the prediction directions, and more accurately, indicate the directions of the pixels on the reference regions R (described later) to which a prediction target pixel refers.
[0184] The loop filter 305 applies a filter such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to a decoded image of a CU generated by the addition unit 312. Note that in a case that the loop filter 305 is paired with the slice coder 2012, the loop filter 305 need not necessarily include the three types of filters described above, and may be, for example, a configuration with only a deblocking filter.
[0185] The reference picture memory 306 stores a decoded image of a CU generated by the addition unit 312 in a predetermined position for each picture and CTU or CU of a decoding target. Pictures stored in the reference picture memory 306 are managed in association with the POC (display order) on the reference picture list. For a picture in which the whole picture is I slices, such as an IRAP picture, the POC is set to 0, and all of the pictures stored in the reference picture memory are discarded. However, in a case that the picture is rectangular slices and a part of the picture is coded with I slices, the pictures stored in the reference picture memory need to be retained.
[0186] The prediction parameter memory 307 stores a prediction parameter in a predetermined position for each picture and prediction unit (or a subblock, a fixed size block, and a pixel) of a decoding target. Specifically, the prediction parameter memory 307 stores an inter prediction parameter decoded by the inter prediction parameter decoder 303, an intra prediction parameter decoded by the intra prediction parameter decoder 304, and the like. For example, stored inter prediction parameters include a prediction list utilization flag predFlagLX (the inter prediction indicator inter_pred_idc), a reference picture index refIdxLX, and a motion vector mvLX.
[0187] To the prediction image generation unit 308, a prediction mode predMode input from the entropy decoder 301 is input, and a prediction parameter is input from the prediction parameter decoder 302. The prediction image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a PU (block) or a subblock by using the input prediction parameter and the read reference picture (a reference picture block), with the prediction mode indicated by the prediction mode predMode.
[0188] Here, in a case that the prediction mode predMode indicates an inter prediction mode, the inter prediction image generation unit 309 generates a prediction image of a PU (block) or a subblock by an inter prediction by using an inter prediction parameter input from the inter prediction parameter decoder 303 and a read reference picture (a reference picture block).
[0189] For a reference picture list (an L0 list or an L1 list) where a prediction list utilization flag predFlagLX is 1, the inter prediction image generation unit 309 reads a reference picture block from the reference picture memory 306 at a position indicated by a motion vector mvLX, based on the decoding target PU, from reference pictures indicated by the reference picture index refIdxLX. The inter prediction image generation unit 309 performs an interpolation based on the read reference picture block and generates a prediction image (an interpolation image or a motion compensation image) of a PU. The inter prediction image generation unit 309 outputs the generated prediction image of the PU to the addition unit 312. Here, the reference picture block refers to a set of pixels (referred to as a block because it is normally rectangular) on a reference picture, and is a region that is referred to for generating a prediction image of a PU or a subblock.
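The LM prediction of [0183] generates chrominance samples from the (downsampled) decoded luminance by a linear model. The C sketch below shows only the application of such a model in the form predC = ((a * recL) >> shift) + b; how the model parameters a, b, and the shift are derived from neighbouring samples is not described above and is left as an input here, and the shift-based fixed-point form is itself an assumption.

    /* Sketch only: parameter derivation and fixed-point form are assumptions. */
    #include <stdint.h>

    void lm_predict_chroma(const int16_t *recL,   /* luma already downsampled to chroma size */
                           int stride, int width, int height,
                           int a, int b, int shift,
                           int16_t *predC)        /* output Cb or Cr prediction block */
    {
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                predC[y * stride + x] =
                    (int16_t)(((a * recL[y * stride + x]) >> shift) + b);
    }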
Rectangular Slice Boundary Padding

[0190] For a reference picture list of the prediction list utilization flag predFlagLX = 1, the reference picture block (reference block) is a block on the reference picture indicated by the reference picture index refIdxLX, at the position indicated by the motion vector mvLX, based on the position of the target CU (block). As previously described, there is no guarantee that the pixels of the reference block are located within a rectangular slice (collocated rectangular slice) on a reference picture with the same SliceId as the target rectangular slice. Thus, as an example, in the case of rectangular_slice_flag = 1, the reference block may be read without reference to pixel values outside of the collocated rectangular slice by padding (making up with pixel values of the rectangular slice boundary) the outside of each rectangular slice, as illustrated in FIG. 20(a), in a reference picture.
[0191] Rectangular slice boundary padding (rectangular slice outside padding) is achieved by using the pixel value refImg[xRef+i][yRef+j] at the following position (xRef+i, yRef+j) as the pixel value at the position (xIntL+i, yIntL+j) of the reference pixel in motion compensation by a motion compensation unit 3091 described below. That is, this is achieved by clipping the reference positions at the positions of the upper, lower, left, and right boundary pixels of the rectangular slice when referring to reference pixels.
xRef + i = Clip3(xRSs, xRSs + wRS - 1, xIntL + i)
yRef + j = Clip3(yRSs, yRSs + hRS - 1, yIntL + j)    (Equation PAD-1)
[0192] Here, (xRSs, yRSs) is the upper left coordinate of the target rectangular slice in which the target block is located, and wRS and hRS are the width and the height of the target rectangular slice.
[0193] Note that, assuming that the upper left coordinate of the target block relative to the upper left coordinate of the picture is (xb, yb) and the motion vector is (mvLX[0], mvLX[1]), xIntL and yIntL may be derived by:
xIntL = xb + (mvLX[0] >> log2(M))
yIntL = yb + (mvLX[1] >> log2(M))    (Equation PAD-2)
Here, M indicates that the accuracy of the motion vector is 1/M pel.
[0194] By reading the pixel value of the coordinate (xRef+i, yRef+j), the padding of FIG. 20(a) can be achieved.
[0195] In the case of rectangular_slice_flag = 1, by padding the rectangular slice boundary in this way, even in a case that the motion vector points outside of the collocated rectangular slice for an inter prediction, the reference pixels are replaced by using the pixel values within the collocated rectangular slice, so that the rectangular slice sequence can be decoded independently by using an inter prediction.

Rectangular Slice Boundary Motion Vector Limitation

[0196] Limiting methods other than the rectangular slice boundary padding include rectangular slice boundary motion vector limitation. In the present processing, in the case of rectangular_slice_flag = 1, for motion compensation by the motion compensation unit 3091 described below, the motion vector is limited (clipped) so that the position (xIntL+i, yIntL+j) of the reference pixel is within the collocated rectangular slice.
[0197] In the present processing, in a case that the upper left coordinate of the target block (the target subblock or the target block) is (xb, yb), the size of the block is (W, H), the upper left coordinate of the target rectangular slice is (xRSs, yRSs), and the width and the height of the target rectangular slice are wRS and hRS, the motion vector mvLX of the block is input and a limited motion vector mvLX is output.
[0198] The left end posL, the right end posR, the upper end posU, and the lower end posD of the reference pixels in the generation of the interpolation image of the target block are the following. Note that NTAP is the number of taps of the filter used for the generation of the interpolation image.
posL = xb + (mvLX[0] >> log2(M)) - NTAP/2 + 1
posR = xb + W - 1 + (mvLX[0] >> log2(M)) + NTAP/2
posU = yb + (mvLX[1] >> log2(M)) - NTAP/2 + 1
posD = yb + H - 1 + (mvLX[1] >> log2(M)) + NTAP/2    (Equation CLIP1)
[0199] The limitations for the above reference pixels to fall within the collocated rectangular slice are as follows.
posL >= xRSs
posR <= xRSs + wRS - 1
posU >= yRSs
posD <= yRSs + hRS - 1    (Equation CLIP2)
[0200] The limitations of the motion vector can be derived from the following equation by transforming (Equation CLIP1) and (Equation CLIP2).
mvLX[0] = Clip3(vxmin, vxmax, mvLX[0])
mvLX[1] = Clip3(vymin, vymax, mvLX[1])    (Equation CLIP4)
Here,
vxmin = (xRSs - xb + NTAP/2 - 1) << log2(M)
vxmax = (xRSs + wRS - xb - W - NTAP/2) << log2(M)
vymin = (yRSs - yb + NTAP/2 - 1) << log2(M)
vymax = (yRSs + hRS - yb - H - NTAP/2) << log2(M)    (Equation CLIP5)
[0201] In the case of rectangular_slice_flag = 1, by limiting the motion vector in this manner, the motion vector always points inside of the collocated rectangular slice for an inter prediction. In this configuration as well, a rectangular slice sequence can be decoded independently by using an inter prediction.
[0202] In a case that the prediction mode predMode indicates an intra prediction mode, the intra prediction image generation unit 310 performs an intra prediction by using an intra prediction parameter input from the intra prediction parameter decoder 304 and read reference pixels. Specifically, the intra prediction image generation unit 310 reads an adjacent PU, which is in the picture of the decoding target, in a predetermined range from the decoding target PU, among PUs already decoded, from the reference picture memory 306. The predetermined range is, for example, any of the adjacent PUs on the left, upper left, upper, and upper right in a case that the decoding target PU moves sequentially in the order of so-called raster scan, and varies according to intra prediction modes. The order of the raster scan is an order to move sequentially from the left end to the right end in each picture for each row from the upper end to the lower end.
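The two boundary limitations of [0190] to [0201] can be summarized in the following C sketch, which restates (Equation PAD-1) and (Equation CLIP4)/(Equation CLIP5) directly. Clip3(lo, hi, v) clamps v into [lo, hi]; M is the motion vector accuracy (1/M pel) and NTAP the interpolation filter tap count, as defined above. The function names are illustrative only.

    /* Sketch restating (Equation PAD-1) and (Equation CLIP4)/(Equation CLIP5). */
    static int Clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

    /* Rectangular slice boundary padding: clip the reference sample position
     * into the collocated rectangular slice (Equation PAD-1). */
    void pad_ref_position(int xRSs, int yRSs, int wRS, int hRS,
                          int xIntL, int yIntL, int i, int j,
                          int *xRef, int *yRef)
    {
        *xRef = Clip3(xRSs, xRSs + wRS - 1, xIntL + i);
        *yRef = Clip3(yRSs, yRSs + hRS - 1, yIntL + j);
    }

    /* Rectangular slice boundary motion vector limitation: clip the motion
     * vector so that all reference samples stay inside the collocated
     * rectangular slice (Equation CLIP4, Equation CLIP5). */
    void clip_mv_to_rslice(int xRSs, int yRSs, int wRS, int hRS,
                           int xb, int yb, int W, int H,
                           int NTAP, int log2M, int mvLX[2])
    {
        int vxmin = (xRSs - xb + NTAP / 2 - 1) << log2M;
        int vxmax = (xRSs + wRS - xb - W - NTAP / 2) << log2M;
        int vymin = (yRSs - yb + NTAP / 2 - 1) << log2M;
        int vymax = (yRSs + hRS - yb - H - NTAP / 2) << log2M;
        mvLX[0] = Clip3(vxmin, vxmax, mvLX[0]);
        mvLX[1] = Clip3(vymin, vymax, mvLX[1]);
    }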
[0203] The intra prediction image generation unit 310 performs a prediction by a prediction mode indicated by the intra prediction mode IntraPredMode, based on a read adjacent PU, and generates a prediction image of a PU. The intra prediction image generation unit 310 outputs the generated prediction image of the PU to the addition unit 312.
[0204] In a Planar prediction, a DC prediction, and an Angular prediction, the peripheral region that has been decoded and is adjacent to (proximate to) the prediction target block is configured as the reference region R. Schematically, these prediction modes are prediction schemes for generating a prediction image by extrapolating pixels on the reference region R in a particular direction. For example, the reference region R can be configured as an inverse L-shaped region (for example, the region indicated by the diagonally hatched round pixels of FIG. 21) including the left and upper (or even upper left, upper right, and lower left) sides of the prediction target block.

Detail of Prediction Image Generation Unit

[0205] Next, the configuration of the intra prediction image generation unit 310 will be described in detail with reference to FIG. 22.
[0206] As illustrated in FIG. 22, the intra prediction image generation unit 310 includes a prediction target block configuration unit 3101, an unfiltered reference image configuration unit 3102 (a first reference image configuration unit), a filtered reference image configuration unit 3103 (a second reference image configuration unit), a predictor 3104, and a prediction image correction unit 3105 (a prediction image correction unit, a filter switching unit, or a weighting coefficient change unit).
[0207] The filtered reference image configuration unit 3103 applies a reference pixel filter (a first filter) to each reference pixel (an unfiltered reference image) on the input reference region R to generate a filtered reference image, and outputs the filtered reference image to the predictor 3104. The predictor 3104 generates a temporary prediction image (a pre-correction prediction image) of the prediction target block, based on the input intra prediction mode, the unfiltered reference image, and the filtered reference image, and outputs the generated image to the prediction image correction unit 3105. The prediction image correction unit 3105 corrects the temporary prediction image in accordance with the input intra prediction mode, and generates a prediction image (a corrected prediction image). The prediction image generated by the prediction image correction unit 3105 is output to the summer 15.
[0208] Hereinafter, each unit included in the intra prediction image generation unit 310 will be described.

Prediction Target Block Configuration Unit 3101

[0209] The prediction target block configuration unit 3101 configures the target CU as the prediction target block, and outputs information related to the prediction target block (prediction target block information). The prediction target block information includes at least an index indicating the prediction target block size, the prediction target block position, and whether the prediction target block is luminance or chrominance.

Unfiltered Reference Image Configuration Unit 3102

[0210] The unfiltered reference image configuration unit 3102 configures a peripheral region adjacent to the prediction target block as the reference region R, based on the prediction target block size and the prediction target block position of the prediction target block information. Subsequently, each pixel value in the reference region R (the unfiltered reference image, the boundary pixels) is set to the decoded pixel value at the corresponding position in the reference picture memory 306. In other words, the unfiltered reference image r[x][y] is configured by the following equation by using the decoded pixel value u[][] of the target picture expressed in terms of the upper left coordinate of the target picture.

r[x][y] = u[xB + x][yB + y]   (INTRAP-1)

[0211] x = -1, y = -1 ... (BS*2 - 1), and x = 0 ... (BS*2 - 1), y = -1
[0212] Here, (xB, yB) denotes the upper left coordinate of the prediction target block, and BS denotes the larger value of the width W or the height H of the prediction target block.
[0213] In the above equation, as illustrated in FIG. 21(a), the row r[x][-1] of the decoded pixels adjacent to the prediction target block upper side and the column r[-1][y] of the decoded pixels adjacent to the prediction target block left side are the unfiltered reference images. Note that, in a case that a decoded pixel value corresponding to the reference pixel position does not exist or cannot be referred to, a prescribed value (for example, 1 << (bitDepth - 1) in a case that the pixel bit depth is bitDepth) may be configured as the unfiltered reference image, or a decoded pixel value that can be referred to and is present in the vicinity of the corresponding decoded pixel value may be configured as the unfiltered reference image. "y = -1 ... (BS*2 - 1)" indicates that y may take (BS*2 + 1) values from -1 to (BS*2 - 1), and "x = 0 ... (BS*2 - 1)" indicates that x may take (BS*2) values from 0 to (BS*2 - 1).
[0214] In the above equation, as illustrated in FIG. 21(a), the decoded pixels included in the row adjacent to the prediction target block upper side and the decoded pixels included in the column adjacent to the prediction target block left side are the unfiltered reference images.

Filtered Reference Image Configuration Unit 3103

[0215] The filtered reference image configuration unit 3103 applies (performs) a reference pixel filter (a first filter) to the input unfiltered reference image in accordance with an intra prediction mode, to derive and output a filtered reference image s[x][y] at each position (x, y) on the reference region R (FIG. 21(b)). Specifically, the filtered reference image configuration unit 3103 applies a low pass filter to the unfiltered reference image at position (x, y) and its surroundings to derive a filtered reference image. Note that the low pass filter need not necessarily be applied to all intra prediction modes; the low pass filter may be applied to at least some of the intra prediction modes. Note that a filter that is applied to the unfiltered reference image on the reference region R at the filtered reference image configuration unit 3103 before entering the predictor 3104 in FIG. 22 is referred to as a "reference pixel filter (a first filter)", while a filter that corrects the temporary prediction image derived by the predictor 3104 by using the unfiltered reference pixel value at the prediction image correction unit 3105 described later is referred to as a "boundary filter (a second filter)".
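The configuration of the unfiltered reference image in [0210] to [0213] (Equation INTRAP-1 plus the substitution of a prescribed value for unavailable pixels) can be sketched as follows. This is an illustrative C sketch, not the embodiment's code; is_available() is a hypothetical helper that checks whether a decoded pixel exists and can be referred to (for example, inside the picture and the same rectangular slice).

    extern int is_available(int x, int y);  /* hypothetical availability check */

    /* r_top[x]  holds r[x][-1]  for x = 0 .. BS*2-1
     * r_left[y] holds r[-1][y]  for y = 0 .. BS*2-1
     * r_topleft holds r[-1][-1]                                   */
    void build_unfiltered_ref(const int *u, int stride,    /* decoded picture u[][] */
                              int xB, int yB, int W, int H, int bitDepth,
                              int *r_top, int *r_left, int *r_topleft)
    {
        int BS = (W > H) ? W : H;
        int fallback = 1 << (bitDepth - 1);                 /* prescribed value of [0213] */
        *r_topleft = is_available(xB - 1, yB - 1) ? u[(yB - 1) * stride + (xB - 1)] : fallback;
        for (int x = 0; x < BS * 2; x++)                    /* r[x][-1] = u[xB+x][yB-1] */
            r_top[x] = is_available(xB + x, yB - 1) ? u[(yB - 1) * stride + (xB + x)] : fallback;
        for (int y = 0; y < BS * 2; y++)                    /* r[-1][y] = u[xB-1][yB+y] */
            r_left[y] = is_available(xB - 1, yB + y) ? u[(yB + y) * stride + (xB - 1)] : fallback;
    }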
[0216] For example, as in an intra prediction of HEVC, in a case of a DC prediction or in a case that the prediction target block size is 4x4 pixels, an unfiltered reference image may be used as is as a filtered reference image. A flag decoded from the coded data may switch between applying and not applying the low pass filter. Note that in a case that the intra prediction mode is an LM prediction, an unfiltered reference image is not directly referred to in the predictor 3104, and thus a filtered reference pixel value s[x][y] may not be output from the filtered reference image configuration unit 3103.

Configuration of Intra Predictor 3104

[0217] The intra predictor 3104 generates a temporary prediction image (a temporary prediction pixel value, a pre-correction prediction image) of the prediction target block, based on the intra prediction mode, the unfiltered reference image, and the filtered reference image, and outputs the generated image to the prediction image correction unit 3105. The predictor 3104 includes a Planar predictor 31041, a DC predictor 31042, an Angular predictor 31043, and an LM predictor 31044 therein. The predictor 3104 selects a specific predictor in accordance with the input intra prediction mode, and inputs an unfiltered reference image and a filtered reference image to the selected predictor. The relationships between the intra prediction modes and the corresponding predictors are as follows.

Planar prediction: Planar predictor 31041
DC prediction: DC predictor 31042
Angular prediction: Angular predictor 31043
LM prediction: LM predictor 31044

[0218] The predictor 3104 generates a prediction image of the prediction target block (a temporary prediction image q[x][y]), based on a filtered reference image, in an intra prediction mode. In another intra prediction mode, the predictor 3104 may generate a temporary prediction image q[x][y] by using an unfiltered reference image. The predictor 3104 may also have a configuration in which the reference pixel filter is turned on in a case that a filtered reference image is used, and the reference pixel filter is turned off in a case that an unfiltered reference image is used.
[0219] In the following, an example is described in which a temporary prediction image q[x][y] is generated by using an unfiltered reference image r[][] in a case of an LM prediction, and a temporary prediction image q[x][y] is generated by using a filtered reference image s[][] in a case of a Planar prediction, a DC prediction, or an Angular prediction, but the selection of an unfiltered reference image or a filtered reference image is not limited to this example. For example, which of an unfiltered reference image or a filtered reference image to use may be switched depending on a flag that is explicitly decoded from the coded data, or may be switched based on a flag derived from other coding parameters. For example, in the case of an Angular prediction, an unfiltered reference image (the reference pixel filter is turned off) may be used in a case that the difference between the intra prediction mode of the prediction target block and the intra prediction mode number of a vertical prediction or a horizontal prediction is small, and a filtered reference image (the reference pixel filter is turned on) may be used otherwise.

Planar Prediction

[0220] The Planar predictor 31041 generates a temporary prediction image by linearly adding multiple filtered reference images in accordance with the distance between the prediction target pixel position and the reference pixel position, and outputs the prediction image to the prediction image correction unit 3105. For example, the pixel value q[x][y] of the temporary prediction image is derived from the following equation by using the filtered reference pixel value s[x][y] and the width W and the height H of the prediction target block previously described.

q[x][y] = ((W - 1 - x) * s[-1][y] + (x + 1) * s[W][-1] + (H - 1 - y) * s[x][-1] + (y + 1) * s[-1][H] + max(W, H)) >> (k + 1)   (INTRAP-2)

[0221] Here, x = 0 ... W - 1, y = 0 ... H - 1, and k = log2(max(W, H)) is defined.

DC Prediction

[0222] The DC predictor 31042 derives a DC prediction value corresponding to the average value of the input filtered reference image s[x][y], and outputs a temporary prediction image q[x][y] with the derived DC prediction value as the pixel value.

Angular Prediction

[0223] The Angular predictor 31043 generates a temporary prediction image q[x][y] by using a filtered reference image s[x][y] in the prediction direction (the reference direction) indicated by the intra prediction mode, and outputs the generated image to the prediction image correction unit 3105.

LM Prediction

[0224] The LM predictor 31044 predicts a pixel value of chrominance, based on the pixel value of luminance.
[0225] The CCLM prediction process will be described with reference to FIG. 23. FIG. 23 is a diagram illustrating a situation in which the decoding processing for the luminance components has ended and the prediction processing of the chrominance components is performed in the target block. FIG. 23(a) is a decoded image uL[][] of the luminance components of the target block, and (c) and (d) are temporary prediction images of the Cb and Cr components qCb[][] and qCr[][]. In FIGS. 23(a), (c), and (d), the regions rL[][], rCb[][], and rCr[][] on the outside of each of the target blocks are an unfiltered reference image adjacent to each of the target blocks. FIG. 23(b) is a diagram in which the target block and the unfiltered reference image of the luminance components illustrated in FIG. 23(a) are downsampled, and duL[][] and drL[][] are the decoded image and the unfiltered reference image of the luminance components after downsampling. The temporary prediction images of the Cb and Cr components are generated from these downsampled luminance images duL[][] and drL[][].
[0226] FIG. 24 is a block diagram illustrating an example of a configuration of the LM predictor 31044 included in the intra prediction image generation unit 310. As illustrated in FIG. 24(a), the LM predictor 31044 includes a CCLM predictor 4101 and an MMLM predictor 4102.
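The Planar prediction of [0220] and [0221] (Equation INTRAP-2) maps directly to a short sketch. The array layout (s_top[x] = s[x][-1] with s_top[W] = s[W][-1], and s_left[y] = s[-1][y] with s_left[H] = s[-1][H]) is an assumption made only for this example.

    static int ilog2(int v) { int k = 0; while ((1 << (k + 1)) <= v) k++; return k; }

    /* q[y * W + x] receives the temporary prediction pixel q[x][y] of Equation INTRAP-2. */
    void planar_predict(const int *s_top, const int *s_left, int W, int H, int *q)
    {
        int maxWH = (W > H) ? W : H;
        int k = ilog2(maxWH);                       /* k = log2(max(W, H)) */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                q[y * W + x] = ((W - 1 - x) * s_left[y] + (x + 1) * s_top[W]
                              + (H - 1 - y) * s_top[x] + (y + 1) * s_left[H]
                              + maxWH) >> (k + 1);
    }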
[0227] The CCLM predictor 4101 downsamples the luminance image in a case that the chrominance format is 4:2:0, and calculates the decoded image duL[][] and the unfiltered reference image drL[][] of the downsampled luminance components in FIG. 23(b).
[0228] Next, the CCLM predictor 4101 derives a parameter (a CCLM parameter) (a, b) of a linear model from the unfiltered reference image drL[][] of the downsampled luminance components and the unfiltered reference images rCb[][] and rCr[][] of the Cb and Cr components. Specifically, the CCLM predictor 4101 calculates a linear model (aC, bC) that minimizes the square error SSD between the unfiltered reference image drL[][] of the luminance components and the unfiltered reference image rC[][] of the chrominance components.

SSD = ΣΣ(rC[x][y] - (aC * drL[x][y] + bC))   (Equation CCLM-3)

[0229] Here, ΣΣ is the sum over x and y. In the case of a Cb component, rC[][] is rCb[][] and (aC, bC) is (aCb, bCb), and in the case of a Cr component, rC[][] is rCr[][] and (aC, bC) is (aCr, bCr).
[0230] The CCLM predictor 4101 also calculates a linear model aResi that minimizes the square error SSD between the unfiltered reference image rCb[][] of the Cb components and the unfiltered reference image rCr[][] of the Cr components, in order to utilize the correlation of the prediction errors of the Cb components and the Cr components.

SSD = ΣΣ(rCr[x][y] - (aResi * rCb[x][y]))   (Equation CCLM-4)

[0231] Here, ΣΣ is the sum over x and y. These CCLM parameters are used to generate the temporary prediction images qCb[][] and qCr[][] of the chrominance components by the following equations.

qCb[x][y] = aCb * duL[x][y] + bCb
qCr[x][y] = aCr * duL[x][y] + aResi * ResiCb[x][y] + bCr   (Equation CCLM-5)

[0232] Here, ResiCb[][] is a prediction error of the Cb components.
[0233] The MMLM predictor 4102 is used in a case that the relationship between the unfiltered reference images of the luminance components and the chrominance components is categorized into two or more linear models. In a case that there are multiple regions in the target block, such as a foreground and a background, the linear model between the luminance components and the chrominance components differs in each region. In such a case, multiple linear models can be used to generate a temporary prediction image of the chrominance components from the decoded image of the luminance components. For example, in a case that there are two linear models, the pixel values of the unfiltered reference image of the luminance components are divided into two categories at a certain threshold value th_mmlm, and the linear models that minimize the square error SSD between the unfiltered reference image drL[][] of the luminance components and the unfiltered reference image rC[][] of the chrominance components are calculated for each of category 1, in which the pixel value is equal to or less than the threshold value th_mmlm, and category 2, in which the pixel value is greater than the threshold value th_mmlm.

SSD1 = ΣΣ(rC[x][y] - (a1C * drL[x][y] + b1C))   (if drL[x][y] <= th_mmlm)
SSD2 = ΣΣ(rC[x][y] - (a2C * drL[x][y] + b2C))   (if drL[x][y] > th_mmlm)   (Equation CCLM-6)

[0234] Here, ΣΣ is the sum over x and y, and rC[][] is rCb[][] and (a1C, b1C) is (a1Cb, b1Cb) for a Cb component, and rC[][] is rCr[][] and (a1C, b1C) is (a1Cr, b1Cr) for a Cr component.
[0235] MMLM has fewer samples of the unfiltered reference images available for the derivation of each linear model than CCLM, so that it may not operate properly in a case that the target block size is small or in a case that the number of samples is small. Thus, as illustrated in FIG. 24(b), a switching unit 4103 is provided in the LM predictor 31044, and in a case that any of the conditions described below is satisfied, MMLM is turned off and a CCLM prediction is performed.
[0236] The target block size is equal to or less than TH_MMLMB (for example, TH_MMLMB is 8x8).
[0237] The number of samples of the unfiltered reference image rCb[][] of the target block is less than TH_MMLMR (for example, TH_MMLMR is 4).
[0238] The unfiltered reference image of the target block is not on both the upper side and the left side of the target block (not in the rectangular slice).
[0239] These conditions can be determined from the size and position information of the target block, and thus a notification of a flag indicating whether CCLM is used or not may be omitted.
[0240] In a case that a portion of the unfiltered reference image is outside of the rectangular slice, the LM prediction may be turned off. In a block that uses an intra prediction, the flag indicating whether a CCLM prediction is used or not is signalled at the beginning of the intra prediction information of the chrominance component, and thus the code amount can be reduced by not signalling the flag. That is, on and off control of CCLM is performed at a rectangular slice boundary.
[0241] Typically, in a case that the chrominance component of the target block has a higher correlation with the luminance component in the target block at the same position than with the same chrominance component of adjacent blocks, an LM prediction is applied in an intra prediction to generate a more accurate prediction image and to reduce the prediction residual, so that the coding efficiency is increased. As described above, by reducing the information required for an LM prediction and making an LM prediction easier to select, a reduction in the coding efficiency can be suppressed while independently performing an intra prediction of a rectangular slice, even in a case that a reference image adjacent to the target block is outside of the rectangular slice.
[0242] Note that an LM prediction generates a temporary prediction image by using an unfiltered reference image, so that a correction process at the prediction image correction unit 3105 is not performed on the temporary prediction image of an LM prediction.
[0243] Note that the configuration described above is one example of the predictor 3104, and the configuration of the predictor 3104 is not limited to the above configuration.
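The CCLM parameter derivation of [0228] and [0229] is a least-squares fit of a linear model between the downsampled luminance reference samples and the chrominance reference samples. The sketch below uses floating-point arithmetic for clarity; an actual codec would use an integer derivation, and the closed-form solution shown here is only one way to minimize the SSD of Equation CCLM-3.

    /* drL[i] and rC[i] are the N co-located luminance/chrominance reference samples. */
    void cclm_fit(const int *drL, const int *rC, int N, double *aC, double *bC)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < N; i++) {
            sx  += drL[i];
            sy  += rC[i];
            sxx += (double)drL[i] * drL[i];
            sxy += (double)drL[i] * rC[i];
        }
        double den = N * sxx - sx * sx;
        *aC = (den != 0.0) ? (N * sxy - sx * sy) / den : 0.0;  /* slope minimizing the SSD  */
        *bC = (sy - *aC * sx) / N;                             /* offset of the linear model */
    }

The temporary prediction image of the Cb component is then obtained as qCb[x][y] = aCb * duL[x][y] + bCb, as in Equation CCLM-5.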
Configuration of Prediction Image Correction Unit 3105

[0244] The prediction image correction unit 3105 corrects a temporary prediction image that is the output of the predictor 3104 in accordance with the intra prediction mode. Specifically, the prediction image correction unit 3105 weights (performs a weighted average of) an unfiltered reference image and a temporary prediction image in accordance with the distance between the reference region R and the target prediction pixel, for each pixel of the temporary prediction image, and outputs a prediction image (a corrected prediction image) Pred in which the temporary prediction image is modified. Note that in some intra prediction modes, the prediction image correction unit 3105 does not correct the temporary prediction image, and the output of the predictor 3104 may be used as the prediction image as is. The prediction image correction unit 3105 may have a configuration to switch between the output of the predictor 3104 (the temporary prediction image, or the pre-correction prediction image) and the output of the prediction image correction unit 3105 (the prediction image, or the corrected prediction image) in accordance with a flag that is explicitly decoded from the coded data or a flag that is derived from the coding parameters.
[0245] The processing for deriving the prediction pixel value Pred[x][y] at the position (x, y) within the prediction target block by using the boundary filter at the prediction image correction unit 3105 will be described with reference to FIG. 25. FIG. 25(a) is a derivation equation of the prediction image Pred[x][y]. The prediction image Pred[x][y] is derived by weighting (weighted averaging) a temporary prediction image q[x][y] and an unfiltered reference image (for example, r[x][-1], r[-1][y], r[-1][-1]). The boundary filter is a weighted addition of an unfiltered reference image of the reference region R and a temporary prediction image. Here, rshift is a prescribed positive integer value corresponding to the adjustment term for expressing the distance weight k[] as an integer, and is referred to as a normalization adjustment term. For example, rshift = 4 to 10 is used; for example, rshift is 6.
[0246] The weighting coefficients of an unfiltered reference image are derived by right shifting reference intensity coefficients C = (c1v, c1h, c2v, c2h), predetermined for each prediction direction, by a distance weight k (k[x] or k[y]) that depends on the distance (x or y) to the reference region R. More specifically, as the weighting coefficient (a first weighting coefficient w1v) of the unfiltered reference image r[x][-1] on the upper side of the prediction target block, the reference intensity coefficient c1v shifted to the right by the distance weight k[y] (the vertical direction distance weight) is used. As the weighting coefficient (a second weighting coefficient w1h) of the unfiltered reference image r[-1][y] on the left side of the prediction target block, the reference intensity coefficient c1h shifted to the right by the distance weight k[x] (the horizontal direction distance weight) is used. As the weighting coefficient (a third weighting coefficient w2) of the unfiltered reference image r[-1][-1] at the upper left of the prediction target block, a sum of the reference intensity coefficient c2v shifted to the right by the distance weight k[y] and the reference intensity coefficient c2h shifted to the right by the distance weight k[x] is used.
[0247] FIG. 25(b) is a derivation equation of a weighting coefficient b[x][y] for a temporary prediction pixel value q[x][y]. The weighting coefficient b[x][y] is derived so that the sum of the products of the weighting coefficients and the reference intensity coefficients matches (1 << rshift). This value is configured for the purpose of normalizing the products of the weighting coefficients and the reference intensity coefficients in consideration of the right shift operation by rshift in FIG. 25(a).
[0248] FIG. 25(c) is a derivation equation of a distance weight k[x]. The distance weight k[x] is set to a value floor(x/dx) that monotonically increases in accordance with the horizontal distance x between the target prediction pixel and the reference region R. Here, dx is a prescribed parameter according to the size of the prediction target block.
[0249] FIG. 25(d) illustrates an example of dx. In FIG. 25(d), dx = 1 is configured in a case that the width W of the prediction target block is equal to or less than 16, and dx = 2 is configured in a case that W is greater than 16.
[0250] The distance weight k[y] can utilize a definition in which the horizontal distance x is replaced by the vertical distance y in the aforementioned distance weight k[x]. The weighting coefficients derived by using the distance weights k[x] and k[y] become smaller as the values of x or y become larger.
[0251] According to the derivation method of a target prediction image by using the equations described above in FIG. 25, the larger the reference distance (x, y), which is the distance between the target prediction pixel and the reference region R, the greater the value of the distance weight (k[x], k[y]). Thus, the value of the weighting coefficient for an unfiltered reference image, resulting from the right shift of a prescribed reference intensity coefficient by the distance weight, becomes a small value. Therefore, the closer the position within the prediction target block is to the reference region R, the greater the weight of the unfiltered reference image used to derive the prediction image in which the temporary prediction image is corrected. In general, the closer to the reference region R, the more likely the unfiltered reference image is suitable as an estimate of the target prediction block as compared to a temporary prediction image. Therefore, the prediction image derived by the equations in FIG. 25 has a higher prediction accuracy compared to a case that a temporary prediction image is used as the prediction image. In addition, according to the equations in FIG. 25, the weighting coefficient using an unfiltered reference image can be derived by multiplying the reference intensity coefficient by the distance weight. Therefore, by calculating the distance weight in advance for each reference distance and storing it in a table, the weighting coefficient can be derived without using a right shift operation or a division.
[0252] Example of Filter Mode and Reference Intensity Coefficient C
The reference intensity coefficient C (c1v, c2v, c1h, c2h) of the prediction image correction unit 3105 (a boundary filter) is dependent on the intra prediction mode IntraPredMode, and is derived by reference to the table ktable corresponding to the intra prediction mode.
[0253] Note that the unfiltered reference image r[-1][-1] is necessary for the correction processing of a prediction image, but in a case that the prediction target block shares the boundary with the rectangular slice boundary, r[-1][-1] cannot be referred to, so the following configurations of the rectangular slice boundary boundary filter are used.

Rectangular Slice Boundary Boundary Filter 1

[0254] As illustrated in FIG. 26, the intra prediction image generation unit 310 uses a pixel at a position that can be referred to instead of the upper left boundary pixel r[-1][-1] to apply a boundary filter, in a case that the prediction target block shares the boundary with the rectangular slice boundary.
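Because the actual derivation equations of the boundary filter are only shown in FIG. 25 and are not reproduced in the text, the following sketch merely follows the description of [0245] to [0247]: the reference intensity coefficients are right-shifted by the distance weights k[x] = floor(x/dx) and k[y] = floor(y/dy), and b[x][y] is chosen so that the weights sum to (1 << rshift). The sign convention of the upper-left term follows a common PDPC-style formulation and is an assumption, as is the omission of the final clipping to the pixel bit depth.

    /* q: temporary prediction pixel q[x][y]; r_top = r[x][-1]; r_left = r[-1][y];
     * r_tl = r[-1][-1]; (c1v, c1h, c2v, c2h) are the reference intensity coefficients. */
    int correct_pred_pixel(int q, int r_top, int r_left, int r_tl,
                           int x, int y, int dx, int dy,
                           int c1v, int c1h, int c2v, int c2h, int rshift)
    {
        int kx  = x / dx, ky = y / dy;            /* distance weights k[x], k[y]         */
        int w1v = c1v >> ky;                      /* weight for the upper pixel r[x][-1] */
        int w1h = c1h >> kx;                      /* weight for the left pixel r[-1][y]  */
        int w2  = (c2v >> ky) + (c2h >> kx);      /* weight for the upper-left pixel     */
        int b   = (1 << rshift) - w1v - w1h + w2; /* weight for q; total equals 1<<rshift */
        return (w1v * r_top + w1h * r_left - w2 * r_tl + b * q
                + (1 << (rshift - 1))) >> rshift;
    }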
[0255] FIG. 26(a) is a diagram illustrating a process for deriving the prediction pixel value Pred[x][y] at a position (x, y) within the prediction target block by using the boundary filter in a case that the prediction target block shares the boundary with the boundary on the left side of the rectangular slice. Blocks adjacent to the left side of the prediction target block are outside of the rectangular slice and cannot be referred to, but the pixels of the block that is adjacent to the upper side of the prediction target block can be referred to. Thus, the upper left neighboring upper boundary pixel r[0][-1] is referred to instead of the upper left boundary pixel r[-1][-1], and the boundary filter illustrated in FIG. 27(a) is applied instead of FIG. 25(a) or (b) to derive a prediction pixel value Pred[x][y]. That is, the intra prediction image generation unit 310 calculates and derives the prediction image Pred[x][y] with reference to the temporary prediction pixel q[x][y], the upper boundary pixel r[x][-1], and the upper left neighboring upper boundary pixel r[0][-1], by weighting (weighted average).
[0256] Alternatively, the upper right neighboring upper boundary pixel r[W-1][-1] is referred to instead of the upper left boundary pixel r[-1][-1], and the boundary filter illustrated in FIG. 27(b) is applied instead of FIG. 25(a) or (b) to derive a prediction pixel value Pred[x][y]. Here, W is the width of the prediction target block. That is, the intra prediction image generation unit 310 calculates and derives the prediction image Pred[x][y] with reference to the temporary prediction pixel q[x][y], the upper boundary pixel r[x][-1], and the upper right neighboring upper boundary pixel r[W-1][-1], by weighting (weighted average).
[0257] FIG. 26(b) is a diagram illustrating a process for deriving the prediction pixel value Pred[x][y] at a position (x, y) within the prediction target block by using the boundary filter in a case that the prediction target block shares the boundary with the boundary on the upper side of the rectangular slice. Blocks adjacent to the upper side of the prediction target block are outside of the rectangular slice and cannot be referred to, but the pixels of the block that is adjacent to the left side of the prediction target block can be referred to. Thus, the upper left neighboring left boundary pixel r[-1][0] is referred to instead of the upper left boundary pixel r[-1][-1], and the boundary filter illustrated in FIG. 27(c) is applied instead of FIG. 25(a) or (b) to derive a prediction pixel value Pred[x][y]. That is, the intra prediction image generation unit 310 calculates and derives the prediction image Pred[x][y] with reference to the temporary prediction pixel q[x][y], the left boundary pixel r[-1][y], and the upper left neighboring left boundary pixel r[-1][0], by weighting (weighted average).
[0258] Alternatively, the lower left neighboring left boundary pixel r[-1][H-1] is referred to instead of the upper left boundary pixel r[-1][-1], and the boundary filter illustrated in FIG. 27(d) is applied instead of FIG. 25(a) or (b) to derive a prediction pixel value Pred[x][y]. Here, H is the height of the prediction target block. That is, the intra prediction image generation unit 310 calculates and derives the prediction image Pred[x][y] with reference to the temporary prediction pixel q[x][y], the left boundary pixel r[-1][y], and the lower left neighboring left boundary pixel r[-1][H-1], by weighting (weighted average).
[0259] In this manner, by replacing the upper left boundary pixel r[-1][-1] with a pixel that can be referred to, it is possible to apply a boundary filter while independently performing an intra prediction on a rectangular slice, even in a case that one of the left side or the upper side of the prediction target block shares the boundary with the rectangular slice boundary, so the coding efficiency is increased.

Rectangular Slice Boundary Boundary Filter 2

[0260] A configuration will be described in which, in the unfiltered reference image configuration unit 3102 of the intra prediction image generation unit 310, a boundary filter is applied at a rectangular slice boundary by generating an unfiltered reference image from a reference image that can be referred to, in a case that an unfiltered reference image that cannot be referred to is present. In this configuration, a boundary pixel (an unfiltered reference image) r[x][y] is derived in accordance with the process including the following steps.
[0261] Step 1: In a case that r[-1][H*2-1] cannot be referred to, scan the pixels in sequence from (x, y) = (-1, H*2-1) to (x, y) = (-1, -1). In a case that there is a pixel r[-1][y] that can be referred to during the scanning, the scanning is ended and r[-1][H*2-1] is set to that r[-1][y]. Subsequently, in a case that r[W*2-1][-1] cannot be referred to, scan the pixels in sequence from (x, y) = (W*2-1, -1) to (x, y) = (0, -1). In a case that there is a pixel r[x][-1] that can be referred to during the scanning, the scanning is ended and r[W*2-1][-1] is set to that r[x][-1].
[0262] Step 2: Scan the pixels in sequence from (x, y) = (-1, H*2-2) to (x, y) = (-1, -1), and in a case that r[-1][y] cannot be referred to, r[-1][y] is set to r[-1][y+1].
[0263] Step 3: Scan the pixels in sequence from (x, y) = (W*2-2, -1) to (x, y) = (0, -1), and in a case that r[x][-1] cannot be referred to, r[x][-1] is set to r[x+1][-1].
[0264] Note that a case that the boundary pixel r[x][y] cannot be referred to is a case that the reference pixel is not present in the same rectangular slice as the target pixel or is outside of the picture boundary. The above process is also referred to as a boundary pixel replacement process (unfiltered image replacement process).
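A sketch of the boundary pixel replacement process of [0261] to [0264] is given below. It assumes a simple array layout in which r_left[1 + y] stores r[-1][y] for y = -1 ... H*2-1 and r_top[1 + x] stores r[x][-1] for x = -1 ... W*2-1, with avail_left[] and avail_top[] holding the corresponding availability flags; the case in which no reference pixel at all is available is not handled here.

    void replace_boundary_pixels(int *r_left, const int *avail_left,
                                 int *r_top,  const int *avail_top, int W, int H)
    {
        /* Step 1: fill r[-1][H*2-1] and r[W*2-1][-1] from the first available pixel found. */
        if (!avail_left[1 + (H * 2 - 1)])
            for (int y = H * 2 - 1; y >= -1; y--)
                if (avail_left[1 + y]) { r_left[1 + (H * 2 - 1)] = r_left[1 + y]; break; }
        if (!avail_top[1 + (W * 2 - 1)])
            for (int x = W * 2 - 1; x >= 0; x--)
                if (avail_top[1 + x]) { r_top[1 + (W * 2 - 1)] = r_top[1 + x]; break; }
        /* Step 2: scan the left column from (-1, H*2-2) up to (-1, -1). */
        for (int y = H * 2 - 2; y >= -1; y--)
            if (!avail_left[1 + y]) r_left[1 + y] = r_left[1 + y + 1];
        /* Step 3: scan the upper row from (W*2-2, -1) down to (0, -1). */
        for (int x = W * 2 - 2; x >= 0; x--)
            if (!avail_top[1 + x]) r_top[1 + x] = r_top[1 + x + 1];
    }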
[0265] The inverse quantization and inverse transform processing unit 311 performs inverse quantization on quantization transform coefficients input from the entropy decoder 301 to calculate transform coefficients. The inverse quantization and inverse transform processing unit 311 performs an inverse frequency transform such as an inverse DCT, an inverse DST, or an inverse KLT on the calculated transform coefficients to calculate a prediction residual signal. The inverse quantization and inverse transform processing unit 311 outputs the calculated residual signal to the addition unit 312.
[0266] The addition unit 312 adds a prediction image of a PU input from the inter prediction image generation unit 309 or the intra prediction image generation unit 310 and a residual signal input from the inverse quantization and inverse transform processing unit 311 for each pixel, and generates a decoded image of a PU. The addition unit 312 outputs the generated decoded image of the block to at least one of a deblocking filter, a sample adaptive offset (SAO) unit, or an ALF.

Configuration of Inter Prediction Parameter Decoder

[0267] Next, a configuration of the inter prediction parameter decoder 303 will be described.
[0268] FIG. 28 is a schematic diagram illustrating a configuration of the inter prediction parameter decoder 303 according to the present embodiment. The inter prediction parameter decoder 303 includes an inter prediction parameter decoding control unit 3031, an AMVP prediction parameter derivation unit 3032, an addition unit 3035, a merge prediction parameter derivation unit 3036, a subblock prediction parameter derivation unit 3037, and a BTM predictor 3038.
[0269] The inter prediction parameter decoding control unit 3031 instructs the entropy decoder 301 to decode codes (syntax elements) associated with an inter prediction, and extracts the codes (syntax elements) included in the coded data.
[0270] The inter prediction parameter decoding control unit 3031 first extracts the merge flag merge_flag. In a case of expressing that the inter prediction parameter decoding control unit 3031 extracts a certain syntax element, it means that the inter prediction parameter decoding control unit 3031 instructs the entropy decoder 301 to decode the certain syntax element and reads the corresponding syntax element from the coded data.
[0271] In a case that the merge flag merge_flag indicates 0, that is, an AMVP prediction mode, the inter prediction parameter decoding control unit 3031 extracts AMVP prediction parameters from the coded data by using the entropy decoder 301. The AMVP prediction parameters include an inter prediction indicator inter_pred_idc, a reference picture index refIdxLX, a prediction vector index mvp_lX_idx, and a difference vector mvdLX, for example. The AMVP prediction parameter derivation unit 3032 derives the prediction vector mvpLX from the prediction vector index mvp_lX_idx. Details will be described below. The inter prediction parameter decoding control unit 3031 outputs the difference vector mvdLX to the addition unit 3035. In the addition unit 3035, the prediction vector mvpLX and the difference vector mvdLX are added together, and a motion vector is derived.
[0272] In a case that the merge flag merge_flag indicates 1, i.e., a merge prediction mode, the inter prediction parameter decoding control unit 3031 extracts the merge index merge_idx as a prediction parameter related to the merge prediction. The inter prediction parameter decoding control unit 3031 outputs the extracted merge index merge_idx to the merge prediction parameter derivation unit 3036 (details will be described later), and outputs a subblock prediction mode flag subPbMotionFlag to the subblock prediction parameter derivation unit 3037. The subblock prediction parameter derivation unit 3037 partitions a PU into multiple subblocks in accordance with the value of the subblock prediction mode flag subPbMotionFlag, and derives the motion vector in subblock units. In other words, in the subblock prediction mode, the prediction block is predicted in units of small blocks of 4x4 or 8x8. In the slice coder 2012 described below, a method of partitioning a CU into multiple partitions (PUs such as 2NxN, Nx2N, NxN, and the like) and coding the syntax of the prediction parameter in partition units is used, while in the subblock prediction mode, multiple subblocks are gathered into a group (set), and the syntax of the prediction parameter is coded for each set, so that the motion information of many subblocks can be coded with a smaller code amount.
[0273] Specifically, the subblock prediction parameter derivation unit 3037 includes at least one of a spatial-temporal subblock predictor 30371, an affine predictor 30372, a matching motion derivation unit 30373, and an OBMC predictor 30374 that perform a subblock prediction in a subblock prediction mode.
[0274] Subblock Prediction Mode Flag
Here, a method of deriving a subblock prediction mode flag subPbMotionFlag, which indicates whether or not a prediction mode for a certain PU is a subblock prediction mode, in the slice decoder 2002 or the slice coder 2012 (details will be described later) will be described. The slice decoder 2002 or the slice coder 2012 derives the subblock prediction mode flag subPbMotionFlag, based on which one of a spatial subblock prediction SSUB, a temporal subblock prediction TSUB, an affine prediction AFFINE, and a matching motion derivation MAT described later is used. For example, in a case that the prediction mode selected for a certain PU is N (for example, N is a label indicating the selected merge candidate), the subblock prediction mode flag subPbMotionFlag may be derived by the following equation.

subPbMotionFlag = (N == TSUB) || (N == SSUB) || (N == AFFINE) || (N == MAT)

Here, || indicates a logical sum (the same applies below).
[0275] The slice decoder 2002 and the slice coder 2012 may be configured to perform only some of the spatial subblock prediction SSUB, the temporal subblock prediction TSUB, the affine prediction AFFINE, the matching motion derivation MAT, and the OBMC prediction OBMC. In other words, in a case that the slice decoder 2002 and the slice coder 2012 are configured to perform the spatial subblock prediction SSUB and the affine prediction AFFINE, the subblock prediction mode flag subPbMotionFlag may be derived as described below.

subPbMotionFlag = (N == SSUB) || (N == AFFINE)

FIG. 29 is a schematic diagram illustrating a configuration of the merge prediction parameter derivation unit 3036 according to the present embodiment. The merge prediction parameter derivation unit 3036 includes a merge candidate derivation unit 30361, a merge candidate selection unit 30362, and a merge candidate storage unit 30363. The merge candidate storage unit 30363 stores the merge candidate input from the merge candidate derivation unit 30361. Note that the merge candidate includes a prediction list utilization flag predFlagLX, a motion vector mvLX, and a reference picture index refIdxLX. In the merge candidate storage unit 30363, a stored merge candidate is assigned an index according to a prescribed rule.
[0276] The merge candidate derivation unit 30361 derives a merge candidate by using the motion vector and the reference picture index refIdxLX of an adjacent PU which has already been decoded. In addition to the above-described example, the merge candidate derivation unit 30361 may derive a merge candidate by using an affine prediction. This method will be described in detail below. The merge candidate derivation unit 30361 may use an affine prediction in a spatial merge candidate derivation process, a temporal merge candidate derivation process, a joint merge candidate derivation process, and a zero merge candidate derivation process described later. Note that the affine prediction is performed in subblock units, and the prediction parameter is stored in the prediction parameter memory 307 for each subblock. Alternatively, the affine prediction may be performed in pixel units.
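The flag derivation of [0274] and [0275] is a simple logical OR over the selected candidate label, as the sketch below shows; the enum values are hypothetical stand-ins for the labels N, SSUB, TSUB, AFFINE, and MAT.

    enum MergeLabel { LABEL_NORMAL, LABEL_SSUB, LABEL_TSUB, LABEL_AFFINE, LABEL_MAT };

    /* subPbMotionFlag = (N == TSUB) || (N == SSUB) || (N == AFFINE) || (N == MAT) */
    int derive_subPbMotionFlag(enum MergeLabel N)
    {
        return (N == LABEL_TSUB) || (N == LABEL_SSUB) ||
               (N == LABEL_AFFINE) || (N == LABEL_MAT);
    }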
Spatial Merge Candidate Derivation Process

[0277] As a spatial merge candidate derivation process, the merge candidate derivation unit 30361 reads a prediction parameter (a prediction list utilization flag predFlagLX, a motion vector mvLX, a reference picture index refIdxLX, and the like) stored in the prediction parameter memory 307 in accordance with a prescribed rule, derives the read prediction parameter as a merge candidate, and stores the prediction parameter in a merge candidate list mergeCandList[] (a prediction vector candidate list mvpListLX[]). The prediction parameter to be read is a prediction parameter related to each PU (for example, some or all of the PUs adjoining each of the lower left end, the upper left end, and the upper right end of the decoding target PU, as illustrated in FIG. 20(b)) that is within a predetermined range from the decoding target PU.

Temporal Merge Candidate Derivation Process

[0278] As a temporal merge candidate derivation process, the merge candidate derivation unit 30361 reads a prediction parameter of the lower right block (block BR) of the collocated block illustrated in FIG. 21(c) in the reference picture, or of the block (block C) including the coordinate of the center of the decoding target PU, from the prediction parameter memory 307 as a merge candidate to store in the merge candidate list mergeCandList[]. The motion vector of the block BR is more distant from the block positions that would be spatial merge candidates than the motion vector of the block C, so that the block BR is more likely to have a motion vector that is different from the motion vectors of the spatial merge candidates. Therefore, in general, the block BR is added to the merge candidate list mergeCandList[] with priority, and the motion vector of the block C is added to the prediction vector candidates in a case that the block BR does not have a motion vector (for example, an intra prediction block) or in a case that the block BR is located outside of the picture. By adding a different motion vector as a prediction candidate, the selection options of a prediction vector increase and the coding efficiency increases. The reference picture may be specified, for example, by using a reference picture index refIdxLX specified in the slice header, or by using the minimum of the reference picture indices refIdxLX of the PUs adjacent to the decoding target PU.
[0279] For example, the merge candidate derivation unit 30361 may derive the position (xColCtr, yColCtr) of the block C and the position (xColBr, yColBr) of the block BR by the following equations.

xColCtr = xPb + (W >> 1)
yColCtr = yPb + (H >> 1)
xColBr = xPb + W
yColBr = yPb + H   (Equation BR0)

[0280] Here, (xPb, yPb) is the upper left coordinate of the target block, and (W, H) is the width and the height of the target block.

Rectangular Slice Boundary BR, BRmod

[0281] Incidentally, the block BR, which is one of the blocks referred to as a temporal merge candidate illustrated in FIG. 20(c), is located outside of the rectangular slice as in FIG. 20(e) in a case that the target block is located at the right end of the rectangular slice as in FIG. 20(d). Then, the merge candidate derivation unit 30361 may configure the position of the block BR to the lower right in the collocated block, as illustrated in FIG. 20(f). This position is also referred to as BRmod. For example, the position (xColBr, yColBr) of BRmod, which is a position within the block boundary, may be derived by the following equation.

xColBr = xPb + W - 1
yColBr = yPb + H - 1   (Equation BR1)

[0282] Furthermore, to make the position of BRmod a multiple of 2 to the power of M, a left shift may be added after the following right shift. For example, M may be 2, 3, 4, or the like. In a case that the positions referred to for the motion vector are limited in this way, the memory required for the storage of the motion vector can be reduced.

xColBr = ((xPb + W - 1) >> M) << M
yColBr = ((yPb + H - 1) >> M) << M   (Equation BR2)

[0283] In a case that the target block is not located at the lower end of the rectangular slice, the merge candidate derivation unit 30361 may derive the Y coordinate yColBr of the BRmod position of (Equation BR1) and (Equation BR2) by the following equation, which is the block boundary position.

yColBr = yPb + H   (Equation BR3)

[0284] In Equation BR3 as well, the position (the block boundary position, a position within the round block) may be configured to a multiple of 2 to the power of M.

yColBr = ((yPb + H) >> M) << M   (Equation BR4)

[0285] The block BR (or BRmod) at the lower right position can be referred to as a temporal merge candidate because a block outside the rectangular slice is not referred to at the position within the block boundary or the position within the round block. Note that configuring the temporal merge candidate block BR to the position in FIG. 20(f) may be applied regardless of the positions of all target blocks, or may be limited to a case that the target block is located at the right end of the rectangular slice. For example, assuming that a function for deriving the SliceId at a certain position (x, y) is getSliceID(x, y), in a case of getSliceID(xColBr, yColBr) != "SliceId of the rectangular slice including the target block", the position of BR (BRmod) may be derived by any of the above equations. In the case of rectangular_slice_flag = 1, the position of BR (BRmod) may be configured to the lower right BRmod in the collocated block. For example, the merge candidate derivation unit 30361 may derive the block BR at the block boundary position (Equation BR0) in the case of rectangular_slice_flag = 0, and may derive the block BR at a position within the block boundary (Equation BR1) or (Equation BR2) in the case of rectangular_slice_flag = 1.
[0286] In the case of rectangular_slice_flag = 1, the merge candidate derivation unit 30361 may also derive the block BR at the round block boundary position (Equation BR3) or at the position within the block boundary (Equation BR4) in a case that the target block is not located at the lower end of the rectangular slice.
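The derivation of the temporal merge candidate position in [0279] to [0285] can be summarized by the sketch below, which switches between the block boundary position of (Equation BR0) and the in-block position BRmod of (Equation BR1)/(Equation BR2) according to rectangular_slice_flag; M = 0 disables the rounding to a multiple of 2 to the power of M.

    void derive_br_position(int xPb, int yPb, int W, int H,
                            int rectangular_slice_flag, int M,
                            int *xColBr, int *yColBr)
    {
        if (!rectangular_slice_flag) {
            *xColBr = xPb + W;                    /* Equation BR0 */
            *yColBr = yPb + H;
        } else {
            *xColBr = xPb + W - 1;                /* Equation BR1 (BRmod) */
            *yColBr = yPb + H - 1;
            if (M > 0) {                          /* Equation BR2: multiple of 2^M */
                *xColBr = (*xColBr >> M) << M;
                *yColBr = (*yColBr >> M) << M;
            }
        }
    }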
[0287] In this way, by configuring the lower right block position of the collocated block to the lower right position BRmod in the collocated rectangular slice illustrated in FIG. 20(f), in the case of rectangular_slice_flag = 1, the rectangular slice sequence can be decoded independently, without decreasing the coding efficiency, by using a merge prediction in a temporal direction.

Joint Merge Candidate Derivation Process

[0288] As a joint merge candidate derivation process, the merge candidate derivation unit 30361 derives a joint merge candidate by combining the motion vectors and the reference picture indices of two different merge candidates that have already been derived and stored in the merge candidate storage unit 30363, as the motion vectors for L0 and L1, respectively, and stores it in the merge candidate list mergeCandList[].
[0289] Note that, in a case that a motion vector derived in the spatial merge candidate derivation process, the temporal merge candidate derivation process, or the joint merge candidate derivation process described above indicates even a part of the outside of the collocated rectangular slice of the rectangular slice in which the target block is located, the motion vector may be clipped (the rectangular slice boundary motion vector limitation) to be modified to refer to only the inside of the collocated rectangular slice. This process requires the slice coder 2012 and the slice decoder 2002 to select the same process.

Zero Merge Candidate Derivation Process

[0290] As a zero merge candidate derivation process, the merge candidate derivation unit 30361 derives a merge candidate having the reference picture index refIdxLX being 0 and the X component and the Y component of the motion vector mvLX both being 0, and stores the merge candidate in the merge candidate list mergeCandList[].
[0291] The merge candidates described above derived by the merge candidate derivation unit 30361 are stored in the merge candidate storage unit 30363. The order of storing in the merge candidate list mergeCandList[] is {L, A, AR, BL, AL, BR/C, joint merge candidate, and zero merge candidate} in FIGS. 20(b) and (c). BR/C means that the block C is used in a case that the block BR is not available. Note that reference blocks that are not available (the block is outside of the rectangular slice, an intra prediction block, and the like) are not stored in the merge candidate list.
[0292] The merge candidate selection unit 30362 selects, as the inter prediction parameter of the target PU, the merge candidate assigned the index corresponding to the merge index merge_idx input from the inter prediction parameter decoding control unit 3031, among the merge candidates stored in the merge candidate list mergeCandList[] of the merge candidate storage unit 30363. The merge candidate selection unit 30362 stores the selected merge candidate in the prediction parameter memory 307 and also outputs the selected merge candidate to the prediction image generation unit 308.

Subblock Predictor

[0293] Next, the subblock predictor will be described.

Spatial-Temporal Subblock Predictor 30371

[0294] The spatial-temporal subblock predictor 30371 derives a motion vector of a subblock, which is obtained by partitioning a target PU, from a motion vector of a PU on a reference picture (for example, the immediately preceding picture) that is temporally adjacent to the target PU, or from a motion vector of a PU that is spatially adjacent to the target PU. Specifically, the spatial-temporal subblock predictor 30371 derives a motion vector spMvLX[xi][yi] for each subblock in the target PU by scaling the motion vector of a PU on the reference picture in accordance with the reference picture referred to by the target PU (a temporal subblock prediction).
[0295] The spatial-temporal subblock predictor 30371 may also derive a motion vector spMvLX[xi][yi] for each subblock in the target PU by calculating the weighted average of the motion vectors of the PUs adjacent to the target PU in accordance with the distance from the subblock obtained by partitioning the target PU (a spatial subblock prediction). Here, (xPb, yPb) is the upper left coordinate of the target PU, W and H are the size of the target PU, BW and BH are the size of the subblock, and (xi, yi) = (xPb + BW*i, yPb + BH*j), i = 0, 1, 2, ..., W/BW - 1, j = 0, 1, 2, ..., H/BH - 1.
[0296] The candidate TSUB for a temporal subblock prediction and the candidate SSUB for a spatial subblock prediction described above are selected as one mode (a merge candidate) of the merge modes.

Motion Vector Scaling

[0297] A method of deriving the scaling of a motion vector will be described. Assuming that the motion vector is Mv, the picture including the block with the motion vector Mv is Pic1, the reference picture of the motion vector Mv is Pic2, the motion vector after scaling is sMv, the picture including the block with the motion vector after scaling sMv is Pic3, and the reference picture referred to by the motion vector after scaling sMv is Pic4, the derivation function MvScale(Mv, Pic1, Pic2, Pic3, Pic4) of sMv is represented by the following equations.

sMv = MvScale(Mv, Pic1, Pic2, Pic3, Pic4)
    = Clip3(-R1, R1 - 1, sign(distScaleFactor * Mv) * ((abs(distScaleFactor * Mv) + round1 - 1) >> shift1))
distScaleFactor = Clip3(-R2, R2 - 1, (tb * tx + round2) >> shift2)
tx = (16384 + (abs(td) >> 1)) / td
td = DiffPicOrderCnt(Pic1, Pic2)
tb = DiffPicOrderCnt(Pic3, Pic4)   (Equation MVSCALE-1)

[0298] Here, round1, round2, shift1, and shift2 are rounding values and shift values for performing a division by using a reciprocal, such as round1 = 1 << (shift1 - 1), round2 = 1 << (shift2 - 1), shift1 = 8, shift2 = 6, and the like. DiffPicOrderCnt(Pic1, Pic2) is a function that returns the difference in temporal information (for example, POC) between Pic1 and Pic2. R1, R2, and R3 limit the range of values in order to perform the processing with limited accuracy, such as R1 = 32768, R2 = 4096, R3 = 128, and the like.
[0299] The scaling function MvScale(Mv, Pic1, Pic2, Pic3, Pic4) may also be the following equation.

MvScale(Mv, Pic1, Pic2, Pic3, Pic4) = Mv * DiffPicOrderCnt(Pic3, Pic4) / DiffPicOrderCnt(Pic1, Pic2)   (Equation MVSCALE-2)

[0300] That is, Mv may be scaled depending on the ratio between the difference in temporal information between Pic1 and Pic2 and the difference in temporal information between Pic3 and Pic4.
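Equation MVSCALE-1 translates almost directly into integer code. The sketch below hard-codes the example constants of [0298] (shift1 = 8, shift2 = 6, R1 = 32768, R2 = 4096) and takes the POC values of the four pictures as arguments instead of calling DiffPicOrderCnt(); td is assumed to be non-zero (Pic1 and Pic2 differ).

    #include <stdlib.h>  /* abs */

    static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

    int mv_scale(int Mv, int pocPic1, int pocPic2, int pocPic3, int pocPic4)
    {
        const int R1 = 32768, R2 = 4096, shift1 = 8, shift2 = 6;
        const int round1 = 1 << (shift1 - 1), round2 = 1 << (shift2 - 1);
        int td = pocPic1 - pocPic2;   /* DiffPicOrderCnt(Pic1, Pic2) */
        int tb = pocPic3 - pocPic4;   /* DiffPicOrderCnt(Pic3, Pic4) */
        int tx = (16384 + (abs(td) >> 1)) / td;
        int distScaleFactor = clip3(-R2, R2 - 1, (tb * tx + round2) >> shift2);
        int prod = distScaleFactor * Mv;
        return clip3(-R1, R1 - 1,
                     (prod >= 0 ? 1 : -1) * ((abs(prod) + round1 - 1) >> shift1));
    }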
[0301] As a specific spatial-temporal subblock prediction method, an Adaptive Temporal Motion Vector Prediction (ATMVP) and a Spatial-Temporal Motion Vector Prediction (STMVP) will be described.

ATMVP, Rectangular Slice Boundary ATMVP

[0302] The ATMVP is a method for deriving a motion vector for each subblock of a target block, based on motion vectors of spatial adjacent blocks (L, A, AR, BL, AL) of the target block of the target picture PCur illustrated in FIG. 20(b), and for generating a prediction image in units of subblocks, and is processed by the following procedure.

Step 1) Initial Vector Derivation

[0303] A first available adjacent block is determined in the order of the spatial adjacent blocks L, A, AR, BL, AL. In a case that an available adjacent block is found, the motion vector and the reference picture of that block are set as the initial vector IMV and the initial reference picture IRef of the ATMVP, and the process proceeds to step 2. In a case that all adjacent blocks are not available (non-available), the ATMVP is turned off and the processing is terminated. The meaning of "ATMVP being turned off" is that the motion vector by the ATMVP is not stored in the merge candidate list.

[0304] Here, the meaning of an "available adjacent block" is, for example, that the position of the adjacent block is included in the target rectangular slice and the adjacent block has a motion vector.

Step 2) Rectangular Slice Boundary Check of Initial Vector

[0305] It is checked whether or not the block referred to by using IMV by the target block is within a collocated rectangular slice on the initial reference picture IRef. In a case that the block is in a collocated rectangular slice, IMV and IRef are set as the motion vector BMV and the reference picture BRef of the block level of the target block, respectively, and the process is transferred to step 3. In a case that the block is not in a collocated rectangular slice, as illustrated in FIG. 30(a), it is checked whether or not the block referred to by using the sIMV derived by using the scaling function MvScale(IMV, PCur, IRef, PCur, RefPicListX[RefIdx]) from the IMV is in a collocated rectangular slice on the reference pictures RefPicListX[RefIdx] (RefIdx = 0 ... the number of reference pictures - 1) stored in the reference picture list RefPicListX sequentially. In a case that the block is in a collocated rectangular slice, this sIMV and RefPicListX[RefIdx] are set as the motion vector BMV and the reference picture BRef of the block level of the target block, respectively, and the process is transferred to step 3.

[0306] Note that in a case that no such block is found in all reference pictures stored in the reference picture list, the ATMVP is turned off and the process is terminated.

Step 3) Subblock Motion Vector

[0307] As illustrated in FIG. 30(b), on the reference picture BRef, for the target block, a block at a position shifted by the motion vector BMV is partitioned into subblocks, and a motion vector SpRefMvLX[k][l] (k = 0 ... NBW-1, l = 0 ... NBH-1) and a reference picture SpRef[k][l] of each subblock are obtained. Here, the NBW and the NBH are the number of subblocks in the horizontal and vertical directions, respectively. In a case that a motion vector of a certain subblock (k1, l1) does not exist, the motion vector BMV and the reference picture BRef of the block level are set as the motion vector SpRefMvLX[k1][l1] and the reference picture SpRef[k1][l1] of the subblock (k1, l1).

Step 4) Motion Vector Scaling

[0308] A motion vector SpMvLX[k][l] for each subblock on the target block is derived by the scaling function MvScale() from a motion vector SpRefMvLX[k][l] and a reference picture SpRef[k][l] of each subblock on the reference picture.

SpMvLX[k][l] = MvScale(SpRefMvLX[k][l], BRef, SpRef[k][l], PCur, RefPicListX[refIdx0])  (Equation ATMVP-1)

[0309] Here, RefPicListX[refIdx0] is a reference picture of the subblock level of the target block, such as the reference picture RefPicListX[refIdxATMVP], refIdxATMVP = 0.

[0310] Note that the reference picture of the subblock level of the target block may not be the reference picture RefPicListX[refIdx0], but a reference picture specified by the index (collocated_ref_idx) used for prediction motion vector derivation in a temporal direction signalled in the slice header illustrated in FIG. 8 (SYN03) and FIG. 11(a) (SYN13). In this case, the reference picture of the subblock level of the target block is RefPicListX[collocated_ref_idx], and the calculation equation for the motion vector SpMvLX[k][l] of the subblock level of the target block is described below.

SpMvLX[k][l] = MvScale(SpRefMvLX[k][l], BRef, SpRef[k][l], PCur, RefPicListX[collocated_ref_idx])  (Equation ATMVP-2)

Step 5) Rectangular Slice Boundary Check of Subblock Vector

[0311] In the reference picture of the subblock level of the target block, it is checked whether or not the subblock to which the target subblock refers by using SpMvLX[k][l] is within a collocated rectangular slice. In a case that the target pointed by a subblock motion vector SpMvLX[k2][l2] is not in a collocated rectangular slice in a certain subblock (k2, l2), any of the following processing 1 (processing 1A to processing 1D) is performed.

[0312] [Processing 1A] Rectangular Slice Boundary Padding

[0313] Rectangular slice boundary padding (rectangular slice outside padding) is achieved by clipping the reference positions at the positions of the upper, lower, left, and right bounding pixels of the rectangular slice, as previously described. For example, in a case that the upper left coordinate of the target subblock relative to the upper left coordinate of the picture is (xs, ys), the width and the height of the target subblock are BW and BH, the upper left coordinate of the target rectangular slice in which the target subblock is located is (xRSs, yRSs), the width and the height of the target rectangular slice are wRS and hRS, and the motion vector is spMvLX[k2][l2], the reference pixel (xRef, yRef) of the subblock level is derived with the following equation.

xRef + i = Clip3(xRSs, xRSs + wRS - 1, xs + (spMvLX[k2][l2][0] >> log2(M)) + i)
yRef + j = Clip3(yRSs, yRSs + hRS - 1, ys + (spMvLX[k2][l2][1] >> log2(M)) + j)  (Equation ATMVP-3)
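The reference-position clipping of (Equation ATMVP-3) can be illustrated with a minimal C sketch. The function and parameter names below are illustrative only and are not taken from the document; it assumes a Clip3-style helper, that the motion vector is stored in 1/M-pel units (so the shift by log2(M) converts it to integer-pel), and that it returns the clamped position of the sample (i, j) of the subblock, i.e. the values written as xRef + i and yRef + j above.

/* Clamp v into [lo, hi], as the Clip3 function used in the equations above. */
static inline int clip3(int lo, int hi, int v) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Rectangular slice boundary padding (Processing 1A), a sketch of
 * (Equation ATMVP-3): the reference sample position of a subblock is
 * clamped to the target rectangular slice so that no sample outside of
 * the collocated rectangular slice is read. */
static void ref_pos_in_rect_slice(int xs, int ys,        /* subblock top-left in the picture   */
                                  int xRSs, int yRSs,    /* rectangular slice top-left         */
                                  int wRS, int hRS,      /* rectangular slice width and height */
                                  const int spMvLX[2],   /* subblock motion vector, 1/M pel    */
                                  int log2M,
                                  int i, int j,          /* sample offset inside the subblock  */
                                  int *xRefI, int *yRefJ) {
    *xRefI = clip3(xRSs, xRSs + wRS - 1, xs + (spMvLX[0] >> log2M) + i);
    *yRefJ = clip3(yRSs, yRSs + hRS - 1, ys + (spMvLX[1] >> log2M) + j);
}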
[0314] [Processing 1B] Rectangular Slice Boundary Motion Vector Limitation (Rectangular Slice Outside Motion Vector Limitation)

[0315] The subblock motion vector SpMvLX[k2][l2] is clipped so that the motion vector SpMvLX[k2][l2] of the subblock level does not refer to outside of the rectangular slice. For the rectangular slice boundary motion vector limitations, there are methods such as, for example, (Equation CLIP1) to (Equation CLIP5) described above.

[0316] [Processing 1C] Rectangular Slice Boundary Motion Vector Replacement (Replacement by Alternative Motion Vector Outside of Rectangular Slice)

[0317] In a case that the target pointed by the subblock motion vector SpMvLX[k2][l2] is not inside of a collocated rectangular slice, an alternative motion vector SpMvLX[k3][l3] inside of a collocated rectangular slice is copied. For example, (k3, l3) may be an adjacent subblock of (k2, l2) or a center of the block.

SpMvLX[k2][l2][0] = SpMvLX[k3][l3][0]
SpMvLX[k2][l2][1] = SpMvLX[k3][l3][1]  (Equation ATMVP-4)

[0318] [Processing 1D] Rectangular Slice Boundary ATMVP Off (Rectangular Slice Outside ATMVP Off)

[0319] In a case that the number of subblocks in which the target pointed by the subblock motion vector SpMvLX[k2][l2] is not within a collocated rectangular slice exceeds a prescribed threshold value, the ATMVP is turned off and the process is terminated. For example, the prescribed threshold value may be 1/2 of the total number of subblocks within the target block.

[0320] Note that the processing 1 requires the slice coder 2012 and the slice decoder 2002 to select the same process.

[0321] Step 6) The ATMVP is stored in the merge candidate list. An example of the order of the merge candidates stored in the merge candidate list is illustrated in FIG. 31. From among this list, a merge candidate for the target block is selected by using the merge_idx derived by the inter prediction parameter decoding control unit 3031.

[0322] In a case that the ATMVP is selected as a merge candidate, an image on the reference picture RefPicListX[refIdxATMVP] shifted by SpMvLX[k][l] from each subblock of the target block is read as a prediction image as illustrated in FIG. 30(b).

[0323] The merge candidate list derivation process related to ATMVP described in steps 1) to 6) will be described with reference to the flowchart of FIG. 32.

[0324] The spatial-temporal subblock predictor 30371 searches five adjacent blocks of the target block (S2301).

[0325] The spatial-temporal subblock predictor 30371 determines the presence or absence of a first available adjacent block, and the process proceeds to S2303 in a case that there is an available adjacent block, and the process proceeds to S2311 in a case that there is no available adjacent block (S2302).

[0326] The spatial-temporal subblock predictor 30371 configures the motion vector and the reference picture of the available adjacent block as the initial vector IMV and the initial reference picture IRef of the target block (S2303).

[0327] The spatial-temporal subblock predictor 30371 searches a block based motion vector BMV and a reference picture BRef of the target block, based on the initial vector IMV and the initial reference picture IRef of the target block (S2304).

[0328] The spatial-temporal subblock predictor 30371 determines the presence or absence of a block based motion vector BMV by which the reference block points within a collocated rectangular slice, and in a case that there is a BMV, BRef is acquired and the process proceeds to S2306, and in a case that there is no BMV, the process proceeds to S2311 (S2305).

[0329] The spatial-temporal subblock predictor 30371 acquires a subblock based motion vector SpRefMvLX[k][l] and a reference picture SpRef[k][l] of a collocated block by using the block based motion vector BMV and the reference picture BRef of the target block (S2306).

[0330] The spatial-temporal subblock predictor 30371 derives the subblock based motion vector spMvLX[k][l] of the target block by scaling, in a case of the reference picture configured to RefPicListX[refIdxATMVP], by using the motion vector SpRefMvLX[k][l] and the reference picture SpRef (S2307).

[0331] The spatial-temporal subblock predictor 30371 determines whether or not each of the blocks pointed by the motion vector spMvLX[k][l] all refers inside of a collocated rectangular slice on the reference picture RefPicListX[refIdxATMVP]. In a case that all of the blocks refer only inside of the collocated rectangular slice, the process proceeds to S2310, or otherwise the process proceeds to S2309 (S2308).

[0332] In a case that at least some of the blocks shifted by the motion vector spMvLX[k][l] are outside of the collocated rectangular slice, the spatial-temporal subblock predictor 30371 copies the motion vector of the subblock level of the adjacent subblocks, having the motion vector of the subblock level in which the subblock after shift is inside of the collocated rectangular slice (S2309).

[0333] The spatial-temporal subblock predictor 30371 stores the motion vector of the ATMVP in the merge candidate list mergeCandList[] illustrated in FIG. 31 (S2310).

[0334] The spatial-temporal subblock predictor 30371 does not store the motion vector of the ATMVP in the merge candidate list mergeCandList[] (S2311).

[0335] Note that, in addition to copying the motion vectors of the adjacent blocks, the processing of S2309 may be a padding processing of the rectangular slice boundary of the reference picture or a clipping processing of the motion vector of the subblock level of the target block, as described in step 5). The ATMVP may also be turned off and the process may proceed to S2311 in a case that the number of subblocks that are not available is greater than a prescribed threshold value.

[0336] By the above process, the merge candidate list related to the ATMVP is derived.

[0337] By deriving the motion vector of the ATMVP and generating the prediction image in this manner, the pixel values can be replaced by using the reference pixels in the collocated rectangular slice even in a case that the motion vector points outside of the collocated rectangular slice for an inter prediction, so that the inter prediction can be performed independently on the rectangular slice. Thus, even in a case that some of the reference pixels are not included in the collocated rectangular slice, an ATMVP can be selected as one of the merge candidates. In a case that the performance is higher than that of a merge candidate other
than an ATMVP, the ATMVP can be used to generate the prediction image, so that the coding efficiency can be increased.

STMVP

[0338] The STMVP is a scheme to derive a motion vector for each subblock of the target block, and generate a prediction image in units of subblocks, based on the motion vectors of the spatial adjacent blocks (a, b, c, d, ...) of the target block of the target picture PCur illustrated in FIG. 33(a), and the collocated blocks (A', B', C', D', ...) of the target block illustrated in FIG. 33(b). A, B, C, and D in FIG. 33(a) are examples of subblocks into which the target block is partitioned. A', B', C', and D' in FIG. 33(b) are the collocated blocks of the subblocks A, B, C, and D in FIG. 33(a). Ac', Bc', Cc', and Dc' in FIG. 33(b) are regions located in the center of A', B', C', and D', and Abr', Bbr', Cbr', and Dbr' are regions located at the lower right of A', B', C', and D'. Note that Abr', Bbr', Cbr', and Dbr' may be not in the lower right positions outside of A', B', C', and D' illustrated in FIG. 33(b), but may be in the lower right positions inside of A', B', C', and D' illustrated in FIG. 33(g). In FIG. 33(g), Abr', Bbr', Cbr', and Dbr' take positions within the collocated rectangular slices. The STMVP is processed with the following procedure.

[0339] Step 1) The target block is partitioned into subblocks, and a first available block is determined from the upper adjacent block of the subblock A in the right direction. In a case that an available adjacent block is found, the motion vector and the reference picture of that first block are set as the upper vector mvA_above and the reference picture RefA_above of the STMVP, with the count cnt = 1. In a case that there is no available adjacent block, the count is set as cnt = 0.

[0340] Step 2) An available first block is determined from the left side adjacent block b of the subblock A in the downward direction. In a case that an available adjacent block is found, the motion vector and the reference picture of that first block are set as the left side vector mvA_left and the reference picture RefA_left, and the count cnt is incremented by one. In a case that there is no available adjacent block, the count cnt is not updated.

[0341] Step 3) It is checked whether or not a block is available in the collocated block A' of the subblock A in the order of the lower right position Abr' and the Ac'. In a case that an available region is found, the first motion vector and the reference picture in that block are set as the collocated vector mvA_col and the reference picture RefA_col, and the count is incremented by one. In a case that there is no available block, the count cnt is not updated.

[0342] Step 4) In a case of cnt = 0 (there is no available motion vector), the STMVP is turned off and the processing is terminated.

[0343] Step 5) In a case that cnt is not 0, the temporal information of the target picture PCur and the reference picture RefPicListX[collocated_ref_idx] of the target block is used to scale the available motion vectors found in steps 1) to 3). The scaled motion vectors are denoted as smvA_above, smvA_left, and smvA_col.

smvA_above = MvScale(mvA_above, PCur, RefA_above, PCur, RefPicListX[collocated_ref_idx])
smvA_left = MvScale(mvA_left, PCur, RefA_left, PCur, RefPicListX[collocated_ref_idx])
smvA_col = MvScale(mvA_col, PCur, RefA_col, PCur, RefPicListX[collocated_ref_idx])  (Equation STMVP-1)

[0344] An unavailable motion vector is set to 0.

[0345] Here, the scaling function MvScale(Mv, Pic1, Pic2, Pic3, Pic4) is a function for scaling the motion vector Mv as described above.

[0346] Step 6) The average of smvA_above, smvA_left, and smvA_col is calculated and set as the motion vector spMvLX[A] of the subblock A. The reference picture of the subblock A is RefPicListX[collocated_ref_idx].

spMvLX[A] = (smvA_above + smvA_left + smvA_col)/cnt  (Equation STMVP-2)

[0347] For integer computation, for example, it may be derived as follows. In a case of cnt == 2, the two vectors are described sequentially as smvA_cnt0 and smvA_cnt1, which may be derived in the following equation.

spMvLX[A] = (smvA_cnt0 + smvA_cnt1) >> 1

[0348] In a case of cnt == 3, it may be derived by the following equation.

spMvLX[A] = (5*smvA_above + 5*smvA_left + 6*smvA_col) >> 4

[0349] Step 7) In the reference picture RefPicListX[collocated_ref_idx], it is checked whether or not the block at the position in which the collocated block is shifted by spMvLX[A] is within the collocated rectangular slice. In a case that some or all of the blocks are not in the collocated rectangular slice, any of the following processing 2 (processing 2A to processing 2D) is performed.

[0350] [Processing 2A] Rectangular Slice Boundary Padding

[0351] Rectangular slice boundary padding (rectangular slice outside padding) is achieved by clipping the reference positions at the positions of the upper, lower, left, and right bounding pixels of the rectangular slice, as previously described. For example, in a case that the upper left coordinate of the subblock A relative to the upper left coordinate of the picture is (xs, ys), the width and the height of the subblock A are BW and BH, the upper left coordinate of the target rectangular slice in which the subblock A is located is (xRSs, yRSs), and the width and the height of the target rectangular slice are wRS and hRS, the reference pixel (xRef, yRef) of the subblock A is derived with the following equation.

xRef + i = Clip3(xRSs, xRSs + wRS - 1, xs + (spMvLX[A][0] >> log2(M)) + i)
yRef + j = Clip3(yRSs, yRSs + hRS - 1, ys + (spMvLX[A][1] >> log2(M)) + j)  (Equation STMVP-3)

[0352] Note that the processing 2 requires the slice coder 2012 and the slice decoder 2002 to select the same process.

[0353] [Processing 2B] Rectangular Slice Boundary Motion Vector Limitation

[0354] The subblock motion vector spMvLX[A] is clipped so that the motion vector spMvLX[A] of the subblock level does not refer to outside of the rectangular slice. For the rectangular slice boundary motion vector limitations, there are methods such as, for example, (Equation CLIP1) to (Equation CLIP5) described above.
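Since (Equation CLIP1) to (Equation CLIP5) are defined earlier in the document and are not reproduced here, the following C sketch shows only one plausible variant of the motion vector limitation, under the assumption of integer-pel motion vectors: the vector is clamped so that the BW x BH block it references from the subblock position stays inside the rectangular slice. The function and parameter names are illustrative, not taken from the document.

/* Clamp v into [lo, hi]. */
static inline int clip3(int lo, int hi, int v) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Rectangular slice boundary motion vector limitation (Processing 2B), a
 * sketch: spMvLX is clamped so that the referenced BW x BH block, read from
 * the subblock whose top-left is (xs, ys), lies inside the rectangular slice
 * with top-left (xRSs, yRSs) and size wRS x hRS. With 1/M-pel vectors the
 * bounds would additionally be scaled by M. */
static void clip_subblock_mv(int spMvLX[2],
                             int xs, int ys, int BW, int BH,
                             int xRSs, int yRSs, int wRS, int hRS) {
    spMvLX[0] = clip3(xRSs - xs, xRSs + wRS - BW - xs, spMvLX[0]);
    spMvLX[1] = clip3(yRSs - ys, yRSs + hRS - BH - ys, spMvLX[1]);
}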
[0355] [Processing 2C] Rectangular Slice Boundary Motion Vector Replacement (Replacement by Alternative Motion Vector)

[0356] In a case that the target pointed by the subblock motion vector SpMvLX[k2][l2] is not inside of a collocated rectangular slice, an alternative motion vector SpMvLX[k3][l3] inside of a collocated rectangular slice is copied. For example, (k3, l3) may be an adjacent subblock of (k2, l2) or a center of the block.

SpMvLX[k2][l2][0] = SpMvLX[k3][l3][0]
SpMvLX[k2][l2][1] = SpMvLX[k3][l3][1]  (Equation STMVP-4)

[0357] [Processing 2D] Rectangular Slice Boundary STMVP Off

[0358] In a case that the number of subblocks in which the target pointed by the subblock motion vector SpMvLX[k2][l2] is not within a collocated rectangular slice exceeds a prescribed threshold value, the STMVP is turned off and the process is terminated. For example, the prescribed threshold value may be 1/2 of the total number of subblocks within the target block.

[0359] Step 8) The processes in steps 1) to 7) described above are performed on each subblock of the target block, such as the subblocks B, C, and D, and the motion vectors of the subblocks are determined as in FIGS. 33(d), (e), and (f). However, in the subblock B, an upper side adjacent block is searched from d to the right direction. In the subblock C, the upper side adjacent block is A, and a left side adjacent block is searched from a in the downward direction. In the subblock D, the upper side adjacent block is B, and the left side adjacent block is C.

[0360] Step 9) The motion vectors of the STMVP are stored in the merge candidate list. The order of the merge candidates stored in the merge candidate list is illustrated in FIG. 31. From among this list, a merge candidate for the target block is selected by using the merge_idx derived by the inter prediction parameter decoding control unit 3031.

[0361] In a case that the STMVP is selected as a merge candidate, an image on the reference picture RefPicListX[collocated_ref_idx] shifted by the motion vector from each subblock of the target block is read as a prediction image.

[0362] The merge candidate list derivation process related to STMVP described in steps 1) to 9) will be described with reference to the flowchart of FIG. 34(a).

[0363] The spatial-temporal subblock predictor 30371 partitions the target block into subblocks (S2601).

[0364] The spatial-temporal subblock predictor 30371 searches adjacent blocks on the upper side and the left side, and in the temporal direction of the subblocks (S2602).

[0365] The spatial-temporal subblock predictor 30371 determines the presence or absence of an available adjacent block, and the process proceeds to S2604 in a case that there is an available adjacent block, and the process proceeds to S2610 in a case that there is no available adjacent block (S2603).

[0366] The spatial-temporal subblock predictor 30371 scales the motion vectors of the available adjacent blocks depending on the temporal distance between the target picture and the reference pictures of the multiple adjacent blocks (S2604).

[0367] The spatial-temporal subblock predictor 30371 calculates an average value of the scaled motion vectors to set as the motion vector spMvLX[] of the target subblock (S2605).

[0368] The spatial-temporal subblock predictor 30371 determines whether or not a block in which the collocated subblock on the reference picture is shifted by the motion vector spMvLX[] is inside of the collocated rectangular slice, and in a case that the block is inside of the collocated rectangular slice, the process proceeds to S2608, and in a case that even a portion is not inside of the collocated rectangular slice, the process proceeds to S2607 (S2606).

[0369] The spatial-temporal subblock predictor 30371 clips the motion vector spMvLX[] in a case that the block shifted by the motion vector spMvLX[] is outside of the collocated rectangular slice (S2607).

[0370] The spatial-temporal subblock predictor 30371 checks whether or not the subblock during processing is the last subblock of the target block (S2608), and the process proceeds to S2610 in a case of the last subblock, and otherwise the processing target is transferred to the next subblock and the process proceeds to S2602 (S2609), and S2602 to S2608 are processed repeatedly.

[0371] The spatial-temporal subblock predictor 30371 stores the motion vector of the STMVP in the merge candidate list mergeCandList[] illustrated in FIG. 31 (S2610).

[0372] The spatial-temporal subblock predictor 30371 does not store the motion vector of the STMVP in the merge candidate list mergeCandList[] in a case that there is no available motion vector, and the process is terminated (S2611).

[0373] Note that, in addition to the clipping process of the motion vector of the target subblock, the processing of S2607 may be a padding process of the rectangular slice boundary of the reference picture, as described in step 7).

[0374] By the above process, the merge candidate list related to the STMVP is derived.
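The averaging step of the STMVP (step 6 above, (Equation STMVP-2) and its integer forms) can be summarized with a minimal C sketch. The function name, the Mv struct, and the availability flags are illustrative only; it assumes that the scaled vectors smvA_above, smvA_left, and smvA_col have already been derived as in steps 1) to 5), and that the cnt == 3 weights (5, 5, 6 with a shift by 4) approximate the division by three.

#include <stdbool.h>

typedef struct { int x, y; } Mv;

/* Step 6) averaging (Equation STMVP-2) with the integer shortcuts for
 * cnt == 2 and cnt == 3 shown above. Unavailable vectors are ignored;
 * cnt == 0 corresponds to step 4), where the caller turns the STMVP off. */
static Mv stmvp_average(Mv above, bool hasAbove,
                        Mv left,  bool hasLeft,
                        Mv col,   bool hasCol) {
    Mv avail[3];
    int cnt = 0;
    if (hasAbove) avail[cnt++] = above;
    if (hasLeft)  avail[cnt++] = left;
    if (hasCol)   avail[cnt++] = col;

    Mv out = {0, 0};
    if (cnt == 1) {
        out = avail[0];
    } else if (cnt == 2) {
        out.x = (avail[0].x + avail[1].x) >> 1;   /* (smvA_cnt0 + smvA_cnt1) >> 1 */
        out.y = (avail[0].y + avail[1].y) >> 1;
    } else if (cnt == 3) {
        out.x = (5 * above.x + 5 * left.x + 6 * col.x) >> 4;
        out.y = (5 * above.y + 5 * left.y + 6 * col.y) >> 4;
    }
    return out;
}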
[0375] By deriving the motion vector of the STMVP and generating the prediction image in this manner, the pixel values can be replaced by using the reference pixels in the collocated rectangular slice even in a case that the motion vector points outside of the collocated rectangular slice for an inter prediction, so that the inter prediction can be performed independently on the rectangular slice. Thus, even in a case that some of the reference pixels are not included in the collocated rectangular slice, an STMVP can be selected as one of the merge candidates. In a case that the performance is higher than that of a merge candidate other than an STMVP, the STMVP can be used to generate the prediction image, so that the coding efficiency can be increased.

Affine Predictor

[0376] The affine predictors 30372 and 30321 derive an affine prediction parameter of the target PU. In the present embodiment, motion vectors (mv0_x, mv0_y) and (mv1_x, mv1_y) of two control points (V0, V1) of the target PU are derived as affine prediction parameters. Specifically, a motion vector of each control point may be derived by prediction from a motion vector of an adjacent PU of the target PU (the affine predictor 30372), or a motion vector of each control point may be derived from the sum of the prediction vector derived as the motion vector of the control point and the difference vector derived from the coded data (the affine predictor 30321).
Motion Vector Derivation Process of Subblock

[0377] As a further specific example of an embodiment configuration, a processing flow in which the affine predictors 30372 and 30321 derive the motion vector mvLX of each subblock by using the affine prediction will be described in steps below. The process in which the affine predictors 30372 and 30321 use the affine prediction to derive the motion vector mvLX of the target subblock includes three steps of (STEP1) to (STEP3) described below.

(STEP1) Derivation of Control Point Vector

[0378] This is a process of deriving a motion vector of each of the representative points of the target block (here, the upper left point V0 of the block and the upper right point V1 of the block) as two control points used in the affine prediction for deriving the candidate by the affine predictors 30372 and 30321. Note that a representative point of the block uses a point on the target block. In the present specification, a representative point of a block used for a control point of an affine prediction is referred to as a "block control point".

[0379] First, each of the processes of the AMVP mode and the merge mode (STEP1) will be described with reference to FIG. 35, respectively. FIG. 35 is a diagram illustrating an example of a position of a reference block utilized for derivation of a motion vector for a control point in the AMVP mode and the merge mode.

Derivation of Motion Vector of Control Point in AMVP Mode

[0380] The affine predictor 30321 adds a prediction vector mvpVNLX and a difference vector of two control points (V0, V1) to derive a motion vector mvN = (mvN_x, mvN_y), respectively. N represents a control point.

[0381] More specifically, the affine predictor 30321 derives a prediction vector candidate of a control point VN (N = 0 ... 1) to store in the prediction vector candidate list mvpListVNLX[]. Furthermore, the affine predictor 30321 derives a prediction vector index mvpVN_LX_idx of the point VN from the coded data, and a motion vector (mvN_x, mvN_y) of the control point VN from a difference vector mvdVNLX by using the following equation.

mvN_x = mvNLX[0] = mvpListVNLX[mvpVN_LX_idx][0] + mvdVNLX[0]
mvN_y = mvNLX[1] = mvpListVNLX[mvpVN_LX_idx][1] + mvdVNLX[1]  (Equation AFFINE-1)

[0382] As illustrated in FIG. 35(a), the affine predictor 30321 selects either of the blocks A, B, and C adjacent to one of the representative points as a reference block (an AMVP reference block) with reference to mvpV0_LX_idx. Then, the motion vector of the selected AMVP reference block is set as the prediction vector mvpV0LX of the representative point V0. Furthermore, the affine predictor 30321 selects either of the blocks D and E as an AMVP reference block with reference to mvpV1_LX_idx. Then, the motion vector of the selected AMVP reference block is set as the prediction vector mvpV1LX of the representative point V1. Note that a position of a control point in (STEP1) is not limited to the above position, and instead of V1 may be the position of the lower left point V2 of the block illustrated in FIG. 35(b). In this case, any of the blocks F and G is selected as an AMVP reference block with reference to mvpV2_LX_idx. Then, the motion vector of the selected AMVP reference block is set as the prediction vector mvpV2LX of the representative point V2.

[0383] For example, as in FIG. 35(c-2), in a case that the left side of the target block shares the boundary with the rectangular slice boundary, the control points are V0 and V1, and the reference block of the control point V0 is B. In this case, mvpV0_LX_idx is not required. Note that, in a case that the reference block B is an intra prediction, the affine prediction may be turned off (the affine prediction is not performed, affine_flag = 0), or the affine prediction may be performed by copying the prediction vector of the control point V1 as the prediction vector of the control point V0. These may be processed the same as the affine predictor 11221 of the slice coder 2012.

[0384] As in FIG. 35(c-1), in a case that the upper side of the target block shares the boundary with the rectangular slice boundary, the control points are V0 and V2, and the reference block of the control point V0 is C. In this case, mvpV0_LX_idx is not required. Note that, in a case that the reference block C is an intra prediction, the affine prediction may be turned off (the affine prediction is not performed), or the affine prediction may be performed by copying the prediction vector of the control point V2 as the prediction vector of the control point V0. These may be processed the same as the affine predictor 11221 of the slice coder 2012.

Derivation of Motion Vector of Control Point in Merge Mode

[0385] The affine predictor 30372 refers to the prediction parameter memory 307 to check whether or not an affine prediction is used for the blocks including L, A, AR, LB, and AL as illustrated in FIG. 35(d). The affine predictor 30372 searches the blocks L, A, AR, LB, and AL in that order, and selects a first found block that utilizes an affine prediction (referred to here as L in FIG. 35(d)) as a reference block (a merge reference block) to derive a motion vector.

[0386] The affine predictor 30372 derives a motion vector (mvN_x, mvN_y) (N = 0 ... 1) of a control point (for example V0 or V1) from motion vectors (mvvN_x, mvvN_y) (N = 0 ... 2) of the block including three points of the selected merge reference block (the point v0, the point v1, and the point v2 in FIG. 35(e)). Note that in the example illustrated in FIG. 35(e), the horizontal width of the target block is W, the height is H, and the lateral width of the merge reference block (the block including L in the example illustrated in the drawing) is w and the height is h.

mv0_x = mv0LX[0] = mvv0_x + (mvv1_x - mvv0_x)/w * w - (mvv2_y - mvv0_y)/h * (h - H)
mv0_y = mv0LX[1] = mvv0_y + (mvv2_y - mvv0_y)/h * w + (mvv1_x - mvv0_x)/w * (h - H)
mv1_x = mv1LX[0] = mvv0_x + (mvv1_x - mvv0_x)/w * (w + W) - (mvv2_y - mvv0_y)/h * (h - H)
mv1_y = mv1LX[1] = mvv0_y + (mvv2_y - mvv0_y)/h * (w + W) + (mvv1_x - mvv0_x)/w * (h - H)  (Equation AFFINE-2)

[0387] In a case that the reference picture of the derived motion vectors mv0 and mv1 is different from the reference picture of the target block, it may be scaled based on the inter picture distance between each of the reference pictures and the target picture.
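A direct C transcription of (Equation AFFINE-2) as reproduced above follows as a sketch. The function name and the MvF struct are illustrative; floating-point division is used only to stay close to the written equation (an actual implementation would typically use fixed-point arithmetic), and it assumes the three corner motion vectors mvv0, mvv1, mvv2 of the merge reference block and its size w x h are already available.

typedef struct { double x, y; } MvF;

/* Control point motion vectors mv0 (upper left V0) and mv1 (upper right V1)
 * of the target block (width W, height H), derived from the corner vectors
 * mvv0, mvv1, mvv2 of the merge reference block (width w, height h), as a
 * transcription of (Equation AFFINE-2). */
static void affine_merge_control_points(MvF mvv0, MvF mvv1, MvF mvv2,
                                        double w, double h,
                                        double W, double H,
                                        MvF *mv0, MvF *mv1) {
    double ax = (mvv1.x - mvv0.x) / w;   /* horizontal gradient of the neighbour */
    double ay = (mvv2.y - mvv0.y) / h;   /* vertical gradient of the neighbour   */

    mv0->x = mvv0.x + ax * w       - ay * (h - H);
    mv0->y = mvv0.y + ay * w       + ax * (h - H);
    mv1->x = mvv0.x + ax * (w + W) - ay * (h - H);
    mv1->y = mvv0.y + ay * (w + W) + ax * (h - H);
}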
[0388] Next, in a case that the motion vector (mvN_x, mvN_y) (N = 0 ... 1) of the control points V0 and V1 derived by the affine predictors 30372 and 30321 in (STEP1) points to outside of the rectangular slice (in the reference picture, some or all of the blocks at the positions to which collocated blocks are shifted by mvN are not inside of the collocated rectangular slice), any of the following processes 4 (processing 4A to processing 4D) is performed.

[0389] [Processing 4A] Rectangular Slice Boundary Padding

[0390] A rectangular slice boundary padding process is performed at (STEP3). In this case, an additional processing is not particularly performed in (STEP1). Rectangular slice boundary padding (rectangular slice outside padding) is achieved by clipping the reference positions at the positions of the upper, lower, left, and right bounding pixels of the rectangular slice, as previously described. For example, in a case that the upper left coordinate of the target subblock relative to the upper left coordinate of the picture is (xs, ys), the width and the height of the target block are W and H, the upper left coordinate of the target rectangular slice in which the target subblock is located is (xRSs, yRSs), and the width and the height of the target rectangular slice are wRS and hRS, a reference pixel (xRef, yRef) of the subblock level is derived in the following equation.

xRef + i = Clip3(xRSs, xRSs + wRS - 1, xs + (SpMvLX[k2][l2][0] >> log2(M)) + i)
yRef + j = Clip3(yRSs, yRSs + hRS - 1, ys + (SpMvLX[k2][l2][1] >> log2(M)) + j)  (Equation AFFINE-3)

[0391] [Processing 4B] Rectangular Slice Boundary Motion Vector Limitation

[0392] The subblock motion vector spMvLX[k2][l2] is clipped so that the motion vector spMvLX[k2][l2] of the subblock level does not refer to outside of the rectangular slice. For the rectangular slice boundary motion vector limitations, there are methods such as, for example, (Equation CLIP1) to (Equation CLIP5) described above.

[0393] [Processing 4C] Rectangular Slice Boundary Motion Vector Replacement (Alternative Motion Vector Replacement)

[0394] A motion vector is copied from an adjacent subblock with a motion vector pointing inside of a collocated rectangular slice.

[0395] [Processing 4D] Rectangular Slice Boundary Affine Off

[0396] In a case that it is determined to point to outside of the collocated rectangular slice, affine_flag = 0 is set (an affine prediction is not performed). In this case, the processing described above is not performed.

[0397] Note that the processing 4 requires to select the same processing by the affine predictor of the slice coder 2012 and the affine predictor of the slice decoder 2002.

(STEP2) Derivation of Subblock Vector

[0398] This is a process in which the affine predictors 30372 and 30321 derive a motion vector of each subblock included in the target block from a motion vector of block control points (the control points V0 and V1 or V0 and V2) being representative points of the target block derived at (STEP1). By (STEP1) and (STEP2), a motion vector spMvLX of each subblock is derived. Note that, in the following, an example of the control points V0 and V1 is described, but in a case that the motion vector of V1 is replaced by the motion vector of V2, a motion vector of each subblock can be derived in a similar manner for the control points V0 and V2 as well.

[0399] FIG. 36(a) is a diagram illustrating an example of deriving a motion vector spMvLX of each subblock constituting the target block from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x, mv1_y) of V1. The motion vector spMvLX of each subblock is derived as a motion vector for each point located in the center of each subblock, as illustrated in FIG. 36(a).

[0400] The affine predictors 30372 and 30321 derive a motion vector spMvLX[xi][yj] (xi = xb + BW*i, yj = yb + BH*j, i = 0, 1, 2, ..., W/BW-1, j = 0, 1, 2, ..., H/BH-1) of each subblock in the target PU, based on the motion vectors (mv0_x, mv0_y) and (mv1_x, mv1_y) of the control points V0 and V1 by using the following equation.

spMvLX[xi][yj][0] = mv0_x + (mv1_x - mv0_x)/W * (xi + BW/2) - (mv1_y - mv0_y)/W * (yj + BH/2)
spMvLX[xi][yj][1] = mv0_y + (mv1_y - mv0_y)/W * (xi + BW/2) + (mv1_x - mv0_x)/W * (yj + BH/2)  (Equation AFFINE-4)

[0401] Here, xb and yb are the upper left coordinate of the target PU, W and H are the width and the height of the target block, and BW and BH are the width and the height of the subblock.

[0402] FIG. 36(b) is a diagram illustrating an example in which a target block (the width W and the height H) is partitioned into subblocks having the width BW and the height BH.

[0403] The points of a subblock position (i, j) and a subblock coordinate (xi, yj) are the intersections of the dashed lines parallel to the x axis and the dashed lines parallel to the y axis in FIG. 36(b). FIG. 36(b) illustrates, by way of example, the point of the subblock position (i, j) = (1, 1), and the point of the subblock coordinate (xi, yj) = (x1, y1) = (BW + BW/2, BH + BH/2) for the subblock position (1, 1).
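The per-subblock derivation of (Equation AFFINE-4) can be illustrated with a short C sketch. The function name and the MvF struct are illustrative; floating-point arithmetic is used only to stay close to the written equation, and xi, yj are the subblock coordinates as defined in paragraph [0400] above.

typedef struct { double x, y; } MvF;

/* Per-subblock motion vector of the affine prediction, a transcription of
 * (Equation AFFINE-4): the vector at the centre of the subblock with
 * coordinate (xi, yj), given the control point vectors mv0 (V0) and mv1 (V1),
 * the target block width W, and the subblock size BW x BH. */
static MvF affine_subblock_mv(MvF mv0, MvF mv1, double W,
                              double xi, double yj, double BW, double BH) {
    double ax = (mv1.x - mv0.x) / W;
    double ay = (mv1.y - mv0.y) / W;
    MvF spMv;
    spMv.x = mv0.x + ax * (xi + BW / 2) - ay * (yj + BH / 2);
    spMv.y = mv0.y + ay * (xi + BW / 2) + ax * (yj + BH / 2);
    return spMv;
}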
(STEP3) Subblock Motion Compensation

[0404] This is a process in which the motion compensation unit 3091 performs a motion compensation in subblock units, based on the prediction list utilization flag predFlagLX input from the inter prediction parameter decoder 303, the reference picture index refIdxLX, and the motion vector spMvLX of the subblock derived in (STEP2), in a case of affine_flag = 1. Specifically, the motion compensation unit 3091 generates a motion compensation image PredLX by reading and filtering a block at a position shifted by the motion vector spMvLX, starting from the position of the target subblock, on the reference picture specified by the reference picture index refIdxLX, from the reference picture memory 306.

[0405] In a case that the motion vector of the subblock derived in (STEP2) points to outside of the rectangular slice, the pixel is read by padding the rectangular slice boundary.

[0406] Note that in the slice decoder 2002, in a case that there is affine_flag signalled from the slice coder 2012, the processing described above may be performed only in a case of affine_flag = 1.

[0407] FIG. 37(a) is a flowchart illustrating operations of the affine prediction described above.

[0408] The affine predictor 30372 or 30321 derives a motion vector of the control point (S3101).
[0409] Next, the affine predictor 30372 or 30321 determines whether or not the derived motion vector of the control point points to outside of the rectangular slice (S3102). In a case that the motion vector does not point to outside of the rectangular slice (N at S3102), the process proceeds to S3104. In a case that the motion vector points to outside of the rectangular slice even partially (Y at S3102), the process proceeds to S3103.

[0410] In the case that the motion vector points to outside of the rectangular slice even partially, the affine predictor 30372 or 30321 performs any of the processes 4 described above, for example, clipping the motion vector to modify the motion vector to point to inside of the rectangular slice.

[0411] These S3101 to S3103 are the processes corresponding to (STEP1) described above.

[0412] The affine predictor 30372 or 30321 derives the motion vector of each subblock, based on the derived motion vector of the control point (S3104). S3104 is a process corresponding to (STEP2) described above.

[0413] The motion compensation unit 3091 determines whether or not affine_flag = 1 (S3105). In a case of not affine_flag = 1 (N at S3105), the motion compensation unit 3091 does not perform an affine prediction, and terminates the affine prediction process. In a case of affine_flag = 1 (Y at S3105), the process proceeds to S3106.

[0414] The motion compensation unit 3091 determines whether or not the motion vector of the subblock points to outside of the rectangular slice (S3106). In a case that the motion vector does not point to outside of the rectangular slice (N at S3106), the process proceeds to S3108. In a case that the motion vector points to outside of the rectangular slice even partially (Y at S3106), the process proceeds to S3107.

[0415] In a case that the motion vector of the subblock points to outside of the rectangular slice even partially, the motion compensation unit 3091 performs padding to the rectangular slice boundary (S3107).

[0416] The motion compensation unit 3091 generates a motion compensation image by an affine prediction, by using the motion vector of the subblock (S3108).

[0417] These S3105 to S3108 are the processes corresponding to (STEP3) described above.

[0418] FIG. 37(b) is a flowchart illustrating an example of determining a control point in a case of an AMVP prediction at S3101 in FIG. 37(a).

[0419] The affine predictor 30321 determines whether or not the upper side of the target block shares the boundary with the rectangular slice boundary (S3110). In a case that it shares the boundary with the upper side boundary of the rectangular slice (Y at S3110), the process proceeds to S3111 and the control points are set to V0 and V2 (S3111). Otherwise (N at S3110), the process proceeds to S3112 and the control points are set to V0 and V1 (S3112).
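Before continuing, the placement of Block_A and Block_B used by the bilateral matching described in the following paragraphs ((Equation FRUC-1) and (Equation FRUC-2)) can be sketched in C. The function and parameter names are illustrative; diffPoc0 and diffPoc1 stand for DiffPicOrderCnt(Cur_Pic, Ref0) and DiffPicOrderCnt(Cur_Pic, Ref1), and integer rounding of the scaled vector is ignored for brevity.

/* Upper-left positions of Block_A on Ref0 and Block_B on Ref1 for the
 * bilateral matching, a sketch of (Equation FRUC-1) and (Equation FRUC-2):
 * mv1 is the mirror of mv0 scaled by the ratio of the two picture distances. */
static void bilateral_block_positions(int xCur, int yCur, const int mv0[2],
                                      int diffPoc0, int diffPoc1,
                                      int pos0[2], int pos1[2]) {
    pos0[0] = xCur + mv0[0];                        /* (Equation FRUC-1) */
    pos0[1] = yCur + mv0[1];
    pos1[0] = xCur - mv0[0] * diffPoc1 / diffPoc0;  /* (Equation FRUC-2) */
    pos1[1] = yCur - mv0[1] * diffPoc1 / diffPoc0;
}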
[0420] In an affine prediction, even in a case that the adjacent block is outside of the rectangular slice, or the motion vector points to outside of the rectangular slice, by configuring a control point, deriving a motion vector of the affine prediction, and generating a prediction image as described above, the reference pixel can be replaced by using a pixel value within the rectangular slice. Therefore, a reduction in the frequency of use of an affine prediction processing can be suppressed, and an inter prediction can be performed independently on the rectangular slices, so that the coding efficiency can be increased.

Matching Motion Derivation Unit 30373

[0421] The matching motion derivation unit 30373 derives a motion vector spMvLX of a block or a subblock constituting a PU by performing matching processing of either the bilateral matching or the template matching. FIG. 38 is a diagram for describing (a) Bilateral matching and (b) Template matching. The matching motion derivation mode is selected as one merge candidate (matching candidate) in merge modes.

[0422] The matching motion derivation unit 30373 derives a motion vector by matching of regions in multiple reference pictures, assuming that an object is moving at an equal speed. In the bilateral matching, a motion vector of the target PU is derived by matching between the reference pictures A and B, assuming that an object passes through a certain region of the reference picture A, a target PU of the target picture Cur_Pic, and a certain region of the reference picture B at an equal speed. In the template matching, a motion vector is derived by matching of an adjacent region Temp_Cur (template) of the target PU and an adjacent region Temp_L0 of the reference block on the reference picture, assuming that the motion vector of the adjacent region of the target PU and the motion vector of the target PU are equal. In the matching motion derivation unit, the target PU is partitioned into multiple subblocks, and the bilateral matching or the template matching described later is performed in units of partitioned subblocks,

[0423] to derive a motion vector of a subblock spMvLX[xi][yj] (xi = xPb + BW*i, yj = yPb + BH*j, i = 0, 1, 2, ..., W/BW-1, j = 0, 1, 2, ..., H/BH-1).

[0424] As illustrated in (a) of FIG. 38, in the bilateral matching, two reference pictures are referred to for deriving a motion vector of the target block Cur_block in the target picture Cur_Pic. More specifically, first, in a case that the coordinate of the target block Cur_block is expressed as (xCur, yCur), a region within the reference picture Ref0 (referred to as the reference picture A) specified by the reference picture index refIdxL0, the region Block_A having the upper left coordinate (xPos0, yPos0) specified by:

(xPos0, yPos0) = (xCur + mv0[0], yCur + mv0[1])  (Equation FRUC-1)

and, for example, a region within the reference picture Ref1 (referred to as the reference picture B) specified by the reference picture index refIdxL1, the region Block_B having the upper left coordinate (xPos1, yPos1) specified by

(xPos1, yPos1) = (xCur + mv1[0], yCur + mv1[1]) = (xCur - mv0[0]*DiffPicOrderCnt(Cur_Pic, Ref1)/DiffPicOrderCnt(Cur_Pic, Ref0), yCur - mv0[1]*DiffPicOrderCnt(Cur_Pic, Ref1)/DiffPicOrderCnt(Cur_Pic, Ref0))  (Equation FRUC-2)

are configured.

[0425] Here, DiffPicOrderCnt(Cur_Pic, Ref0) and DiffPicOrderCnt(Cur_Pic, Ref1) represent a function of returning a difference in temporal information between the target picture Cur_Pic and the reference picture A, and a function of returning a difference in temporal information between the target picture Cur_Pic and the reference picture B, respectively, as illustrated in (a) of FIG. 38.

[0426] Next, (mv0[0], mv0[1]) is determined so that the matching costs of Block_A and Block_B are minimized. (mv0[0], mv0[1]) derived in this way is the motion vector applied to the target block. Based on the motion vector
applied to the target block, a motion vector spMvL0 is derived for each subblock into which the target block is partitioned.

[0427] Meanwhile, (b) of FIG. 38 is a diagram illustrating the Template matching among the matching processes described above.

[0428] As illustrated in (b) of FIG. 38, in the template matching, reference is made to one reference picture at a time in order to derive a motion vector of the target block Cur_block in the target picture Cur_Pic.

[0429] More specifically, for example, a region within the reference picture Ref0 (referred to as the reference picture A) specified by the reference picture index refIdxL0, the region referred to as the reference block Block_A having the upper left coordinate (xPos0, yPos0) identified by

(xPos0, yPos0) = (xCur + mv0[0], yCur + mv0[1])  (Equation FRUC-3)

[0430] is identified.

[0431] Here, (xCur, yCur) is the upper left coordinate of the target block Cur_block.

[0432] Next, the template region Temp_Cur adjacent to the target block Cur_block in the target picture Cur_Pic and the template region Temp_L0 adjacent to Block_A in the reference picture A are configured. In the example illustrated in (b) of FIG. 38, the template region Temp_Cur is constituted by a region adjacent to the upper side of the target block Cur_block and a region adjacent to the left side of the target block Cur_block. The template region Temp_L0 is comprised of a region adjacent to the upper side of Block_A and a region adjacent to the left side of Block_A.

[0433] Next, (mv0[0], mv0[1]) by which the matching cost of Temp_Cur and Temp_L0 is minimized is determined, as a motion vector applied to the target block. Based on the motion vector applied to the target block, a motion vector spMvL0 is derived for each subblock into which the target block is partitioned.

[0434] The template matching may also be processed for two reference pictures Ref0 and Ref1. In this case, matching of the reference picture Ref0 described above and matching of the reference picture Ref1 are performed sequentially. A region in the reference picture Ref1 (referred to as the reference picture B) specified by the reference picture index refIdxL1, the region being the reference block Block_B having the upper left coordinate (xPos1, yPos1) identified by

(xPos1, yPos1) = (xCur + mv1[0], yCur + mv1[1])  (Equation FRUC-4)

[0435] is identified, and the template region Temp_L1 adjacent to Block_B in the reference picture B is configured.

[0436] Finally, (mv1[0], mv1[1]) by which the matching cost of Temp_Cur and Temp_L1 is minimized is determined, as a motion vector applied to the target block. Based on the motion vector applied to the target block, a motion vector spMvL1 is derived for each subblock into which the target block is partitioned.

Motion Vector Derivation Process by Matching Processing

[0437] The flow of the motion vector derivation (pattern match vector derivation) process in a matching mode will be described with reference to the flowchart of FIG. 39.

[0438] The process illustrated in FIG. 39 is executed by the matching predictor 30373. FIG. 39(a) is a flowchart of the bilateral matching processing, and FIG. 39(b) is a flowchart of the template matching processing.

[0439] Note that, among the steps illustrated in FIG. 39(a), S3201 to S3205 are a block search performed at a block level. That is, a pattern match is used to derive a motion vector across a block (CU or PU).

[0440] S3206 to S3207 are a subblock search performed at a subblock level. That is, a pattern match is used to derive a motion vector in subblock units that constitute a block.

[0441] First, in S3201, the matching predictor 30373 configures an initial vector candidate for the block level in the target block. The initial vector candidate is a motion vector of an adjacent block, such as an AMVP candidate, a merge candidate, or the like of the target block.

[0442] Next, at S3202, the matching predictor 30373 searches a vector having a minimum matching cost among the initial vector candidates configured above to set as an initial vector being a basis of a vector search. The matching cost is expressed as, for example, the following equation.

SAD = ΣΣ abs(Block_A[x][y] - Block_B[x][y])  (Equation FRUC-5)

[0443] Here, ΣΣ is the sum over x and y, Block_A[][] and Block_B[][] are blocks in which the upper left coordinates of the blocks are represented by (xPos0, yPos0) and (xPos1, yPos1) in (Equation FRUC-1) and (Equation FRUC-2), respectively, and the initial vector candidate is substituted into (mv0[0], mv0[1]). Then, the vector with the minimum matching cost is set again to (mv0[0], mv0[1]).
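The SAD matching cost of (Equation FRUC-5) can be written as a small C routine. The function and parameter names are illustrative, and 8-bit samples accessed through a line stride are assumed.

#include <stdint.h>
#include <stdlib.h>

/* SAD matching cost between Block_A and Block_B (Equation FRUC-5).
 * blockA / blockB point at the top-left samples of the two regions inside
 * their reference pictures; strideA / strideB are the picture line strides;
 * the compared region is width x height samples. */
static int64_t sad_cost(const uint8_t *blockA, int strideA,
                        const uint8_t *blockB, int strideB,
                        int width, int height) {
    int64_t sad = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sad += abs((int)blockA[y * strideA + x] - (int)blockB[y * strideB + x]);
    return sad;
}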
[0444] Next, at S3203, the matching predictor 30373 determines whether or not the initial vector determined at S3202 points to outside of the rectangular slice (in the reference picture, some or all of the blocks at the positions in which the collocated block is shifted by mvN (N = 0 ... 1) are not inside of the collocated rectangular slice). In a case that the initial vector does not point to outside of the rectangular slice (N at S3203), the process proceeds to S3205. In a case that the initial vector points to outside of the rectangular slice even partially (Y at S3203), the process proceeds to S3204.

[0445] In S3204, the matching predictor 30373 performs any of the following processes 5 (processing 5A to processing 5D).

[0446] [Processing 5A] Rectangular Slice Boundary Padding

[0447] The rectangular slice boundary padding is performed by the motion compensation unit 3091.

[0448] The pixel pointed by the initial vector (mv0[0], mv0[1]) is clipped so as not to refer to outside of the rectangular slice. In a case that the upper left coordinate of the target block relative to the upper left coordinate of the picture is (xs, ys), the width and the height of the target block are W and H, the upper left coordinate of the target rectangular slice in which the target block is located is (xRSs, yRSs), and the width and the height of the target rectangular slice are wRS and hRS, a reference pixel (xRef, yRef) of a subblock is derived in the following equation.

xRef + i = Clip3(xRSs, xRSs + wRS - 1, xs + (mv0[0] >> log2(M)) + i)
yRef + j = Clip3(yRSs, yRSs + hRS - 1, ys + (mv0[1] >> log2(M)) + j)  (Equation FRUC-6)

[0449] [Processing 5B] Rectangular Slice Boundary Motion Vector Limitation

[0450] The initial vector mv0 is clipped so that the motion vector mv0 of the initial vector does not refer to outside of the rectangular slice. For the rectangular slice boundary
motion vector limitations, there are methods such as, for example, (Equation CLIP1) to (Equation CLIP5) described above.

[0451] [Processing 5C] Rectangular Slice Boundary Motion Vector Replacement (Alternative Motion Vector Replacement)

[0452] In a case that the target pointed by the motion vector mv0 is not inside of a collocated rectangular slice, an alternative motion vector inside of a collocated rectangular slice is copied.

[0453] [Processing 5D] Rectangular Slice Boundary Bilateral Matching Off

[0454] In a case that referring to outside of the collocated rectangular slice is determined, BM_flag that indicates on or off of the bilateral matching is set to 0, and the bilateral matching is not performed (the process proceeds to the end).

[0455] Note that the processing 5 requires the slice coder 2012 and the slice decoder 2002 to select the same process.

[0456] In S3205, the matching predictor 30373 performs a local search of the block level in the target block. In the local search, local regions centered on the initial vector derived from S3202 or S3204 (for example, regions of ±D pixels centered on the initial vector) are further searched, and a vector having a minimum matching cost is searched to set as the motion vector of the final target block.

[0457] Next, the following process is performed for each subblock included in the target block (S3206 to S3207).

[0458] At S3206, the matching predictor 30373 derives an initial vector of a subblock in the target block (initial vector search). The initial vector candidate of the subblock is a motion vector of the block level derived at S3205, a motion vector of an adjacent block in the spatial-temporal direction of the subblock, an ATMVP or STMVP vector of the subblock, and the like. Among these candidate vectors, a vector that minimizes the matching cost is set as the initial vector of the subblock. Note that the vector candidates used for the initial vector search of the subblock are not limited to the vectors described above.

[0459] Next, at S3207, the matching predictor 30373 performs a step search or the like (local search) in a local region centered on the initial vector of the subblock selected at S3206 (for example, a region of ±D pixels centered on the initial vector). Then, matching costs of the vector candidates near the initial vector of the subblock are derived, and the minimum vector is derived as the motion vector of the subblock.

[0460] Then, after processing is completed for all of the subblocks included in the target block, the pattern match vector derivation process of the bilateral matching ends.

[0461] Next, a pattern matching vector derivation process of the template matching will be described with reference to FIG. 39(b). Among the steps illustrated in FIG. 39(b), S3211 to S3205 are a block search performed at the block level. S3214 to S3207 are a subblock search performed at a subblock level.

[0462] First, at S3211, the matching predictor 30373 determines whether or not a template Temp_Cur of the target block (both the upper adjacent region and the left adjacent region of the target block) is present in the rectangular slice. In a case of being determined as present (Y at S3211), as illustrated in FIG. 38(c), Temp_Cur is set with the upper adjacent region and the left adjacent region of the target block to obtain a template for the target block (S3213). Otherwise (N at S3211), the process proceeds to S3212, and any of the following processes 6 (processing 6A to processing 6E) is performed.

[0463] [Processing 6A] Rectangular Slice Boundary Padding

[0464] The motion compensation unit 3091 performs a rectangular slice boundary padding (for example, (Equation FRUC-6) described above).

[0465] [Processing 6B] Rectangular Slice Boundary Motion Vector Limitation

[0466] The motion vector is clipped so that the motion vector does not refer to outside of the rectangular slice. For the rectangular slice boundary motion vector limitations, there are methods such as, for example, (Equation CLIP1) to (Equation CLIP5) described above.

[0467] [Processing 6C] Rectangular Slice Boundary Motion Vector Replacement (Alternative Motion Vector Replacement)

[0468] In a case that the target pointed by the subblock motion vector is not inside of a collocated rectangular slice, an alternative motion vector inside of a collocated rectangular slice is copied.

[0469] [Processing 6D] Template Matching Off

[0470] In a case that referring to outside of the collocated rectangular slice is determined, TM_flag that indicates on or off of the template matching is set to 0, and the template matching is not performed (the process proceeds to the end).

[0471] [Processing 6E] In a case that either one of the upper adjacent region and the left adjacent region is within the rectangular slice, that adjacent region is set as a template.

[0472] Note that the processing 6 requires the slice coder 2012 and the slice decoder 2002 to select the same process.

[0473] Next, at S3201, the matching predictor 30373 configures an initial vector candidate of the block level in the target block. The processing of S3201 is the same as the S3201 in FIG. 39(a).

[0474] Next, at S3202, the matching predictor 30373 searches a vector having a minimum matching cost among the initial vector candidates configured above to set as an initial vector being a basis of a vector search. The matching cost is expressed as, for example, the following equation.

SAD = ΣΣ abs(Temp_Cur[x][y] - Temp_L0[x][y])  (Equation FRUC-7)

Here, ΣΣ is the sum over x and y, Temp_L0[][] is a template of the target block illustrated in FIG. 38(b), and is a region adjacent to the upper side and the left side of Block_A, where (xPos0, yPos0) indicated by (Equation FRUC-3) is the upper left coordinate. (mv0[0], mv0[1]) in (Equation FRUC-3) is replaced by the initial vector candidate. Then, the vector with the minimum matching cost is set again to (mv0[0], mv0[1]). Note that, in a case that only the upper side or the left side region of the target block is set to the template in the S3212, Temp_L0[][] has the same shape.

[0475] The processing of S3203 and S3204 is the same processing as S3203 and S3204 in FIG. 39(a). Note that in processing 5 of S3204 in FIG. 39(b), in a case that the template matching is turned off, TM_flag is set to 0.

[0476] In S3205, the matching predictor 30373 performs a local search of the block level in the target block. In the local search, local regions centered on the initial vector derived from S3202 or S3204 (for example, regions of ±D pixels centered on the initial vector) are further searched,
and a vector having a minimum matching cost is searched to pixels outside of a collocated rectangular slice in the search
set as the motion vector of the final target block . process of the motion vector. For example, the search range
[ 0477 ] Next, the following process is performed for each D of the bilateral matching process and the template match
subblock included in the target block ( S3214 to S3207 ) . ing process may be configured in accordance with the
[ 0478 ] In S3214 , the matching predictor 30373 acquires a position and the size of the target block , or the position and
template of a subblock in the target block , as illustrated in the size of the target subblock .
FIG . 38 ( d ). In a case that only the upper side or the left side [ 0487] Specifically, the matching predictor 30373 derives
region of the target block is set to the template at S3212 , the the search range D1x in the left direction of the target block
template of the subblock is the same shape at S3214 as well . illustrated in FIG . 40 , the search range D2x in the right
[ 0479 ] At S3206 , the matching predictor 30373 derives an direction of the target block , the search range Dly in the
initial vector of a subblock in the target block ( initial vector upward direction of the target block , and the search range
search ). The initial vector candidate of the subblock is a D2y in the downward direction of the target block, as the
motion vector of the block level derived at S3205 , a motion range for referring to only pixels inside of a collocated
vector of an adjacent block in the spatial -temporal direction rectangular slice , by the following.
of the subblock , an ATMVP or STMVP vector of the D1x = xPosX + mvX [ 0 ] - xRSs
subblock , and the like. Among these candidate vectors , a
vector that minimizes the matching cost is set as the initial D2x = xRSs + wRS - (xPosX + mvX [ 0 ] + W )
vector of the subblock . Note that the vector candidates used
for the initial vector search of the subblock are not limited D1y = yPosX + mvX [ 1 ] - yRSs
to the vectors described above .
[ 0480 ] Next , at S3207 , the matching predictor 30373 D2y = yRSs + hRS - (yPosX + mvX [ 1 ] + H ) ( Equation FRUC - 11 )
performs a step search ( local search ) centered on the initial [ 0488 ] The matching predictor 30373 configures the mini
vector of the subblock selected at S3206 . The matching mum value of D1x , D2x , D1y , and D2y determined by
predictor 30373 derives a matching cost of a vector candi (Equation FRUC - 11) and the default search range Ddef as the
date of a local region centered on the initial vector of the search range D of the target block .
subblock ( for example, within a search range centered on the D = min ( D1x , D2x , D1y , D2y , Ddef ) ( Equation FRUC - 12)
initial vector ( a region of ±D pixels ) ) , and derives the
smallest vector as the motion vector of the subblock . Here , [ 0489 ] The following derivation method may be used . The
in a case that the vector candidate reaches (or is outside of) matching predictor 30373 derives the search range D1x in
the search range centered on the initial vector, the matching the left direction of the target block illustrated in FIG . 40 , the
predictor 30373 does not search for the vector candidate . search range D2x in the right direction of the target block ,
[ 0481 ] Then , in a case that processing is complete for all the search range Dly in the upward direction of the target
of the subblocks included in the target block , the pattern block , and the search range D2y in the downward direction
match vector derivation process of the template matching of the target block , as the range for referring to only pixels
ends. inside of a collocated rectangular slice , by the following.
[ 0482 ] Although the above reference picture is Ref0 , the D1x = clip3 ( 0 , Ddef , xPosX + mvX [ 0 ] - xRSs )
template matching can be performed by the same process as
described above even in a case that the reference picture is D2x = clip3 ( 0 , Ddef , xRSs + wRS - (xPosX + mvX [ 0 ] + W ) )
Ref1. Furthermore , in a case that there are two reference
pictures, the motion compensation unit 3091 performs a D1y = clip3 ( 0 , Ddef , yPosX + mvX [ 1 ] - yRSs )
bi - prediction process by using two derived motion vectors .
[ 0483 ] The output fruc_merge_idx to the motion compen D2y = clip3 ( 0 , Ddef , yRSs + hRS - (yPosX + mvX [ 1 ] + H ) ) ( Equation FRUC - 11b )
sation unit 3091 is derived by the following equation . [ 0490 ] The matching predictor 30373 configures the mini
fruc_merge_idx = fruc_merge_idx & BM_flag & ( TM_ mum value of Dix , D2x , Dly, and D2y determined by
flag << 1 ) (Equation FRUC - 8 ) mum value of D1x , D2x , D1y , and D2y determined by
[ 0484 ] Note that, in a case that fruc_merge_idx is sig ( Equation FRUC - 11b ) as the range D of the target block .
nalled by the rectangular slice decoder 2002 , BM_flag and D = min ( D1x , D2x , D1y , D2y ) ( Equation FRUC - 12b )
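Purely as an illustration of (Equation FRUC - 11) and (Equation FRUC - 12), and not as part of the original disclosure, the following C sketch derives the search range D that keeps the block-level search inside the collocated rectangular slice; the struct name and the helper imin are assumptions, and a practical implementation would additionally guard against negative values as in (Equation FRUC - 11b).

struct SearchRange { int d1x, d2x, d1y, d2y, d; };

static int imin(int a, int b) { return a < b ? a : b; }

/* Search range limited to the collocated rectangular slice
 * (Equation FRUC-11) and combined with the default range Ddef
 * (Equation FRUC-12). */
static struct SearchRange derive_search_range(
    int xPosX, int yPosX, int W, int H,   /* target block position and size */
    int mvX0, int mvX1,                   /* block-level motion vector      */
    int xRSs, int yRSs, int wRS, int hRS, /* rectangular slice geometry     */
    int Ddef)                             /* default search range           */
{
    struct SearchRange r;
    r.d1x = xPosX + mvX0 - xRSs;              /* room to the left of the block  */
    r.d2x = xRSs + wRS - (xPosX + mvX0 + W);  /* room to the right of the block */
    r.d1y = yPosX + mvX1 - yRSs;              /* room above the block           */
    r.d2y = yRSs + hRS - (yPosX + mvX1 + H);  /* room below the block           */
    r.d = imin(imin(imin(r.d1x, r.d2x), imin(r.d1y, r.d2y)), Ddef);
    return r;
}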
TM_flag may be derived before the pattern match vector [ 0491 ] Note that, in a case that a configuration in which
derivation processing , and a matching process with the value the rectangular slice boundary is performed padding with a
of the flag being true may only be performed . fixed value , and the width and the height of the padding are
BM_flag = fruc_merge_idx & 1 xPad and yPad , the following equation may be used instead
of (Equation FRUC - 11 ) and (Equation FRUC - 11b ).
TM_flag = ( fruc_merge_idx & 10 ) >> 1 ( Equation FRUC - 9 ) Dlx = xPosX +mvX [ O ] - (xRSs - xPad )
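As a reading aid only, the following C sketch shows one possible reading of (Equation FRUC - 9), in which the "& 10" of the text is interpreted as the binary mask 0b10 (decimal 2); this interpretation and the function name are assumptions, not statements of the original disclosure.

/* Split fruc_merge_idx into the bilateral-matching flag and the
 * template-matching flag (one possible reading of Equation FRUC-9). */
static void derive_matching_flags(int fruc_merge_idx,
                                  int *bm_flag, int *tm_flag)
{
    *bm_flag = fruc_merge_idx & 1;        /* bilateral matching */
    *tm_flag = (fruc_merge_idx & 2) >> 1; /* template matching  */
}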
[ 0485 ] Note that in a case that the template is located D1x = xPosX + mvX [ 0 ] - (xRSs - xPad )
outside of the rectangular slice and the template matching is
thus turned off, there are two options of fruc_merge_idx = 0 (no D2x = xRSs + wRS + xPad - ( xPosX + mvX [ 0 ] + W )
matching ) or fruc_merge_idx = 1 ( bilateral matching), and
fruc_merge_idx can be expressed as 1 bit . D1y = yPosX + mvX [ 1 ] - (yRSs - yPad )
Rectangular Slice Boundary Search Range
D2y = yRSs + hRS + yPad - (yPosX + mvX [ 1 ] + H ) ( Equation FRUC - 13 )
[ 0486 ] In a case of performing independent coding or
decoding of a rectangular slice ( rectangular_slice_flag is 1 ) , D2x = clip3 (0 ,Ddef,xRSS + WRS + xPad- (xPosX +mvX
the search range D may be configured so as not to refer to [ 0 ] + W ))
D1y = clip3 ( 0 , Ddef , yPosX + mvX [ 1 ] - (yRSs - yPad ) ) directions are processed for a certain subblock , and then the
D2y = clip3 ( 0 , Ddef , yRSs + hRS + yPad - (yPosX + mvX [ 1 ] + H ) ) (Equation FRUC - 13b ) process is transferred to processing of the next subblock . In
FIG . 42 (a ) , for the direction of the adjacent block relative to
the target subblock , i = 1 is the upper side , i = 2 is the left side ,
[ 0493 ] In the matching process , even in a case that the i = 3 is the lower side , and i = 4 is the right side .
template is outside of the rectangular slice , or the motion [ 0499 ] First , the OBMC predictor 30374 checks the need
vector points to outside of the rectangular slice , by deriving for the OBMC processing and the presence or absence of an
a motion vector and generating a prediction image as adjacent block ( S3401 ) . In a case that the prediction unit is
described above, the reference pixel can be replaced by a block unit, and the target subblock does not share the
using a pixel value within the rectangular slice . Therefore, a boundary with the block boundary in the direction indicated
reduction in the frequency of use of the matching processing by i , there is no adjacent block required for the OBMC
can be suppressed , and inter prediction can be performed processing (N in S3401 ) , so the process proceeds to S3404 ,
independently on the rectangular slices , so that the coding and the flag obmc_flag [ i ] is set to 0. Otherwise (in a case
efficiency can be increased . that the prediction unit is a block unit and the target subblock
shares the boundary with the block boundary, or in a case
OBMC Processing that the processing unit is a subblock) , there is an adjacent
[ 0494 ] The motion compensation unit 3091 according to block required for the OBMC processing (Y at S3401 ) , and
the present embodiment may generate a prediction image by the process proceeds to S3402 .
using an OBMC processing . Here, the Overlapped block [ 0500 ] For example, the subblock SCU1 [ 3 ] [ 0] in FIG .
motion compensation (OBMC ) processing will be 41 ( a ) does not share the boundary with the block boundary
described . The OBMC processing is a processing to generate on the left side , the lower side , and the right side , so
an interpolation image ( a motion compensation image ) of a obmc_flag [ 2 ] = 0 , obmc_flag [ 3 ] =0 , and obmc_flag [ 4 ] =0 are
target block by using an interpolation image Pred of the set. The subblock SCU2 [ 0 ] [ 2 ] does not share the boundary
target subblock generated by using an inter prediction with the block boundary on the upper side , the lower side ,
parameter ( hereinafter, a motion parameter) of the target and the right side , so obmc_flag [ 1 ] =0 , obmc_flag [ 3 ] =0 , and
block, and an interpolation image PredRN of the target block obmc_flag [ 4 ] =0 are set . A white subblock is a subblock that
generated by using a motion parameter of an adjacent block does not share the boundary with the block boundary at all ,
of the target subblock . In pixels (boundary pixels ) in the so obmc_flag [ 1 ] = obmc_flag [ 2 ] = obmc_flag [ 3 ] = obmc_flag
[ 4 ] =0 is set.
target block where the distance to the block boundary is [ 0501 ] Next, the OBMC predictor 30374 checks whether
close , processing to correct an interpolation image of the an adjacent block in the direction indicated by i is an intra
target block is performed in units of subblocks by an prediction block , or a block outside of the rectangular slice ,
interpolation image PredRN based on a motion parameter of as the availability of the adjacent block ( S3402 ) . In a case
an adjacent block . that the adjacent block is an intra prediction block or a block
[ 0495 ] FIG . 41 is a diagram illustrating an example of a outside of the rectangular slice (Y in S3402 ) , the process
region for generating a prediction image by using a motion proceeds to S3404 , and obmc_flag [ i ] in the corresponding
parameter of an adjacent block according to the present direction i is set to 0. Otherwise (in a case that the adjacent
embodiment. In a prediction in units of blocks , since the block is an inter prediction block and a block is inside of the
motion parameters in the block are the same, the pixels of rectangular slice ) ( N at S3402 ) , the process proceeds to
the subblocks with diagonal lines that are within a pre S3403 .
scribed distance from the block boundary are subject to [ 0502 ] For example , in the case of FIG . 41 (c ) , with respect
OBMC processing applications as illustrated in FIG . 41 ( a ) . to the target subblock SCU3 [ O ] [ O ] of the target block CU3
In a prediction in units of subblocks , since the motion in the rectangular slice , since the adjacent block on the left
parameter is different for each subblock , the pixels of each side is outside of the rectangular slice , obmc_flag [ 2 ] of the
of the subblocks are subject to OBMC processing applica target subblock SCU3 [ 0 ] [ O] is set to 0. With respect to the
tions , as illustrated in FIG . 41 ( b ) . target subblock SCU4 [ 3 ] [ 0 ] of the target block CU4 in the
[ 0496 ] Note that the shapes of the target block and an rectangular slice , obmc_flag [ 1 ] of the target subblock SCU4
adjacent block are not necessarily the same , so that the [ 3 ] [ 0] is set to O since the adjacent block on the upper side
OBMC processing is preferably performed on a subblock is an intra prediction .
unit into which blocks are partitioned . The size of the [ 0503 ] Next, the OBMC predictor 30374 checks whether
subblocks can vary from 4x4 to 8x8 block sizes . or not motion parameters of the adjacent block in the
Flow of OBMC Processing direction indicated by i and the target subblock are the same
as the availability of the adjacent block ( S3403 ) . In a case
[ 0497] FIG . 42 ( a ) is a flowchart illustrating a parameter that the motion parameters are the same ( Y at S3403 ) , the
derivation processing performed by the OBMC predictor process proceeds to S3404 and obmc_flag [ i ] =0 is set.
30374 according to the present embodiment. Otherwise ( in a case that the motion parameters are differ
[ 0498 ] The OBMC predictor 30374 determines whether or ent) ( N at S3403 ) , the process proceeds to S3405 .
not an adjacent block adjacent in each direction of the upper [ 0504 ] Whether or not the motion parameters of the sub
side , the left side , the lower side , and the right side is present block and the adjacent block are the same is determined by
or absent or available with respect to the target subblock . In
the following equation.
FIG . 42 , a method is illustrated in which all of the subblocks ((mvLX [ 0 ] ! = mvLXRN [ 0 ])||(mvLX [ 1 ] ! =mvLXRN [ 1 ])
are processed for each direction of the upper, left, lower, and || (refIdxLX ! = refldxLXRN ) ) ? (Equation OBMC - 1 )
right, and then the process is transferred to processing in the [ 0505 ] Here , the motion vector of the target subblock in
next direction, but , a method can be taken in which all the the rectangular slice is (mvLX [ 0 ] , mvLX [ 1 ] ) , the reference
picture index is refldxLX, the motion vector of the adjacent [ 0514 ] [Processing 3C ] Rectangular Slice Boundary
block in the direction indicated by i is (mvLXRN [ 0] , Motion Vector Replacement ( Alternative Motion Vector
mvLXRN [ 1 ] ) , and the reference picture index is refldxRN . Replacement)
[ 0506 ] For example, in FIG . 41 (c ) , in a case that the [ 0515 ] A motion vector is copied from an adjacent sub
motion vector of the target subblock SCU4 [ 0 ] [ 0 ] is (mvLX block with a motion vector pointing inside of a collocated
[ 0] , mvLX [ 1 ] ) , the reference picture index is refldxLX , the rectangular slice .
motion vector of the left side adjacent block is (mvLXR2 [ 0516 ] [ Processing 3D ] Rectangular Slice Boundary
[ 0] , mvLXR2 [ 1 ] ) , the reference picture index is OBMC Off
refldxLXR2, in a case that the motion vector and the [ 0517] In a case that referring to outside of the collocated
reference picture index are the same, for example, in case rectangular slice is determined with reference to the refer
that ( (mvLX [ 0 ] ==mvLXRN [0 ] ) && (mvLX ence image with the motion vector (MvLXRN [ 0] ,
[ 1 ] == mvLXRN [ 1 ] ) && ( refIdxLX == refIdxLXRN ) ) is true , MvLXRN [ 1 ] ) of the adjacent block in the direction i ,
obmc_flag [ 2 ] =0 of the target subblock is set. obmc_flag [ i ] =0 is set ( the OBMC processing is not per
[ 0507] Note that the motion vector and the reference formed in the direction i ) . In this case , S3407 is skipped and
picture index are used in the above equation, but the motion proceeded .
vector and the POC may be used as the following equation. [ 0518 ] Note that the processing 3 requires the slice coder
( (mvLX [ 0 ]! =mvLXRN [ 0 ]) ||(mvLX [ 1 ] ! =mvLXRN [ 1 ]) 2012 and the slice decoder 2002 to select the same process .
|| ( refPOC != refPOCRN ) ) ? ( Equation OBMC - 2 ) [ 0519 ] The OBMC predictor 30374 sets obmc_flag [ i ] = 1
in a case that the motion vector of the adjacent block
Here, refPOC is the POC of the target subblock and indicates inside of the rectangular slice or in a case that the
refPOCRN is the POC of the adjacent block . processing 3 is performed ( S3407) .
[ 0508 ] Next , the OBMC predictor 30374 determines [ 0520 ] Next, the OBMC predictor 30374 performs the
whether or not all regions pointed by the motion vectors of processes of S3401 to S3407 described above in all direc
the adjacent blocks are inside of the rectangular slice (in the tions ( i = 1 to 4 ) of the subblocks, and the process is termi
reference picture, some or all of the blocks at the positions nated .
in which the collocated block is shifted by mvN ( N = 0 ... [ 0521 ] The OBMC predictor 30374 outputs the derived
4 ) are not inside of the collocated rectangular slice ) ( S3405 ) . prediction parameter described above ( obmc_flag and the
In a case that all the regions pointed by the motion vectors motion parameters of the adjacent blocks of each of the
are inside of the rectangular slice ( Y in S3405 ) , the process subblocks) to the inter prediction image generation unit 309 ,
proceeds to S3407 . Otherwise (in a case that regions pointed and the inter prediction image generation unit 309 refers to
by the motion vectors are outside of the rectangular slice obmc_flag to determine whether or not the OBMC process
even partially ) ( N at S3405 ) , the process proceeds to S3406 . ing is necessary, and performs the OBMC processing to the
[ 0509 ] In a case that a motion vector of an adjacent block target block (described in detail in Motion Compensation ).
points to outside of the rectangular slice , any of the follow [ 0522 ] Note that in the slice decoder 2002 , obmc_flag (i )
ing processes 3 are applied ( S3406 ) . is set in a case that obmc_flag is signalled from the slice
[ 0510 ] [Processing 3A] Rectangular Slice Boundary Pad coder 2012 , and the above processing may be performed
ding only in the case of obmc_flag [ i ] = 1 .
[ 0511 ] The rectangular slice boundary padding is per
formed by the motion compensation unit 3091. Rectangular BTM
slice boundary padding ( rectangular slice outside padding ) is [ 0523 ] The BTM predictor 3038 derives a high accuracy
achieved by clipping the reference positions at the positions motion vector by performing the bilateral template matching
of the upper, lower, left, and right bounding pixels of the (BTM) processing by setting a prediction image generated
rectangular slice , as previously described . For example, in a by using bi - directional motion vectors derived by the merge
case that the upper left coordinate of the target subblock prediction parameter derivation unit 3036 as a template .
relative to the upper left coordinate of the picture is ( xs, ys ) ,
the width and the height of the target subblock are BW and Example of Motion Vector Derivation Process
BH , the upper left coordinate of the target rectangular slice
in which the target subblock is located is (xRSs, yRSs ) , the [ 0524 ] In a case that two motion vectors derived in the
width and the height of the target rectangular slice are wRS merge mode are opposite relative to the target block , the
and hRS , and the motion vector of the adjacent block is BTM predictor 3038 performs the bilateral template match
(MvLXRN [ 0 ] , MvLXRN [ 1 ] ) , the reference pixel (xRef, ing (BTM) process .
yRef ) of the subblock is derived with the following equation. [ 0525 ] The bilateral template matching (BTM ) process
xRef + i = Clip3 ( xRSs , xRSs + wRS - BW , xs + ( MvLXRN [ 0 ] >> log2( M ) ) )
will be described with reference to FIG . 43. FIG . 43 (a ) is a
diagram illustrating a relationship between a reference pic
yRef + j = Clip3 ( yRSs , yRSs + hRS - BH , ys + ( MvLXRN [ 1 ] >> log2( M ) ) ) ( Equation OBMC - 3 ) ture and a template in a BTM prediction, ( b ) is a diagram
illustrating the flow of the processing, and ( c ) is a diagram
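As an illustrative sketch of (Equation OBMC - 3), not part of the original disclosure: the reference position is clipped to the collocated rectangular slice, which realizes the rectangular slice boundary padding. For brevity the block anchor is clipped once rather than each sample position xRef + i, and log2M (the motion vector accuracy implied by M in the text) is an assumption.

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* Clamp the motion-compensated reference position of a BW x BH subblock
 * at (xs, ys) into the rectangular slice (xRSs, yRSs, wRS, hRS). */
static void clip_ref_to_slice(int xs, int ys, int BW, int BH,
                              int xRSs, int yRSs, int wRS, int hRS,
                              int mvx, int mvy, int log2M,
                              int *xRef, int *yRef)
{
    *xRef = clip3(xRSs, xRSs + wRS - BW, xs + (mvx >> log2M));
    *yRef = clip3(yRSs, yRSs + hRS - BH, ys + (mvy >> log2M));
}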
illustrating a template in a BTM prediction .
[ 0512 ] [Processing 3B ] Rectangular Slice Boundary [ 0526 ] As illustrated in FIGS . 43 ( a ) and ( c) , the BTM
Motion Vector Limitation predictor 3038 first generates a prediction block of the target
[ 0513 ] The motion vector MvLXRN of the adjacent block block Cur_block from multiple motion vectors ( for example
is clipped so as not to refer to outside of the rectangular slice mvLO and mvL1) derived by the merge prediction parameter
in a manner such as , for example , (Equation CLIP1 ) to derivation unit 3036 , and set this as a template. Specifically,
(Equation CLIP5) described above . the BTM predictor 3038 first generates a prediction block
Cur_Temp from a motion compensation image predL0 gen processed PU stored in the prediction parameter memory
erated by mvL0 and a motion compensation image predL1 307 , based on the reference picture index refIdx, and stores
generated by mvL1. the prediction vector candidate in the prediction vector
Cur_Temp[ x ][ y ] = Clip3 ( 0 , ( 1 << bitDepth ) - 1 , ( predL0 [ x ] candidate list mvpListLX [ ] of the vector candidate storage
[ y ] + predL1 [ x ][ y ] + 1 ) >> 1 ) ( Equation BTM - 1 ) unit 3036 .
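The following C sketch, added only as an illustration of (Equation BTM - 1) and not as part of the original disclosure, builds the bilateral template by averaging the two motion-compensated predictions with rounding and clipping; the flat buffer layout is an assumption.

/* Bilateral template: rounded average of predL0 and predL1, clipped to
 * the sample range given by bitDepth (Equation BTM-1). */
static void make_btm_template(const int *predL0, const int *predL1,
                              int *cur_temp, int num_samples, int bitDepth)
{
    int maxVal = (1 << bitDepth) - 1;
    for (int i = 0; i < num_samples; i++) {
        int v = (predL0[i] + predL1[i] + 1) >> 1;
        cur_temp[i] = v < 0 ? 0 : (v > maxVal ? maxVal : v);
    }
}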
[ 0527] Next , the BTM predictor 3038 configures motion [ 0534 ] The vector candidate selection unit 3034 selects the
vector candidates in a range of ±D pixels with mvL0 and motion vector mvpListLX [mvp_1X_idx ] indicated by the
mvL1 each as the center ( initial vector ), and derives the prediction vector index mvp_1X_idx among the prediction
matching costs of the motion compensation images PredL0 vector candidates of the prediction vector candidate list
and PredL1 generated by each of the motion vector candi mvpListLX [ ] as the prediction vector mvpLX . The vector
dates and the template . Then , the vectors mvLO' and mvL1', candidate selection unit 3034 outputs the selected prediction
which minimizes the matching cost , are set as the updated vector mvpLX to the addition unit 3035 .
motion vector of the target block . However, the search range [ 0535 ] Note that the prediction vector candidate is derived
is limited to inside of the collocated rectangular slices on the by scaling a motion vector of a PU for which decoding
reference pictures Refo and Refl. processing is completed, the PU ( for example, an adjacent
[ 0528] Next, a flow of the BTM prediction will be PU) in a predetermined range from the decoding target PU .
described with reference to FIG . 43 ( b ) . First , the BTM Note that the adjacent PU includes a PU spatially adjacent
predictor 3038 acquires a template ( S3501 ) . As described to the decoding target PU , such as , for example , a left PU
above , the template is generated from the motion vectors and an upper PU , and a region that is temporally adjacent to
( for example mvLO and mvL1 ) derived by the merge pre the decoding target PU , for example, a region that is
diction parameter derivation unit 3036. Next, the BTM obtained from a prediction parameter of a PU with the same
predictor 3038 performs local search in the collocated position as the decoding target PU but with a different
rectangular slice . The local search may be performed by display time . Note that, as described in the derivation of a
repeating a search of multiple different accuracies such as temporal merge candidate, by changing the lower right block
S3502 to S3505 . For example, the local search is performed position of the collocated block to the lower right position in
in the order of M pixel accuracy search L0 processing obtained from a prediction parameter of a PU with the same
( S3502 ) , N pixel accuracy search L0 processing ( S3503 ) , M position as the decoding target PU but with a different
pixel accuracy search L1 processing ( S3504 ) , and N pixel be decoded independently by using an AMVP prediction
accuracy search L1 processing ( S3505 ) . Here, M > N , for without decreasing the coding efficiency.
example , M= 1 pixel accuracy and N = 1 / 2 pixel accuracy can [ 0536 ] The addition unit 3035 calculates the motion vector
be set.
[ 0529 ] The M pixel accuracy LX search processing ( X = 0 mvLX by adding the prediction vector mvpLX input from
1 ) performs a search centered on the coordinate indicated the AMVP prediction parameter derivation unit 3032 and the
by mvLX in the rectangular slice . The N pixel accuracy difference vector mvdLX input from the inter prediction
search LX processing performs, in the rectangular slice , a parameter decoding control unit 3031. The addition unit
search centered on coordinate with the minimal matching 3035 outputs the calculated motion vector mvLX to the
cost in the M pixel accuracy search LX processing. prediction image generation unit 308 and the prediction
[ 0530 ] Note that the rectangular slice boundary may parameter memory 307 .
be extended by padding in advance . In this case , the motion parameter memory 307 .
compensation unit 3091 also performs a padding process . prediction parameter derivation unit 3036 may not be output
[ 0531 ] In a case that rectangular_slice_flag is 1 , the search to the inter prediction image generation unit 309 as is , but
range D may be adaptively modified as illustrated in (Equa may be output via the BTM predictor 3038 .
tion FRUC - 11) to ( Equation FRUC -13 ) to avoid reference to
pixels outside of the collocated rectangular slice in the LIC Predictor 3039
motion vector search process so that each rectangular slice
may be decoded independently. In the BTM processing, [ 0538 ] A Local Illumination Compensation ( LIC ) predic
(mvX [ 0 ] , mvX [ 1 ] ) of ( FRUC - 11) and (FRUC - 13 ) is tion is a processing for linearly predicting a pixel value of a
replaced by (mvLX [ 0 ] , mvLX [ 1 ] ) . target block Cur_block from pixel values of an adjacent
[ 0532 ] By modifying the motion vector derived in the region Ref_Temp (FIG . 45 (a ) ) of a region on a reference
merge mode in this way , the prediction image can be picture pointed by a motion vector derived by a merge
improved . Then , by limiting the modified motion vector prediction, a subblock prediction, an AMVP prediction, or
inside of the rectangular slice , the coding efficiency can be the like , and an adjacent region Cur_Temp (FIG . 45 (b ) ) of
increased since inter prediction can be performed indepen the target block . As described in the equation below, a
dently on the rectangular slices while suppressing a reduction in combination of a scale coefficient a and an offset b is
the frequency of use of the bilateral template matching calculated in which the square error SSD is minimized
processing. between the prediction value Cur_Temp' of the adjacent
[ 0533 ] FIG . 44 is a schematic diagram illustrating a con region of the target block determined from the adjacent
figuration of the AMVP prediction parameter derivation unit region Ref_Temp of the region on the reference picture, and
3032 according to the present embodiment. The AMVP the adjacent region Cur_Temp of the target block .
prediction parameter derivation unit 3032 includes a vector Cur_Temp' [ ][ ] = a * Ref_Temp [ ][ ] + b
candidate derivation unit 3033 , a vector candidate selection
unit 3034 , and a vector candidate storage unit 3036. The SSD = ΣΣ ( Cur_Temp' [ x ][ y ] - Cur_Temp[ x ][ y ] ) ^ 2 ( Equation LIC - 1 )
vector candidate derivation unit 3033 derives a prediction
vector candidate from a motion vector mvLX of an already [ 0539 ] Here, EE is the sum of x and y.
collocated rectangular slice of the rectangular slice in which obmc_flag [ i ] = 1 (Y in S3413 ) , an interpolation image Pre
the target block is located even partially, the rectangular dRN [x ] [ y ] is generated ( S3414 ) . In other words, only for
slice can be subjected to inter prediction independently by the subblocks in the direction indicated by i being obmc_flag
padding the rectangular slice boundary in advance . [ i ] = 1 , an interpolation image PredRN [ x ] [y ] (x =0 .
Padding
BW - 1 , y=0 ... BH - 1 ) is generated ( S3414 ) based on the
prediction list utilization flag predFlagLX [xPbN] [yPbN] of
[ 0557] In the above (Equation INTER - 2 ) , reference is the adjacent block input from the inter prediction parameter
made to the pixel refImg [ xInt + k - NTAP / 2 + 1] [ yInt] on the decoder 303 , the reference picture index refldxLX [xPbN]
reference picture, but in a case of referring to a pixel value [yPbN] , and the motion vector mvLX [xPbN] [yPbN ], and
outside of the picture that does not actually exist , the a weighted average processing of the interpolation image
following picture boundary padding ( offpicture padding) is PredC [ x ] [y ] and the interpolation image PredRN [x ] [y ]
performed . The picture boundary padding is achieved by described below is performed ( S3415 ) , to generate an inter
using a pixel value refImg [xRef+ i] [ yRef + j] at a following polation image PredLX ( S3416 ) . Note that (xPbN , yPbN ) is
position xRef + i, yRef + j, as a pixel value at a position of a the upper left coordinate of the adjacent block .
reference pixel ( xIntL + i, yIntL + j ). [ 0564 ] The weighted average processing is then per
xRef + i = Clip3 (0 ,pic_width_in_luma_samples - 1 , formed ( S3415 ) .
xIntL + i) [ 0565 ] In the configuration of performing the OBMC
processing, the motion compensation unit 3091 performs a
yRef + j = Clip3 ( 0 ,pic_height_in_luma_samples - 1,
yIntL + j) (Equation PAD - 3 )
weighted average processing on the interpolation image
PredC [ x] [y ] and the interpolation image PredRN [ x] [y ] to
[ 0558 ] Note that rectangular slice boundary padding update the interpolation image PredC [ x ] [ y ] . Specifically, in
( Equation PAD - 1) may be performed instead of the picture a case of the OBMC flag obmc_flag [ i ] = 1 ( the OBMC
boundary padding ( Equation PAD - 3 ). processing is effective ) input from the inter prediction
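Only as an illustration of (Equation PAD - 3), not part of the original disclosure, the following C sketch replaces a reference position that falls outside the picture by the nearest position inside it; replacing the picture bounds by the rectangular slice bounds would give the rectangular slice boundary padding of (Equation PAD - 1).

/* Picture boundary padding: clamp the reference sample position
 * (xIntL + i, yIntL + j) to the picture (Equation PAD-3). */
static void pad_ref_position(int xIntL, int yIntL, int i, int j,
                             int pic_width_in_luma_samples,
                             int pic_height_in_luma_samples,
                             int *xRef, int *yRef)
{
    int x = xIntL + i, y = yIntL + j;
    *xRef = x < 0 ? 0 : (x > pic_width_in_luma_samples - 1
                         ? pic_width_in_luma_samples - 1 : x);
    *yRef = y < 0 ? 0 : (y > pic_height_in_luma_samples - 1
                         ? pic_height_in_luma_samples - 1 : y);
}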
parameter decoder 303 , the motion compensation unit 3091
OBMC Interpolation Image Generation performs the following weighted average processing on S
[ 0559 ] In OBMC , two types of interpolation images are pixels of the subblock boundary in the direction indicated by
i.
generated, including an interpolation image of a target
subblock derived based on an inter prediction parameter of PredC [ x ][ y ] = ( ( w1* PredC [ x ][ y ] + w2* PredRN [ x ][ y ] ) + o )
the target block , and an interpolation image derived based on >> shift ( Equation INTER - 4 )
an inter prediction parameter of an adjacent block , and an [ 0566 ] Here , weights w1 and w2 in the weighted average
interpolation image that is used for prediction is ultimately processing will be described . The weights w1 and w2 in the
generated by performing weighting processing on these. weighted average processing are determined according to
Here , an interpolation image of a target subblock derived the distance ( number of pixels ) of the target pixel from the
based on an inter prediction parameter of the target block is subblock boundary. They have a relationship of w1 + w2 =
referred to as an interpolation image PredC ( a first OBMC ( 1 << shift ), o = 1 << ( shift - 1 ).
interpolation image ), and an interpolation image derived
based on an inter prediction parameter of an adjacent block [ 0567] In the OBMC processing , a prediction image is
is referred to as an interpolation image PredRN (a second generated by using interpolation images of multiple adjacent
OBMC interpolation image ) . Note that N indicates either of blocks . Here , a method for updating PredC [ x ] [y ] from
the upper side (A ) , the left side ( L ) , the lower side ( B ) , and motion parameters of multiple adjacent blocks will be
the right side (R) of the target subblock . In a case that the described .
OBMC processing is not performed (OBMC off ), the inter [ 0568 ] First , in a case of obmc_flag [ 1 ] = 1 , the motion
polation image PredC becomes a motion compensation compensation unit 3091 updates Pred [ x ] [ y ] by applying
image PredLX of the target subblock as is . In a case that the an interpolation image PredRA [x ] [y ] created by using the
OBMC processing is performed (OBMC on) , a motion motion parameter of the upper side adjacent block to the
compensation image PredLX of the target subblock is gen interpolation image PredC [ x ] [ y ] of the target subblock .
erated from the interpolation image PredC and the interpo PredC [ x ][ y ] = ( ( w1 * PredC [ x ][ y ] + w2 * PredRA [ x ][ y ] ) + o ) >> shift
lation image PredRN . ( Equation INTER - 5 )
[ 0560 ] The motion compensation unit 3091 generates an
interpolation image , based on an inter prediction parameter Next, the motion compensation unit 3091 updates PredC [ x ]
of the target subblock input from the inter prediction param [y ] sequentially by using the interpolation images PredRL
eter decoder 303 ( the prediction list utilization flag pred [ x ][ y ] , PredRB [ x ][ y ] , and PredRR [ x ][ y ] created by using
FlagLX , the reference picture index refldxLX , the motion the motion parameters of the adjacent blocks on the left side
vector mvLX , and the OBMC flag obmc_flag ). (i =2 ) , the lower side ( i =3 ) , and the right side (i =4 ) of the
[ 0561 ] FIG . 42 ( b) is a flowchart describing the operations target subblock for the direction i where obmc_flag [ i ] = 1 .
of the interpolation image generation in the OBMC predic That is , the updates are made by the following equation.
tion of the motion compensation unit 3091 . PredC [ x ][ y ] = ( ( w1* PredC [ x ][ y ] + w2* PredRL [ x ][ y ] ) + o )
[ 0562 ] First , the motion compensation unit 3091 generates >> shift
an interpolation image PredC [ x ] [ y ] ( x = 0 ... BW - 1 , y = 0
... BH - 1 ), based on a prediction parameter (S3411 ). PredC [ x ][ y ] = ( ( w1* PredC [ x ][ y ] + w2* PredRB [ x ][ y ] ) + o )
>> shift
[ 0563 ] Next , it is determined whether or not obmc_flag
[ i ] = 1 ( S3413 ) . In a case of obmc_flag [ i ] =0 (N in S3413 ) , the PredC [ x ][ y ] = ( ( w1* PredC [ x ][ y ] + w2* PredRR [ x ][ y ] ) + o )
process proceeds in the next direction (i = i + 1 ) . In a case of >> shift (Equation INTER - 6 )
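As an illustrative aid only, not part of the original disclosure, the following C sketch applies the weighted average of (Equation INTER - 4) along the upper subblock boundary; the per-row weight table is an assumption, the only constraints taken from the text being w1 + w2 = (1 << shift) and o = 1 << (shift - 1).

/* Blend the target-subblock interpolation image PredC with the
 * neighbour-based interpolation image PredRN over the first s boundary
 * rows (Equation INTER-4). */
static void obmc_blend_top(int *predC, const int *predRN,
                           int bw, int s, int shift,
                           const int *w2_by_row /* s entries */)
{
    int o = 1 << (shift - 1);
    for (int y = 0; y < s; y++) {
        int w2 = w2_by_row[y];
        int w1 = (1 << shift) - w2;
        for (int x = 0; x < bw; x++) {
            int idx = y * bw + x;
            predC[idx] = (w1 * predC[idx] + w2 * predRN[idx] + o) >> shift;
        }
    }
}

Repeating the same blend with PredRL, PredRB, and PredRR for the directions with obmc_flag [ i ] = 1 corresponds to the successive updates of (Equation INTER - 5) and (Equation INTER - 6).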
[ 0569 ] In a case of obmc_flag [ 0 ] =0 , or after performing Pred [ x ][ y ] = Clip3 ( 0 , ( 1 << bitDepth ) - 1 , ( ( PredLX [ x ][ y ]
* w0 + ( 1 << ( log2WD - 1 ) ) ) >> log2WD ) + o0 ) (Equation INTER - 11 )
the above - described process for i = 1 to 4 , PredC [ x] [y ] is set
to the prediction image PredLX [ x ] [ y ] ( S3416 ) . [ 0578 ] Here , log 2WD is a variable indicating a prescribed
PredLX [ x ] [ y ] = PredC [ x ] [ y] ( Equation INTER - 7 ) shift amount.
[ 0570 ] As described above , the motion compensation unit [ 0579 ] Furthermore, in the case of a bi-prediction BiPred ,
3091 can generate a prediction image in consideration of a and in a case that a weight prediction is performed , the
motion parameter of an adjacent block of a target subblock , weight predictor 3094 derives weighting prediction coeffi
and thus can generate a prediction image with high predic cients w0 , w1 , o0 , and o1 from the coded data , and performs
tion accuracy in the OBMC processing . the processing according to the following equation.
[ 0571 ] The number of pixels S of the subblock boundary Pred [ x ][ y ] = Clip3 ( 0 , ( 1 << bitDepth ) - 1 , ( PredL0 [ x ][ y ]
updated by the OBMC processing may be arbitrary ( S = 2 to * w0 + PredL1 [ x ][ y ] * w1 + ( ( o0 + o1 + 1 ) << log2WD ) )
block size) . The manner of partitioning of a block including >> ( log2WD + 1 ) ) ( Equation INTER - 12 )
block size) . The manner of partitioning of a block including [ 0580 ] With such a configuration , the video decoding
a subblock to be subjected to the OBMC processing may apparatus 31 can independently decode a rectangular slice in
also be in any manner of partitioning such as 2NxN , NX2N , rectangular slice sequence units in a case that the value of
NxN , and the like . rectangular_slice_flag is 1. As a mechanism is introduced to
[ 0572 ] By deriving a motion vector of OBMC and gener ensure the independence of decoding of each rectangular
ating a prediction image in this manner, even in a case that slice for each individual tool , each rectangular slice can be
the motion vector of the subblock points to outside of the independently decoded in the video while minimizing a
rectangular slice , a reference pixel is replaced with a pixel decrease in the coding efficiency. As a result, the egion
value in the rectangular slice . Accordingly, a reduction in the required for display or the like can be selected and decoded ,
frequency of use of the OBMC processing can be sup so that the amount of processing can be greatly reduced .
pressed, and the rectangular slices can be independently
performed an inter prediction, so the coding efficiency can Configuration of Video Coding Apparatus
be increased .
[ 0581 ] FIG . 15 ( b) illustrates the video coding apparatus 11
LIC Interpolation Image Generation of the present invention . The video coding apparatus 11
[ 0573 ] In LIC , a prediction image PredLX is generated by includes a picture partitioning processing unit 2010 , a
using a scale coefficient a and an offset b calculated by the header information generation unit 2011 , slice coders 2012a
LIC predictor 3039 to modify the interpolation image Pred to 2012n , and a coding stream generation unit 2013. FIG .
of the target block derived in ( Equation INTER - 3 ) . 16 ( a ) is a flowchart of the video coding apparatus.
PredLX [ x ][ y ] = Pred [ x ][ y ] * a + b ( Equation INTER - 8 )
[ 0582 ] In a case that a slice is a rectangular slice (Y at
S1601 ) , the picture partitioning processing unit 2010 parti
tions the picture into multiple rectangular slices that do not
Weight Prediction overlap each other, and transmits the rectangular slices to the
slice coders 2012a to 2012n . In a case that a slice is a general
[ 0574 ] The weight predictor 3094 generates a prediction slice , the picture partitioning processing unit 2010 partitions
image of a target block by multiplying the input motion the picture into any shape and transmits the slices to the slice
compensation image PredLX by a weighting coefficient. In coders 2012a to 2012n .
a case that one of the prediction list utilization flags (pred [ 0583 ] In the case that the slice is a rectangular slice (Y at
FlagL0 or predFlagL1 ) is 1 (in the case of a uni-prediction ), S1601 ) , the header information generation unit 2011 gener
and in a case that a weight prediction is not used , a ates rectangular slice information ( SliceId , and information
processing of the following equation is performed by which related to the number and size of regions of the rectangular
the input motion compensation image PredLX (LX is L0 or slices) from the partitioned rectangular slices . The header
L1 ) is combined with the number of pixel bits bitDepth . information generation unit 2011 also determines a rectan
Pred [ x ][ y ] = Clip3 ( 0 , ( 1 << bitDepth ) -1 , ( PredLX [ x ] [ y ] + gular slice for inserting an I slice ( S1602 ) . The header
offset1 )>>shift1) (Equation INTER - 9 ) information generation unit 2011 transmits the rectangular
[ 0575 ] Here, shift1 = 14 - bitDepth , offset1 = 1 << ( shift1 - 1 ). slice information and the information related to the I slice
In a case that both of the prediction list utilization flags insertion to the coding stream generation unit 2013 as the
(predFlagL0 and predFlagL1) are 1 (in the case of a bi header information ( S1603 ) .
prediction BiPred ), and in a case that a weight prediction [ 0584 ] The slice coders 2012a to 2012n code each rect
is not used , a processing of the following equation is angular slice in a unit of rectangular slice sequence ( S1604 ) .
performed by which the input motion compensation images In this manner, by the slice coders 2012a to 2012n , coding
PredL0 and PredL1 are averaged and combined to the processing can be performed in parallel on the rectangular
number of pixel bits . slices .
Pred [x ] [ y ] = Clip3( 0 , ( 1 << bitDepth ) -1,( PredL0 [ x ] [ y ] + [ 0585 ] Here, the slice coders 2012a to 2012n perform
PredL1[ x ] [ y ] + offset2 )>>shift2 ) ( Equation INTER- 10 ) coding processing on a rectangular slice sequence, similarly
to one independent video sequence, and do not refer to
[ 0576 ] Here, shift2 = 15 - bitDepth , offset2 = 1<< (shift2-1). prediction information of a rectangular slice sequence of a
[ 0577] Furthermore, in the case of a uni-prediction, and in different Sliceld temporally or spatially in a case of per
a case that a weight prediction is performed, the weight forming coding processing. That is , the slice coders 2012a
predictor 3094 derives a weighting prediction coefficient w0 to 2012n do not refer to a different rectangular slice spatially
and an offset o0 from the coded data , and performs the or temporally in a case of coding a rectangular slice in a
processing according to the following equation. picture. In a case of a general slice , the slice coders 2012a
to 2012n perform coding processing on each slice sequence, [ 0590 ] Note that the prediction image generation unit 101
while sharing information of the reference picture memory. is an operation same as the prediction image generation unit
[ 0586 ] The coding stream generation unit 2013 generates 308 already described , and thus descriptions thereof will be
omitted .
a coding stream Te in a unit of NAL unit, from the header [ 0591 ] The prediction image generation unit 101 generates
information including the rectangular slice information the prediction image P of the PU , based on a pixel value of
transmitted from the header information generation unit a reference block read from the reference picture memory ,
2011 and the coding stream Tes of the rectangular slices by using a parameter input from the prediction parameter
output by the slice coders 2012a to 2012n . In a case of a coder. The prediction image generated by the prediction
general slice , the coding stream generation unit 2013 gen image generation unit 101 is output to the subtraction unit
erates a coding stream Te in a unit of NAL unit from the 102 and the addition unit 106 .
header information and the unreasonable stream TeS . [ 0592 ] The intra prediction image generation unit ( not
[ 0587] In this way, the slice coders 2012a to 2012n can illustrated ) included in the prediction image generation unit
independently code each rectangular slice , so that coding 101 is an operation same as the intra prediction image
processing can be performed in parallel on multiple rectan generation unit 310 already described .
gular slices . [ 0593 ] The subtraction unit 102 subtracts a signal value of
the prediction image P of the PU input from the prediction
image generation unit 101 from a pixel value at a corre
Configuration of Slice Coder sponding PU position of the image T, and generates a
residual signal. The subtraction unit 102 outputs the gener
[ 0588 ] Next , a configuration of the slice coders 2012a to ated residual signal to the transform processing and quan
2012n will be described. As an example below, the configu tization unit 103 .
ration of the slice coder 2012a will be described with [ 0594 ] The transform processing and quantization unit
reference to FIG . 47. FIG . 47 is a block diagram illustrating 103 performs a frequency transform for the prediction
a configuration of 2012 , which is one of the slice coders residual signal input from the subtraction unit 102 , and
2012a to 2012n . FIG . 47 is a block diagram illustrating a calculates a transform coefficient. The transform processing
configuration of the slice coder 2012 according to the and quantization unit 103 quantizes the calculated transform
present embodiment. The slice coder 2012 includes a pre coefficients to calculate quantization transform coefficients.
diction image generation unit 101 , a subtraction unit 102 , a The transform processing and quantization unit 103 outputs
transform processing and quantization unit 103 , an entropy the calculated quantization transform coefficients to the
coder 104 , an inverse quantization and inverse transform entropy coder 104 and the inverse quantization and inverse
processing unit 105 , an addition unit 106 , a loop filter 107 , transform processing unit 105 .
a prediction parameter memory ( a prediction parameter [ 0595 ] To the entropy coder 104 , the quantization trans
storage unit, a frame memory ) 108 , a reference picture form coefficients are input from the transform processing
memory (a reference image storage unit , a frame memory ) and quantization unit 103 , and prediction parameters are
109 , a coding parameter determination unit 110 , and a input from the prediction parameter coder 111. For example,
prediction parameter coder 111. The prediction parameter the input prediction parameters include codes such as a
coder 111 includes an inter prediction parameter coder 112 reference picture index ref_idx_lX , a prediction vector index
and an intra prediction parameter coder 113. Note that the mvp_1X_idx, a difference vector mvdLX , a prediction mode
slice coder 2012 may have a configuration in which the loop pred_mode_flag , and a merge index merge_idx.
filter 107 is not included . [ 0596 ] The entropy coder 104 performs entropy coding on
[ 0589 ] For each picture of an image T, the prediction the input partitioning information , the prediction param
image generation unit 101 generates a prediction image P of mvp_1X_idx, a difference vector mvdLX , a prediction mode
a prediction unit PU for each coding unit CU , which is a generate the coding stream Tes , and outputs the generated
region where the picture is partitioned. Here , the prediction coding stream TeS to the outside .
image generation unit 101 reads a block that has been [ 0597] The inverse quantization and inverse transform
decoded from the reference picture memory 109 , based on processing unit 105 is the same as the inverse quantization
a prediction parameter input from the prediction parameter and inverse transform processing unit 311 ( FIG . 18 ) in the
coder 111. For example, in a case of an inter prediction, the rectangular slice decoder 2002 , and dequantizes the quan
prediction parameter input from the prediction parameter tization transform coefficients input from the transform
coder 111 is a motion vector. The prediction image genera processing and quantization unit 103 to calculate the trans
tion unit 101 reads a block at a position on a reference form coefficients. The inverse quantization and inverse
picture indicated by a motion vector starting from a target transform processing unit 105 performs inverse transform on
PU . In a case of an intra prediction , the prediction parameter the calculated transform coefficients to calculate a residual
is , for example, an intra prediction mode . The prediction signal. The inverse quantization and inverse transform pro
image generation unit 101 reads a pixel value of an adjacent cessing unit 105 outputs the calculated residual signal to the
PU used in an intra prediction mode from the reference addition unit 106 .
picture memory 109 , and generates a prediction image P of [ 0598 ] The addition unit 106 adds a signal value of the
a PU . The prediction image generation unit 101 generates prediction image P of the PU input from the prediction
the prediction image P of the PU by using one prediction image generation unit 101 and a signal value of the residual
scheme among multiple prediction schemes for the read signal input from the inverse quantization and inverse trans
reference picture block . The prediction image generation form processing unit 105 for each pixel , and generates the
unit 101 outputs the generated prediction image P of the PU decoded image . The addition unit 106 stores the generated
to the subtraction unit 102 . decoded image in the reference picture memory 109 .
[ 0599 ] The loop filter 107 performs a deblocking filter, a parameters necessary for generation of a prediction image
sample adaptive offset ( SAO ) , and an adaptive loop filter output to the prediction image generation unit 101. A
(ALF ) to the decoded image generated by the addition unit configuration of the inter prediction parameter coder 112
106. Note that the loop filter 107 need not necessarily will be described later.
include the three types of filters described above , and may [ 0606 ] The intra prediction parameter coder 113 includes
be configured with a deblocking filter only , for example. a partly identical configuration to the configuration in which
[ 0600 ] The prediction parameter memory 108 stores the the intra prediction parameter decoder 304 derives intra
prediction parameter generated by the coding parameter prediction parameters, as a configuration to derive the pre
determination unit 110 for each picture and CU of the coding diction parameters necessary for generation of a prediction
target in a predetermined position . image output to the prediction image generation unit 101 .
[ 0601 ] The reference picture memory 109 stores the [ 0607] The intra prediction parameter coder 113 derives a
decoded image generated by the loop filter 107 for each format for coding ( for example, MPM_idx, rem_intra_
picture and CU of the coding target in a predetermined luma_pred_mode, and the like ) from the intra prediction
position . Note that the memory management of a reference mode IntraPredMode input from the coding parameter deter
picture is the same as the process of the reference picture mination unit 110 .
memory 306 of the video decoding apparatus described [ 0608 ] Configuration of Inter Prediction Parameter Coder
above , and thus descriptions thereof will be omitted . Next, a configuration of the inter prediction parameter coder
[ 0602 ] The coding parameter determination unit 110 112 will be described. The inter prediction parameter coder
selects one set among multiple sets of coding parameters. A 112 is a unit corresponding to the inter prediction parameter
coding parameter is an above -mentioned QT or BT parti decoder 303 of FIG . 28 , and FIG . 48 illustrates the configu
tioning parameter or a prediction parameter or a parameter ration .
to be a target of coding which is generated associated with [ 0609 ] The inter prediction parameter coder 112 includes
these. The prediction image generation unit 101 generates an inter prediction parameter coding control unit 1121 , an
the prediction image P of the PU by using each of the sets AMVP prediction parameter derivation unit 1122 , a subtrac
of these coding parameters. tion unit 1123 , a subblock prediction parameter derivation
[ 0603 ] The coding parameter determination unit 110 cal unit 1125 , a BTM predictor 1126 , and a LIC predictor 1127 ,
culates an RD cost value indicating the volume of the and a partitioning mode derivation unit, a merge flag deri
information quantity and coding errors for each of the vation unit, an inter prediction indicator derivation unit, a
multiple sets . For example, the RD cost value is a sum of a reference picture index derivation unit , a vector difference
code amount and a value of multiplying a square error by a derivation unit or the like not illustrated , and each of the
coefficient X. The code amount is an information quantity of partitioning mode derivation unit , the merge flag derivation
the coding stream TeS obtained by performing entropy unit, the inter prediction indicator derivation unit, the ref
coding on a quantization residual and a coding parameter. erence picture index derivation unit, and the vector differ
The square error is a sum over pixels of the squared erence picture index derivation unit , a vector difference
residual values of residual signals calculated in the subtrac derivation unit or the like not illustrated , and each of the
tion unit 102. The coefficient X is a pre - configured real partitioning mode derivation unit , the merge flag derivation
number that is larger than zero . The coding parameter unit, the inter prediction indicator derivation unit, the ref
determination unit 110 selects a set of coding parameters by parameter coder 112 outputs a motion vector (mvLX , sub
which the calculated RD cost value is minimized . With this MvLX ), a reference picture index refldxLX, a PU partition
configuration, the entropy coder 104 outputs the selected set ing mode part_mode, an inter prediction indicator inter_
of coding parameters as the coding stream TeS to the inter_pred_idc, a reference picture index refIdxLX , and a
outside , and does not output sets of coding parameters that prediction image generation unit 101. The inter prediction
are not selected . The coding parameter determination unit parameter coder 112 outputs a PU partitioning mode part
110 stores the determined coding parameters in the predic mode, a merge flag merge_flag, a merge index merge_idx ,
tion parameter memory 108 . an inter prediction indicator inter_pred_idc, a reference
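Only as an illustration of the RD cost selection described for the coding parameter determination unit 110, and not as part of the original disclosure, the following C sketch selects the parameter set with the smallest value of code amount plus X times the square error; the array interface is an assumption.

/* Return the index of the candidate with the minimum RD cost,
 * cost = code amount + X * square error. */
static int select_min_rd_cost(const double *code_amount,
                              const double *square_error, int n, double X)
{
    int best = 0;
    for (int k = 1; k < n; k++) {
        if (code_amount[k] + X * square_error[k] <
            code_amount[best] + X * square_error[best])
            best = k;
    }
    return best;
}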
[ 0604 ] The prediction parameter coder 111 derives a for picture index refldxLX , a prediction vector index mvp_1X_
mat for coding from the parameters input from the coding idx , a difference vector mvdLX , and a subblock prediction
parameter determination unit 110 , and outputs the format to mode flag subPbMotionFlag to the entropy coder 104 .
the entropy coder 104. The derivation of the format for [ 0610 ] The inter prediction parameter coding control unit
coding is , for example , to derive a difference vector from a 1121 includes a merge index derivation unit 11211 and a
motion vector and a prediction vector . The prediction param vector candidate index derivation unit 11212. The merge
eter coder 111 derives parameters necessary to generate a index derivation unit 11211 compares a motion vector and a
prediction image from the parameters input from the coding reference picture index input from the coding parameter
parameter determination unit 110 , and outputs the param determination unit 110 with a motion vector and a reference
eters to the prediction image generation unit 101. For picture index possessed by a PU of a merge candidate read
example , the parameters necessary to generate a prediction from the prediction parameter memory 108 to derive a
image are a motion vector in a unit of subblock . merge index merge_idx, and outputs it to the entropy coder
[ 0605 ] The inter prediction parameter coder 112 derives an 104. The merge candidate is a reference PU in a predeter
inter prediction parameter, based on the prediction param mined range from a coding target CU being a coding target
eters input from the coding parameter determination unit ( for example, a reference PU adjoining the lower left end ,
110. The inter prediction parameter coder 112 includes a pred_idc, or information for indicating these to the
partly identical configuration to the configuration in which target block ), and is a PU for which a coding process is
the inter prediction parameter decoder 303 derives inter completed. The vector candidate index derivation unit 11212
prediction parameters, as a configuration to derive the derives a prediction vector index mvp_1X_idx.
[0611] In a case that the coding parameter determination unit 110 determines the use of a subblock prediction mode, the subblock prediction parameter derivation unit 1125 derives a motion vector and a reference picture index for a subblock prediction of any of a spatial subblock prediction, a temporal subblock prediction, an affine prediction, a matching motion derivation, and an OBMC prediction, in accordance with the value of subPbMotionFlag. As described in the description of the rectangular slice decoder 2002, the motion vector and the reference picture index are derived by reading out a motion vector or a reference picture index of an adjacent PU, a reference picture block, or the like from the prediction parameter memory 108. The subblock prediction parameter derivation unit 1125, and the spatial-temporal subblock predictor 11251, the affine predictor 11252, the matching predictor 11253, and the OBMC predictor 11254 included in the subblock prediction parameter derivation unit 1125, have configurations similar to those of the subblock prediction parameter derivation unit 3037 of the inter prediction parameter decoder 303, and the spatial-temporal subblock predictor 30371, the affine predictor 30372, the matching predictor 30373, and the OBMC predictor 30374 included in the subblock prediction parameter derivation unit 3037, respectively.

[0612] The AMVP prediction parameter derivation unit 1122 includes an affine predictor 11221, and has a configuration similar to the AMVP prediction parameter derivation unit 3032 (see FIG. 28) described above.
[0613] In other words, in a case that the prediction mode predMode indicates an inter prediction mode, a motion vector mvLX is input to the AMVP prediction parameter derivation unit 1122 from the coding parameter determination unit 110. The AMVP prediction parameter derivation unit 1122 derives a prediction vector mvpLX, based on the input motion vector mvLX. The AMVP prediction parameter derivation unit 1122 outputs the derived prediction vector mvpLX to the subtraction unit 1123. Note that the reference picture index refIdxLX and the prediction vector index mvp_LX_idx are output to the entropy coder 104. The affine predictor 11221 has a configuration similar to the affine predictor 30321 (see FIG. 28) of the AMVP prediction parameter derivation unit 3032 described above. The LIC predictor 1127 has a configuration similar to the LIC predictor 3039 (see FIG. 28) described above.

[0614] The subtraction unit 1123 subtracts the prediction vector mvpLX input from the AMVP prediction parameter derivation unit 1122 from the motion vector mvLX input from the coding parameter determination unit 110, and generates a difference vector mvdLX. The difference vector mvdLX is output to the entropy coder 104.
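A minimal sketch of the processing of [0613] and [0614] follows (the types and function name are illustrative assumptions; only the subtraction described above is taken from the text): the prediction vector mvpLX selected by mvp_LX_idx is subtracted from the motion vector mvLX to obtain the difference vector mvdLX, and mvdLX, refIdxLX, and mvp_LX_idx are the values passed to the entropy coder 104.

    // Sketch (assumed types) of the difference vector derivation in the
    // subtraction unit 1123: mvdLX = mvLX - mvpLX.
    struct Mv { int x; int y; };

    struct AmvpResult {
        Mv  mvdLX;       // difference vector output to the entropy coder 104
        int mvp_LX_idx;  // prediction vector index output to the entropy coder 104
        int refIdxLX;    // reference picture index output to the entropy coder 104
    };

    AmvpResult codeAmvpMotion(const Mv& mvLX, int refIdxLX,
                              const Mv mvpCandList[], int mvpIdx)
    {
        AmvpResult r;
        const Mv& mvpLX = mvpCandList[mvpIdx];  // prediction vector mvpLX
        r.mvdLX.x = mvLX.x - mvpLX.x;           // subtraction performed by unit 1123
        r.mvdLX.y = mvLX.y - mvpLX.y;
        r.mvp_LX_idx = mvpIdx;
        r.refIdxLX = refIdxLX;
        return r;
    }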
[0615] A video coding apparatus according to an aspect of the present invention includes: in coding of a slice resulting from partitioning of a picture, a first coder unit configured to code a sequence parameter set including information related to a plurality of the pictures; a second coder unit configured to code information indicating a position and a size of the slice on the picture; a third coder unit configured to code the picture on a slice unit basis; and a fourth coder unit configured to code a NAL header unit, wherein the first coder unit codes a flag indicating whether a shape of the slice is rectangular or not, the position and the size of the slice that is rectangular and has a same slice ID are not changed in a period of time in which each of the plurality of the pictures refers to a same sequence parameter set in a case that the flag indicates that the shape of the slice is rectangular, and the slice that is rectangular is coded independently without reference to information of another slice within the picture and without reference to information of another slice among the plurality of the pictures by the slice that is rectangular.

[0616] A video decoding apparatus according to an aspect of the present invention includes: in decoding of a slice resulting from partitioning of a picture, a first decoder unit configured to decode a sequence parameter set including information related to a plurality of the pictures; a second decoder unit configured to decode information indicating a position and a size of the slice on the picture; a third decoder unit configured to decode the picture on a slice unit basis; and a fourth decoder unit configured to decode a NAL header unit, wherein the first decoder unit decodes a flag indicating whether a shape of the slice is rectangular or not, the position and the size of the slice that is rectangular and has a same slice ID are not changed in a period of time in which each of the plurality of the pictures refers to a same sequence parameter set in a case that the flag indicates that the shape of the slice is rectangular, and the slice that is rectangular is decoded without reference to information of another slice within a picture and without reference to information of another slice that is rectangular among the plurality of the pictures by the slice that is rectangular.

[0617] In a video coding apparatus or a video decoding apparatus according to an aspect of the present invention, the independent coding or decoding processing of the slice that is rectangular refers to only a block included in the slice that is collocated and rectangular, and derives a prediction vector candidate in a temporal direction.

[0618] In a video coding apparatus or a video decoding apparatus according to an aspect of the present invention, the independent coding or decoding processing of the slice that is rectangular clips a reference position at positions of upper, lower, left, and right boundary pixels of the slice that is collocated and rectangular in reference of a reference picture by motion compensation.

[0619] In a video coding apparatus or a video decoding apparatus according to an aspect of the present invention, the independent coding or decoding processing of the slice that is rectangular limits a motion vector such that the motion vector enters within the slice that is collocated and rectangular in motion compensation.
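The boundary clipping of [0618] can be illustrated by the following minimal sketch (the slice is assumed to be described by its top left coordinates, width, and height; the function and variable names are illustrative): every reference sample position used in motion compensation is clipped to the upper, lower, left, and right boundary pixels of the collocated rectangular slice, so that no sample outside the slice is referred to.

    // Minimal sketch (assumed parameterization) of clipping a reference sample
    // position to the collocated rectangular slice, as described in [0618].
    static inline int clip3(int lo, int hi, int v)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    // (xSlice, ySlice) are the top left coordinates of the rectangular slice,
    // sliceW and sliceH are its width and height. xRef/yRef give a reference
    // sample position used by motion compensation; after the call the position
    // lies inside the slice.
    void clipRefPositionToSlice(int xSlice, int ySlice, int sliceW, int sliceH,
                                int* xRef, int* yRef)
    {
        *xRef = clip3(xSlice, xSlice + sliceW - 1, *xRef);  // left/right boundary pixels
        *yRef = clip3(ySlice, ySlice + sliceH - 1, *yRef);  // upper/lower boundary pixels
    }

The motion vector restriction of [0619] can reuse the same bounds: a motion vector is limited (or a candidate discarded) so that the resulting reference block stays within these limits.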
[0620] In a video coding apparatus according to an aspect of the present invention, the first coder unit codes a maximum value of a temporal hierarchy identifier and an insertion period of an intra slice.

[0621] In a video decoding apparatus according to an aspect of the present invention, the first decoder unit decodes a maximum value of a temporal hierarchy identifier and an insertion period of an intra slice.

[0622] In a video coding apparatus according to an aspect of the present invention, the third coder unit codes intra slices in a unit of the plurality of the pictures, and an insertion position of an intra slice of the intra slices is a picture of which a temporal hierarchy identifier is zero.

[0623] In a video coding apparatus according to an aspect of the present invention, the fourth coder unit codes an identifier indicating a type of NAL unit, an identifier indicating a layer to which NAL belongs, and a temporal identifier, and codes in addition the slice ID in a case that the NAL unit stores data including a slice header.
[0624] In a video decoding apparatus according to an aspect of the present invention, the fourth decoder unit decodes an identifier indicating a type of NAL unit, an identifier indicating a layer to which NAL belongs, and a temporal identifier, and decodes in addition the slice ID in a case that the NAL unit stores data including a slice header.

Implementation Examples by Software

[0625] Note that part of the slice coder 2012 and the slice decoder 2002 in the above-mentioned embodiments, for example, the entropy decoder 301, the prediction parameter decoder 302, the loop filter 305, the prediction image generation unit 308, the inverse quantization and inverse transform processing unit 311, the addition unit 312, the prediction image generation unit 101, the subtraction unit 102, the transform processing and quantization unit 103, the entropy coder 104, the inverse quantization and inverse transform processing unit 105, the loop filter 107, the coding parameter determination unit 110, and the prediction parameter coder 111, may be realized by a computer. In that case, this configuration may be realized by recording a program for realizing such control functions on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium. Note that the "computer system" mentioned here refers to a computer system built into either the slice coder 2012 or the slice decoder 2002, and the computer system includes an OS and hardware components such as a peripheral apparatus. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, and the like, and a storage apparatus such as a hard disk built into the computer system. Moreover, the "computer-readable recording medium" may include a medium that dynamically retains a program for a short period of time, such as a communication line that is used to transmit the program over a network such as the Internet or over a communication line such as a telephone line, and may also include a medium that retains a program for a fixed period of time, such as a volatile memory within the computer system functioning as a server or a client in such a case. The program may be configured to realize some of the functions described above, and also may be configured to be capable of realizing the functions described above in combination with a program already recorded in the computer system.

[0626] Part or all of the video coding apparatus 11 and the video decoding apparatus 31 in the embodiments described above may be realized as an integrated circuit such as a Large Scale Integration (LSI). Each function block of the video coding apparatus 11 and the video decoding apparatus 31 may be individually realized as a processor, or part or all may be integrated into a processor. The circuit integration technique is not limited to LSI, and the integrated circuits for the functional blocks may be realized as dedicated circuits or a multi-purpose processor. In a case that, with advances in semiconductor technology, a circuit integration technology that replaces LSI appears, an integrated circuit based on the technology may be used.

[0627] The embodiment of the present invention has been described in detail above referring to the drawings, but the specific configuration is not limited to the above embodiments, and various amendments can be made to the design within the scope that does not depart from the gist of the present invention.

Application Examples

[0628] The above-mentioned video coding apparatus 11 and the video decoding apparatus 31 can be utilized being installed to various apparatuses performing transmission, reception, recording, and regeneration of videos. Note that videos may be natural videos imaged by cameras or the like, or may be artificial videos (including CG and GUI) generated by computers or the like.

[0629] At first, referring to FIG. 49, it will be described that the above-mentioned video coding apparatus 11 and the video decoding apparatus 31 can be utilized for transmission and reception of videos.

[0630] (a) of FIG. 49 is a block diagram illustrating a configuration of a transmitting apparatus PROD_A installed with the video coding apparatus 11. As illustrated in (a) of FIG. 49, the transmitting apparatus PROD_A includes a coder PROD_A1 which obtains coded data by coding videos, a modulation unit PROD_A2 which obtains modulating signals by modulating carrier waves with the coded data obtained by the coder PROD_A1, and a transmitter PROD_A3 which transmits the modulating signals obtained by the modulation unit PROD_A2. The above-mentioned video coding apparatus 11 is utilized as the coder PROD_A1.

[0631] The transmitting apparatus PROD_A may further include a camera PROD_A4 for imaging videos, a recording medium PROD_A5 for recording videos, an input terminal PROD_A6 to input videos from the outside, and an image processing unit PROD_A7 which generates or processes images, as sources of supply of the videos input into the coder PROD_A1. In (a) of FIG. 49, although the configuration in which the transmitting apparatus PROD_A includes all of these is exemplified, a part may be omitted.

[0632] Note that the recording medium PROD_A5 may record videos which are not coded, or may record videos coded in a coding scheme for recording different from a coding scheme for transmission. In the latter case, a decoder (not illustrated) to decode coded data read from the recording medium PROD_A5 according to the coding scheme for recording may be interposed between the recording medium PROD_A5 and the coder PROD_A1.

[0633] (b) of FIG. 49 is a block diagram illustrating a configuration of a receiving apparatus PROD_B installed with the video decoding apparatus 31. As illustrated in (b) of FIG. 49, the receiving apparatus PROD_B includes a receiver PROD_B1 which receives modulating signals, a demodulation unit PROD_B2 which obtains coded data by demodulating the modulating signals received by the receiver PROD_B1, and a decoder PROD_B3 which obtains videos by decoding the coded data obtained by the demodulation unit PROD_B2. The above-mentioned video decoding apparatus 31 is utilized as the decoder PROD_B3.

[0634] The receiving apparatus PROD_B may further include a display PROD_B4 for displaying videos, a recording medium PROD_B5 to record the videos, and an output terminal PROD_B6 to output the videos to the outside, as supply destinations of the videos output by the decoder PROD_B3. In (b) of FIG. 49, although the configuration in which the receiving apparatus PROD_B includes all of these is exemplified, a part may be omitted.
[0635] Note that the recording medium PROD_B5 may record videos which are not coded, or may record videos which are coded in a coding scheme for recording different from a coding scheme for transmission. In the latter case, a coder (not illustrated) to code videos acquired from the decoder PROD_B3 according to the coding scheme for recording may be interposed between the decoder PROD_B3 and the recording medium PROD_B5.

[0636] Note that the transmission medium for transmitting the modulating signals may be wireless or may be wired. The transmission aspect to transmit the modulating signals may be broadcasting (here, referred to as the transmission aspect where the transmission target is not specified beforehand) or may be telecommunication (here, referred to as the transmission aspect where the transmission target is specified beforehand). Thus, the transmission of the modulating signals may be realized by any of radio broadcasting, cable broadcasting, radio communication, and cable communication.

[0637] For example, broadcasting stations (broadcasting equipment, and the like)/receiving stations (television receivers, and the like) of digital terrestrial television broadcasting are examples of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and/or receiving modulating signals in radio broadcasting. Broadcasting stations (broadcasting equipment, and the like)/receiving stations (television receivers, and the like) of cable television broadcasting are examples of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and/or receiving modulating signals in cable broadcasting.

[0638] Servers (work stations, and the like)/clients (television receivers, personal computers, smartphones, and the like) for Video On Demand (VOD) services, video hosting services using the Internet, and the like are examples of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and/or receiving modulating signals in telecommunication (usually, either radio or cable is used as a transmission medium in the LAN, and cable is used as a transmission medium in the WAN). Here, personal computers include a desktop PC, a laptop type PC, and a graphics tablet type PC. Smartphones also include a multifunctional portable telephone terminal.

[0639] Note that a client of a video hosting service has a function to code a video imaged with a camera and upload the video to a server, in addition to a function to decode coded data downloaded from a server and to display it on a display. Thus, a client of a video hosting service functions as both the transmitting apparatus PROD_A and the receiving apparatus PROD_B.

[0640] Next, referring to FIG. 50, it will be described that the above-mentioned video coding apparatus 11 and the video decoding apparatus 31 can be utilized for recording and regeneration of videos.

[0641] (a) of FIG. 50 is a block diagram illustrating a configuration of a recording apparatus PROD_C installed with the above-mentioned video coding apparatus 11. As illustrated in (a) of FIG. 50, the recording apparatus PROD_C includes a coder PROD_C1 which obtains coded data by coding a video, and a writing unit PROD_C2 which writes the coded data obtained by the coder PROD_C1 in a recording medium PROD_M. The above-mentioned video coding apparatus 11 is utilized as the coder PROD_C1.

[0642] Note that the recording medium PROD_M may be (1) a type built in the recording apparatus PROD_C such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD), may be (2) a type connected to the recording apparatus PROD_C such as an SD memory card or a Universal Serial Bus (USB) flash memory, and may be (3) a type loaded in a drive apparatus (not illustrated) built in the recording apparatus PROD_C such as a Digital Versatile Disc (DVD) or a Blu-ray Disc (BD: trade name).

[0643] The recording apparatus PROD_C may further include a camera PROD_C3 for imaging a video, an input terminal PROD_C4 to input the video from the outside, a receiver PROD_C5 to receive the video, and an image processing unit PROD_C6 which generates or processes images, as sources of supply of the video input into the coder PROD_C1. In (a) of FIG. 50, although the configuration in which the recording apparatus PROD_C includes all of these is exemplified, a part may be omitted.

[0644] Note that the receiver PROD_C5 may receive a video which is not coded, or may receive coded data coded in a coding scheme for transmission different from the coding scheme for recording. In the latter case, a decoder for transmission (not illustrated) to decode coded data coded in the coding scheme for transmission may be interposed between the receiver PROD_C5 and the coder PROD_C1.

[0645] Examples of such a recording apparatus PROD_C include a DVD recorder, a BD recorder, a Hard Disk Drive (HDD) recorder, and the like (in this case, the input terminal PROD_C4 or the receiver PROD_C5 is the main source of supply of a video). A camcorder (in this case, the camera PROD_C3 is the main source of supply of a video), a personal computer (in this case, the receiver PROD_C5 or the image processing unit PROD_C6 is the main source of supply of a video), a smartphone (in this case, the camera PROD_C3 or the receiver PROD_C5 is the main source of supply of a video), or the like is also an example of such a recording apparatus PROD_C.

[0646] (b) of FIG. 50 is a block diagram illustrating a configuration of a regeneration apparatus PROD_D installed with the above-mentioned video decoding apparatus 31. As illustrated in (b) of FIG. 50, the regeneration apparatus PROD_D includes a reading unit PROD_D1 which reads coded data written in the recording medium PROD_M, and a decoder PROD_D2 which obtains a video by decoding the coded data read by the reading unit PROD_D1. The above-mentioned video decoding apparatus 31 is utilized as the decoder PROD_D2.

[0647] Note that the recording medium PROD_M may be (1) a type built in the regeneration apparatus PROD_D such as an HDD or an SSD, may be (2) a type connected to the regeneration apparatus PROD_D such as an SD memory card or a USB flash memory, and may be (3) a type loaded in a drive apparatus (not illustrated) built in the regeneration apparatus PROD_D such as a DVD or a BD.

[0648] The regeneration apparatus PROD_D may further include a display PROD_D3 for displaying a video, an output terminal PROD_D4 to output the video to the outside, and a transmitter PROD_D5 which transmits the video, as supply destinations of the video output by the decoder PROD_D2. In (b) of FIG. 50, although the configuration in which the regeneration apparatus PROD_D includes all of these is exemplified, a part may be omitted.
[0649] Note that the transmitter PROD_D5 may transmit a video which is not coded, or may transmit coded data coded in a coding scheme for transmission different from the coding scheme for recording. In the latter case, a coder (not illustrated) to code a video in the coding scheme for transmission may be interposed between the decoder PROD_D2 and the transmitter PROD_D5.

[0650] Examples of such a regeneration apparatus PROD_D include a DVD player, a BD player, an HDD player, and the like (in this case, the output terminal PROD_D4 to which a television receiver, and the like, is connected is the main supply target of the video). A television receiver (in this case, the display PROD_D3 is the main supply target of the video), a digital signage (also referred to as an electronic signboard or an electronic bulletin board, and the like; in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply target of the video), a desktop PC (in this case, the output terminal PROD_D4 or the transmitter PROD_D5 is the main supply target of the video), a laptop type or graphics tablet type PC (in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply target of the video), a smartphone (in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply target of the video), or the like is also an example of such a regeneration apparatus PROD_D.

[0651] Realization as Hardware and Realization as Software: Each block of the above-mentioned video decoding apparatus 31 and the video coding apparatus 11 may be realized as hardware by a logical circuit formed on an integrated circuit (IC chip), or may be realized as software using a Central Processing Unit (CPU).

[0652] In the latter case, each apparatus includes a CPU performing a command of a program to implement each function, a Read Only Memory (ROM) in which the program is stored, a Random Access Memory (RAM) for developing the program, and a storage apparatus (recording medium) such as a memory for storing the program and various data, and the like. The purpose of the embodiments of the present invention can be achieved by supplying, to each of the apparatuses, a recording medium that readably records the program code (execution form program, intermediate code program, source program) of the control program of each of the apparatuses, which is software implementing the above-mentioned functions, and by the computer (or a CPU or an MPU) reading and executing the program code recorded in the recording medium.

[0653] For example, as the recording medium, a tape such as a magnetic tape or a cassette tape, a disc including a magnetic disc such as a floppy (trade name) disk/a hard disk and an optical disc such as a Compact Disc Read-Only Memory (CD-ROM)/Magneto-Optical disc (MO disc)/Mini Disc (MD)/Digital Versatile Disc (DVD)/CD Recordable (CD-R)/Blu-ray Disc (trade name), a card such as an IC card (including a memory card)/an optical card, a semiconductor memory such as a mask ROM/Erasable Programmable Read-Only Memory (EPROM)/Electrically Erasable and Programmable Read-Only Memory (EEPROM: trade name)/a flash ROM, or a logic circuit such as a Programmable Logic Device (PLD) or a Field Programmable Gate Array (FPGA) can be used.

[0654] Each of the apparatuses is configured to be connectable with a communication network, and the program code may be supplied through the communication network. This communication network is required only to be able to transmit a program code, and is not specifically limited. For example, the Internet, the intranet, the extranet, a Local Area Network (LAN), an Integrated Services Digital Network (ISDN), a Value-Added Network (VAN), a Community Antenna television/Cable Television (CATV) communication network, a Virtual Private Network, a telephone network, a mobile communication network, a satellite communication network, and the like are available. A transmission medium constituting this communication network is also required only to be a medium which can transmit a program code, and is not limited to a particular configuration or type. For example, a cable communication such as Institute of Electrical and Electronic Engineers (IEEE) 1394, a USB, a power line carrier, a cable TV line, a phone line, or an Asymmetric Digital Subscriber Line (ADSL) line, and a radio communication such as an infrared ray of Infrared Data Association (IrDA) or a remote control, Bluetooth (trade name), IEEE 802.11 radio communication, High Data Rate (HDR), Near Field Communication (NFC), Digital Living Network Alliance (DLNA: trade name), a cellular telephone network, a satellite channel, or a terrestrial digital broadcast network are available. Note that the embodiments of the present invention can be also realized in the form of computer data signals embedded in a carrier wave in which the program code is embodied by electronic transmission.

[0655] The embodiments of the present invention are not limited to the above-mentioned embodiments, and various modifications are possible within the scope of the claims. Thus, embodiments obtained by combining technical means modified appropriately within the scope defined by the claims are also included in the technical scope of the present invention.

INDUSTRIAL APPLICABILITY

[0656] The embodiments of the present invention can be preferably applied to a video decoding apparatus to decode coded data where image data is coded, and a video coding apparatus to generate coded data where image data is coded. The embodiments of the present invention can be preferably applied to a data structure of coded data generated by the video coding apparatus and referred to by the video decoding apparatus.

REFERENCE SIGNS LIST

[0657] 41 Video display apparatus
[0658] 31 Video decoding apparatus
[0659] 2002 Slice decoder
[0660] 11 Video coding apparatus
[0661] 2012 Slice coder

1-8. (canceled)
9. A decoding device for decoding a picture including a rectangular region, the decoding device comprising:
a prediction parameter decoding circuitry that decodes a flag in a sequence parameter set, wherein the flag specifies whether rectangular region information is present in the sequence parameter set; and
a motion compensation circuitry that derives padding locations,
wherein
the prediction parameter decoding circuitry decodes the rectangular region information, if a value of the flag is equal to one, and
the padding locations are derived by using top left coordinates and a width and a height of the rectangular region, if the value of the flag is equal to one.
10. The decoding device of claim 9, wherein the rectangular region information includes (i) a first syntax element specifying a number of rectangular regions and (ii) a second syntax element specifying a size of the rectangular region.
11. A method for decoding a picture including a rectangular region, the method including:
decoding a flag in a sequence parameter set, wherein the flag specifies whether rectangular region information is present in the sequence parameter set;
decoding the rectangular region information, if a value of the flag is equal to one; and
deriving padding locations by using top left coordinates and a width and a height of the rectangular region, if the value of the flag is equal to one.
12. A coding device for coding a picture including a rectangular region, the coding device comprising:
a prediction parameter coding circuitry that codes a flag in a sequence parameter set, wherein the flag specifies whether rectangular region information is present in the sequence parameter set; and
a motion compensation circuitry that derives padding locations,
wherein
the prediction parameter coding circuitry codes the rectangular region information, if a value of the flag is equal to one, and
the padding locations are derived by using top left coordinates and a width and a height of the rectangular region, if the value of the flag is equal to one.
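As a reading aid only, the flag-gated parsing recited in claims 9 to 11 can be sketched as follows (the BitReader stub and the field names are illustrative assumptions; the actual syntax is defined by the description above, not by this sketch): the flag is decoded from the sequence parameter set, the rectangular region information is decoded only in a case that the value of the flag is equal to one, and the padding locations are then derived from the top left coordinates and the width and height of the rectangular region.

    // Hypothetical sketch of the SPS-level parsing in claims 9 and 11.
    struct BitReader {
        // Stubs standing in for the entropy decoder; not an actual API.
        int readFlag() { return 0; }  // reads a 1-bit flag
        int readUe()   { return 0; }  // reads an unsigned coded value
    };

    struct RectRegionInfo {
        bool present    = false;  // value of the flag in the sequence parameter set
        int  numRegions = 0;      // first syntax element: number of rectangular regions
        int  xTopLeft   = 0;      // top left coordinates of the rectangular region
        int  yTopLeft   = 0;
        int  width      = 0;      // second syntax element: size of the rectangular region
        int  height     = 0;
    };

    RectRegionInfo decodeSpsRectRegion(BitReader& br)
    {
        RectRegionInfo info;
        info.present = (br.readFlag() == 1);
        if (info.present) {               // decode only if the flag is equal to one
            info.numRegions = br.readUe();
            info.xTopLeft   = br.readUe();
            info.yTopLeft   = br.readUe();
            info.width      = br.readUe();
            info.height     = br.readUe();
        }
        return info;
    }

    // Padding locations for motion compensation are then derived from the top left
    // coordinates and the width and height of the rectangular region, e.g.:
    //   xPadMax = info.xTopLeft + info.width  - 1;
    //   yPadMax = info.yTopLeft + info.height - 1;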
