Video Coding (VC-1)
Outline
- Windows Media family and its evolution
- WMV applications
- Video coding tools
- Comparison with MPEG-2, H.264/AVC
- Performance evaluations
- Conclusions
Windows Media focuses on streaming compressed audio and video over the Internet to personal computers. The vision is to enable effective delivery of digital media through any network to any device. Applications include:
- Internet-based applications such as Web broadcast and VOD
- Consumer electronics such as DVD, car audio and mobile devices
WM end-2-end delivery
Audio codec
- Windows Media Audio 9 (mono/stereo, 8kHz~48kHz, 5kbps~320kbps, CD quality at 48~128kbps)
- Windows Media Audio 9 Professional (5.1 or 7.1 ch, up to 96kHz, up to 24 bits/sample, 128kbps~)
- Windows Media Audio 9 Lossless (2:1 ratio for stereo)
- Windows Media Audio 9 Voice (mono, 4kbps~20kbps, hybrid CELP/transform coding)
Video codec
- Windows Media Video 7 and 8 (non-standard version of MPEG-4)
- Windows Media Video 9 (VC-9, VC-1) (160x120@10kbps, BT.601@2Mbps, 720p@4~6Mbps, 1080i@6~20Mbps)
- Windows Media Video 9 Screen (generally 28kbps, 100kbps for images)
- Windows Media Video 9 Image (slide show and transitions)
- One-pass CBR (live encoding and transmission)
- Two-pass CBR (offline encoding for on-demand streaming)
- One-pass VBR (live capture)
- Two-pass VBR (download-and-play applications)
- Peak-constrained VBR (constrained reading speed). Avg/max/min bitrates are specified.
- Multiple bitrate encoding (MBR)
WMV status
- HD movies were commercially released in 2003.
- WMV-9 has been under consideration by the SMPTE C-24 group since Sep 2003, to become VC-1. It was promoted to Committee Draft (CD) in March 2004.
- The draft was previously named "Proposed SMPTE Standard for Television: VC-9".
[Figure: decoder block diagram. Blocks: bit-stream parsing, inverse VLC, inverse quantization, inverse transform, motion compensation (inverse VLC, MV prediction, sub-pel interpolation, 4MV), frame buffer (1-frame delay), re-sizing, decoded frame.]
Same structure
- Internal color format is 8-bit 4:2:0.
- Block-based motion compensation and spatial transform.
- I/P/B definitions are similar to those of MPEG-4 (not H.264).
Design criteria
Design metrics
- Rate-distortion curve
- Visual feedback from cinema testing
- Drift-free design for bit-exact reconstruction
- Computational complexity vs. coding gain

- Floating-point (FP) arithmetic is ruled out.
- A 16-bit word size is preferred.
- Conditional statements should be minimized.
Guideline: Any inefficiency in signal processing operations tends to have a big impact on R-D performance at high rates, whereas any inefficiency in entropy coding has more impact at the low-rate end of the R-D plot.
- Signal processing operations: motion compensation, transform, loop filtering.
- Entropy coding: zigzag scanning, motion vector prediction.
- Adaptive block size transform
- Limited precision transform set
- Adaptive motion compensation
- Adaptive quantization
- Advanced entropy coding
- Loop filtering
- Advanced B frame coding
- Interlace coding
- Overlap smoothing
- Low-rate tools
- Fading compensation
- Trends and textures are better preserved by a large transform, while areas of discontinuity are better handled by a small transform.
- One 8x8, two 8x4, two 4x8 or four 4x4 transforms can be used to code a block, which allows the encoder to use the size best suited to the underlying data.
- Transform type can be signaled at the frame, macroblock or block level.
- Intra blocks always use the 8x8 transform.

The main benefit is the ability of the large transform to retain texture information. Although the R-D gain is not huge, it provides major subjective quality benefits, especially for subtle texture, film details and grain noise. The H.264 High profile added an adaptive transform in acknowledgment of this benefit.
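The four partitions listed above can be sketched as follows. This is only an illustration: the type labels and the convention that "8x4" means 8 pixels wide by 4 pixels tall are assumptions, and `split_block` is a hypothetical helper, not part of any codec API.

```python
# Sub-block shape (height, width) for each transform type on an 8x8 block.
# Assumption: "8x4" is 8 wide and 4 tall, "4x8" is 4 wide and 8 tall.
SUBBLOCK_SHAPES = {
    "8x8": (8, 8),
    "8x4": (4, 8),
    "4x8": (8, 4),
    "4x4": (4, 4),
}

def split_block(block, ttype):
    """Split an 8x8 block (list of 8 rows) into the sub-blocks of `ttype`."""
    h, w = SUBBLOCK_SHAPES[ttype]
    return [
        [row[x:x + w] for row in block[y:y + h]]
        for y in range(0, 8, h)
        for x in range(0, 8, w)
    ]

block = [[8 * r + c for c in range(8)] for r in range(8)]
for ttype in SUBBLOCK_SHAPES:
    parts = split_block(block, ttype)
    print(ttype, len(parts), "sub-block(s)")
```

An encoder would transform each sub-block separately and keep the partition with the best rate-distortion cost; that decision logic is not shown here.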
16 bit transform
Design constraints
- A full 16-bit operation, where both sums and products of two 16-bit values produce results within 16 bits.
- Forward and inverse transforms form an orthogonal pair: VU = diag(D).
- The transform approximates a DCT.
- Norms of basis functions within one transform type are identical.
- Norms of basis functions between transform types are identical.

- The 8x8 inverse transform places the tightest constraint.
- WMV-9 relaxes the last two constraints. The norms are in the ratio 288:289:292 (about 1% difference). This is compensated during the encoding process.
- Row inverse transform => rounding => column inverse transform => rounding
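The 288:289:292 norm ratio can be checked numerically. The matrix below is not given in the slides; it is the 8-point integer matrix as commonly published for VC-1 and should be treated as an assumption here. The helper names are hypothetical.

```python
# Assumed 8-point inverse transform matrix (not reproduced in the slides).
T8 = [
    [12,  12,  12,  12,  12,  12,  12,  12],
    [16,  15,   9,   4,  -4,  -9, -15, -16],
    [16,   6,  -6, -16, -16,  -6,   6,  16],
    [15,  -4, -16,  -9,   9,  16,   4, -15],
    [12, -12, -12,  12,  12, -12, -12,  12],
    [ 9, -16,   4,  15, -15,  -4,  16,  -9],
    [ 6, -16,  16,  -6,  -6,  16, -16,   6],
    [ 4,  -9,  15, -16,  16, -15,   9,  -4],
]

def gram(rows):
    """Matrix of dot products between every pair of basis functions."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def is_orthogonal(rows):
    """True if all distinct basis functions have a zero dot product."""
    g = gram(rows)
    n = len(rows)
    return all(g[i][j] == 0 for i in range(n) for j in range(n) if i != j)

def squared_norms(rows):
    """Squared norm of each basis function."""
    return [sum(v * v for v in r) for r in rows]

print(is_orthogonal(T8))                               # True
print(sorted({n // 4 for n in squared_norms(T8)}))     # [288, 289, 292]
```

The basis functions are mutually orthogonal but have three slightly different norms, which is the relaxation described above.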
Motion compensation
- 8x8 or 16x16 prediction.
- Motion vectors of up to 1/4-pel resolution are adopted.
- An adaptive motion mode derived from 3 criteria (MV resolution, block size, filtering type) is signaled at frame level:
  - Mixed block size (16x16 and 8x8), 1/4-pel, bicubic [high bitrate]
  - 16x16, 1/4-pel, bicubic
  - 16x16, 1/2-pel, bicubic
  - 16x16, 1/2-pel, bilinear [low bitrate]
Bicubic filtering
- Sub-pel bilinear filtering is applied to chrominance components.
- Sub-pel bilinear filtering is optional for low-complexity applications.
[Figure: bicubic interpolation cases 1 to 8 for the sub-pel positions around the integer locations.]
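A one-dimensional sketch of 4-tap bicubic interpolation at sub-pel positions. The tap values are not given in the slides; the sets below are the ones commonly quoted for VC-1 and are an assumption, and the rounding is simplified (no rounding-control flag). `interpolate` is a hypothetical helper.

```python
# Assumed 4-tap filters, keyed by the offset in quarter pels.
# Each entry is (taps, shift); the taps sum to 2**shift.
TAPS = {
    1: ([-4, 53, 18, -3], 6),   # 1/4-pel offset
    2: ([-1, 9, 9, -1], 4),     # 1/2-pel offset
    3: ([-3, 18, 53, -4], 6),   # 3/4-pel offset
}

def interpolate(samples, i, quarter):
    """Value at position i + quarter/4, using samples[i-1 .. i+2]."""
    if quarter == 0:
        return samples[i]
    taps, shift = TAPS[quarter]
    window = samples[i - 1:i + 3]
    acc = sum(t * s for t, s in zip(taps, window))
    value = (acc + (1 << (shift - 1))) >> shift   # round, then scale down
    return max(0, min(255, value))                # clip to the 8-bit range

ramp = [0, 10, 20, 30, 40, 50]
print(interpolate(ramp, 2, 2))   # half-way between 20 and 30
```

Only integer multiplies, adds and shifts are used, in line with the design criteria stated earlier.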
Adaptive quantization
- The same quantization rule applies to the coefficients of all 4 transform types.
- Two quantization modes, decided for each frame:
  - Dead-zone, suitable for low bitrates, reconstruction levels {-kQ-D, 0, kQ+D}
  - Regular uniform quantization, for high bitrates, reconstruction levels {kQ}
- The mode changes adaptively according to the running QP.
[Figure: dead-zone quantizer; label: 3/2 QP.]
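The two reconstruction rules, {kQ} and {-kQ-D, 0, kQ+D}, can be compared with a small sketch. The function names and the brute-force nearest-level search are illustrative assumptions, not the encoder's actual decision rule.

```python
def reconstruct(level, q, d=0):
    """Reconstruction value for a level: sign(k) * (|k|*Q + D), or 0 for k = 0.
    d = 0 gives the regular uniform quantizer; d > 0 gives the dead-zone one."""
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) * q + d)

def quantize(coef, q, d=0, max_level=255):
    """Pick the level whose reconstruction is nearest to coef
    (ties go to the smaller magnitude). Brute force, for clarity only."""
    levels = range(-max_level, max_level + 1)
    return min(levels, key=lambda k: (abs(coef - reconstruct(k, q, d)), abs(k)))

for coef in (3, 7, -9):
    uniform = quantize(coef, q=4)
    dead = quantize(coef, q=4, d=2)
    print(coef, uniform, dead)
```

With Q = 4 and D = 2, a coefficient of 3 quantizes to level 1 under the uniform rule but to 0 under the dead-zone rule, which is why the dead-zone mode saves bits at low rates.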
In WMV9, up to 8 tables (coding sets) are available for coding each symbol, and one is selected for each frame. E.g., there are 8 transform AC coefficient tables. This differs from H.264, where symbols are encoded adaptively using several tables with different symbol distributions.
Coding Set Correspondence for PQINDEX > 8

Index   Y blocks table       Cb and Cr blocks table
0       High Rate Intra      High Rate Inter
1       High Motion Intra    High Motion Inter
2       Mid Rate Intra       Mid Rate Inter

run_before codewords for each value of zerosLeft (H.264)

zerosLeft   run_before = 0   1     2     3     4     5     6
1           1                0
2           1                01    00
3           11               10    01    00
4           11               10    01    001   000
5           11               10    011   010   001   000
6           11               000   001   011   010   101   100
>6          111              110   101   100   011   010   001
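The per-frame choice among coding sets described above can be sketched as picking, for each frame, the table that spends the fewest bits on that frame's symbols. The two toy tables and the function name are made up for illustration; they are not WMV-9 tables.

```python
def pick_table(symbols, tables):
    """Return (index, total_bits) of the table that codes `symbols` most cheaply."""
    costs = [sum(len(table[s]) for s in symbols) for table in tables]
    best = min(range(len(tables)), key=costs.__getitem__)
    return best, costs[best]

# Two hypothetical prefix codes tuned to different symbol distributions.
TABLES = [
    {"a": "0", "b": "10", "c": "11"},   # favours 'a'
    {"a": "11", "b": "10", "c": "0"},   # favours 'c'
]

frame_symbols = "ccacc"
print(pick_table(frame_symbols, TABLES))   # table 1 wins for this frame
```

The selected index would be sent once in the frame header, so the signaling cost is amortized over all symbols of the frame.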
Some symbols, e.g. MB type, are spatially correlated. Bitplane coding encodes these symbols efficiently by taking advantage of the spatial dependency among their bits.
7 modes: Raw, RowSkip, ColSkip, Norm-2, Norm-6, Diff-2 and Diff-6
[Figure: MB types of a P-VOP (skip, intra, inter), illustrating Row-skip and Col-skip.]
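A minimal sketch of the RowSkip idea, assuming the usual formulation: one flag bit per row, 0 if the whole row is zero, otherwise 1 followed by the row's raw bits. The function names are hypothetical and the other six modes are not shown.

```python
def encode_rowskip(plane):
    """Encode a binary plane (list of rows of 0/1) row by row."""
    bits = []
    for row in plane:
        if any(row):
            bits.append(1)       # row has set bits: send it raw
            bits.extend(row)
        else:
            bits.append(0)       # all-zero row: a single skip flag
    return bits

def decode_rowskip(bits, width, height):
    """Inverse of encode_rowskip for a plane of known size."""
    plane, pos = [], 0
    for _ in range(height):
        flag = bits[pos]
        pos += 1
        if flag:
            plane.append(list(bits[pos:pos + width]))
            pos += width
        else:
            plane.append([0] * width)
    return plane

plane = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
coded = encode_rowskip(plane)
print(len(coded), "bits instead of", 3 * 4)
```

ColSkip is the same scheme applied to columns; an encoder would try the modes and keep the cheapest one.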
Loop filtering
- An in-loop deblocking filter is used, as in H.264.
- Filtering is applied to every 4th, 8th, 12th, etc. pixel row or column, depending on transform type.
- Adaptive filtering rule.
- A shortcut saves computation.
- Filtering energy is smaller than that of H.264.
[Figure: Shortcut.]
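Which rows and columns are candidate edges can be sketched as below, under the simplifying assumption that every block in the frame uses the same transform type (in practice the type varies per block). The "WxH" labels and function names are hypothetical.

```python
def edge_rows(height, ttype):
    """Rows where a horizontal block or sub-block edge lies.
    Transforms that are 4 pixels tall add edges at every 4th row."""
    step = 4 if ttype in ("8x4", "4x4") else 8
    return list(range(step, height, step))

def edge_cols(width, ttype):
    """Columns where a vertical block or sub-block edge lies.
    Transforms that are 4 pixels wide add edges at every 4th column."""
    step = 4 if ttype in ("4x8", "4x4") else 8
    return list(range(step, width, step))

print(edge_rows(24, "8x8"))   # [8, 16]
print(edge_rows(24, "8x4"))   # [4, 8, 12, 16, 20]
print(edge_cols(24, "8x4"))   # [8, 16]
```

The adaptive filtering rule then decides, per edge segment, whether the pixels are actually modified.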
Interlace coding
MVs, where each MV can refer to either one of two previously encoded fields.
- Explicit coding of the B frame's temporal position relative to its two reference frames (variable velocity model).
- Intra-coded B frames.
- Improved MV coding efficiency.
- The bottom B-field is allowed to refer to the top B-field.
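One way to read the explicit temporal position is as a fraction used to split a co-located motion vector into forward and backward parts. The sketch below only illustrates that idea; the function name and the rounding are assumptions, and a real decoder uses its own integer scaling rules.

```python
from fractions import Fraction

def split_direct_mv(mv, position):
    """Split a co-located MV (x, y) at temporal `position` in (0, 1).
    The forward MV covers `position` of the motion; the backward MV
    covers the remainder, pointing the opposite way."""
    mvx, mvy = mv
    forward = (round(mvx * position), round(mvy * position))
    backward = (forward[0] - mvx, forward[1] - mvy)
    return forward, backward

print(split_direct_mv((12, -8), Fraction(1, 2)))
print(split_direct_mv((12, -8), Fraction(1, 3)))
```

With a fixed position of 1/2 this reduces to the constant-velocity assumption; coding the position explicitly lets the B frame sit anywhere between its references.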
Overlap smoothing
- Another technique to reduce blocking artifacts in intra areas.
- Drawbacks of deblocking filtering:
  - It is purely a decoder process, which operates equally on both block-aligned true edges and apparent block edges.
  - It is usually disabled in the less complex profiles.
- The lapped transform is another way to remove blocking effects.
- The spatial-domain approach implements the lapped transform as pre- and post-processing.
- Adaptive application rule: applied at lower bitrates; it can also be switched on or off on an MB basis.
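[Figure: pixels p0, p1, q1, q0 and a0, a1, b1, b0 on either side of a block edge.]

\[
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix}
=
\left(
\begin{bmatrix}
 7 & 0 & 0 &  1 \\
-1 & 7 & 1 &  1 \\
 1 & 1 & 7 & -1 \\
 1 & 0 & 0 &  7
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}
+
\begin{bmatrix} r_0 \\ r_1 \\ r_0 \\ r_1 \end{bmatrix}
\right) \gg 3
\]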
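A sketch of the 4-tap overlap smoothing filter applied to four pixels x0..x3 straddling a block edge. The coefficient signs (chosen so that every row sums to 8) and the rounding values r0 = 4, r1 = 3 are assumptions, since they are not legible in the slide; the function name is hypothetical.

```python
def overlap_smooth(x, r0=4, r1=3):
    """Filter four pixels across an edge: y = (M @ x + r) >> 3,
    with rounding terms r0, r1, r0, r1 for the four outputs."""
    x0, x1, x2, x3 = x
    return [
        (7 * x0 + x3 + r0) >> 3,
        (-x0 + 7 * x1 + x2 + x3 + r1) >> 3,
        (x0 + x1 + 7 * x2 - x3 + r0) >> 3,
        (x0 + 7 * x3 + r1) >> 3,
    ]

print(overlap_smooth([10, 10, 10, 10]))   # flat input stays flat
print(overlap_smooth([0, 0, 8, 8]))       # a step edge is softened
```

Because each row of coefficients sums to 8 and the result is shifted right by 3, flat regions pass through unchanged while a step across the edge is spread over the four pixels.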
to further reduce rate cost and keep the constant bitrate requirement.
Fading compensation
Encoder detects fading prior to motion compensation by comparing the error measure with a threshold. Encoder and decoder use the quantized fading parameters based on a linear first-order function to transform the original reference frame into a new reference frame.
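The linear first-order remapping of the reference frame can be sketched with a lookup table. The parameter names `scale` and `shift`, the round-half-up rule and the 8-bit clipping are assumptions for illustration; the actual parameters are quantized and signaled in the bitstream.

```python
def build_fade_lut(scale, shift):
    """256-entry table mapping each pixel value v to clip(scale * v + shift)."""
    return [max(0, min(255, int(scale * v + shift + 0.5))) for v in range(256)]

def fade_reference(ref, scale, shift):
    """Apply the linear fading model to every pixel of a reference frame."""
    lut = build_fade_lut(scale, shift)
    return [[lut[p] for p in row] for row in ref]

ref = [[0, 100, 200]]
print(fade_reference(ref, scale=0.5, shift=16))
```

Motion compensation then runs against the remapped reference, so a global brightness change no longer shows up as prediction error.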
Video smoothing
- Interpolates missing frames after decoding; also referred to as frame interpolation.
- Uses an advanced optical flow estimation technique (on a per-pixel basis), along with warping, to synthesize new frames.
- Needs a 733 MHz CPU to interpolate a 320x240 video clip from 10 to 30 fps.
- J. Ribas-Corbera and J. Sklansky, "Interframe interpolation of cinematic sequences," Journal of VCIR, Dec 1993.
Motion compensation

Tool                    H.264/AVC
Motion block size       16x16, 16x8, ..., 4x4
Brightness change       Weighted prediction (B)
Ref. frame num (P/B)    M/M
Generalised B           Y
Intra prediction        Spatial-domain prediction
Inter-intra mixed       N/A

Transform coding, entropy coding & postprocessing

Tool                       MPEG-2      WMV-9                                          H.264/AVC
Transform size & type      8x8 float   8x8, 8x4, 4x8, 4x4 integer                     4x4 integer (only +, >>)
CA multiple VLCs           N/A         Y                                              Y
Bitplane coding            N/A         Y                                              N/A
Arithmetic coding          N/A         N/A                                            Y (Main profile)
Post-processing            Optional    In-the-loop deblocking, overlapped transform   In-the-loop deblocking

Rate control

Tool                       MPEG-2      WMV-9                                          H.264/AVC
Quantization               uniform     Adaptive uniform and non-uniform               log scale
Dynamic frame resizing,
dynamic range reduction    N/A         Y                                              N/A

Streaming & error resilience

Tool                       MPEG-2      WMV-9                                          H.264/AVC
Data partitioning          N/A         System level                                   Slice level partitioning
Bitstream switching        N/A         N/A                                            SI/SP frames
[Figure: rate-distortion plot for Glasgow_qcif_15fps; PSNR-Y versus bitrate (Kbps); curves: WMV9, H264-1ref.]
Conclusions
- Software and hardware components can be developed based on SDKs or WM hardware porting kits.
- WM 9 provides a variety of state-of-the-art audio and video codecs for different applications.
- The quality of WMV-9 is competitive with H.264/AVC, and arguably superior based on several independent tests, with significantly lower computational complexity.
- This paper explains why some of the tools unique to WMV-9 provide an intrinsic quality benefit over H.264/AVC.
Reading assignment
Mandatory
Sridhar Srinivasan et al., Windows Digital Media Division, Microsoft Corporation, "Windows Media Video 9: overview and applications," Signal Processing: Image Communication, Oct 2004.
Homework
7. A composite symbol represents different properties of one MB and tries to exploit their joint occurrence probability. Bitplane coding collects the same symbol from all MBs and removes the correlation between them. Can you think of a way to take advantage of both simultaneously?