
Non Blocking Filter Design and Improve

This document discusses the deblocking filter algorithm used in H.264 video compression. It begins with an overview of the H.264 decoder architecture and the purpose of using an in-loop deblocking filter. It then describes the deblocking filter algorithm, including the block format, filter conventions, boundary strength analysis, and boundary strength levels. The goal of the deblocking filter is to reduce blocking artifacts while maintaining image sharpness at low bitrates. It applies an adaptive algorithm to block edges based on factors like quantization level, prediction type, and motion vectors.

Uploaded by

Bảo Ngọc Lê

Deblocking Filter For H.264 Decoder
1st Le Bao Ngoc 2nd Vu Minh Nhat 3rd Vu Minh Duc
Student ID:20182930 Student ID:20182931 Student ID:20182911
Class: ET-E4 K63 Class: ET-E4 K63 Class: ET-E4 K63

Abstract—Advances in image processing have produced efficient image transmission and compression techniques that solve many bandwidth problems. The most prominent of these is the H.264 compression standard, which is widely considered the best video compression standard today. H.264 allows video to be compressed at a very high compression ratio while still meeting image quality requirements. This work focuses on the study and simulation of the deblocking filter algorithm in the H.264 standard. It is an image restoration algorithm that improves the quality of video compressed with H.264 by counteracting the quality degradation introduced by the compression method itself. Deblocking algorithms can be implemented in dedicated hardware as well as in software.
Index Terms—deblocking filter, H.264 decoder, pipelined architecture

I. INTRODUCTION
Most video compression standards, such as ITU-T H.263 [1, 2], MPEG-4 [3] and H.264, use block-based transforms to exploit spatial redundancy within frames (block-based processing includes the discrete cosine transform (DCT), quantization and entropy coding). Quantization divides the coefficients produced by the DCT by quantization coefficients, which suppresses most of the small-valued coefficients in each block. As a result, when the compression ratio is high, only the DC coefficient and a few other coefficients remain. This breaks the correlation and continuity of pixel values at the boundary between adjacent blocks. This phenomenon is known as blocking artifacts [1, 4-10].

TABLE I
DEBLOCKING FILTER WITH EACH STANDARD

Standard    Deblocking Filter
H.261       Optional in-loop filter
MPEG-1      No filter
MPEG-2      No filter, post filter processing often used
H.263       No filter
MPEG-4      Optional in-loop filter, post filter processing suggested
H.264       Mandatory in-loop filter, post filter processing used

II. OVERVIEW OF H.264/AVC DECODER

• The input bitstream first passes through entropy decoding.
• The next step is inverse quantization and inverse transform (Q⁻¹ and T⁻¹), which recompose the prediction residues.
• Motion compensation (MC) reconstructs the macroblock (MB) from neighboring reference frames.
• INTRA prediction reconstructs the macroblock from neighboring macroblocks in the same frame.
• The macroblock reconstructed by INTER or INTRA prediction is added to the residues.
• The reconstructed frame is filtered by the deblocking filter and the result is sent to the frame memory.

Fig. 1. H.264/AVC Decoder Architecture

Fig. 2. Q⁻¹ and T⁻¹ module diagram
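As a small illustration of the reconstruction step in the decoder overview, the sketch below adds a residual block to a predicted block and clips the sum to the 8-bit sample range. The 4x4 block size, the type aliases and the function name `reconstruct` are assumptions made for this example, not part of the described design.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical block types for the example: 4x4 samples, [row][column].
using Pixels4x4 = std::array<std::array<std::uint8_t, 4>, 4>;
using Residual4x4 = std::array<std::array<int, 4>, 4>;

// Adds the decoded residual to the INTER/INTRA prediction and clips
// every sample to the valid 8-bit range [0, 255].
Pixels4x4 reconstruct(const Pixels4x4& pred, const Residual4x4& res) {
    Pixels4x4 out{};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            const int sum = static_cast<int>(pred[r][c]) + res[r][c];
            out[r][c] = static_cast<std::uint8_t>(std::min(std::max(sum, 0), 255));
        }
    }
    return out;
}
```

The output of this step is what the deblocking filter receives before the frame is written to the frame memory.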
Fig. 3. Block diagram of Intra prediction architecture

III. PURPOSE OF USING DEBLOCKING FILTER FOR H.264 DECODER

• The main goal of making the deblocking filter part of the standard was to place it inside the feedback loop of the encoding process.
• It reduces blocking artifacts, which helps H.264 outperform previous standards in bit-rate reduction.
• The deblocking filter in the H.264 standard is not just a single low-pass filter, but a complex decision algorithm and filtering process with five different filter strengths.
=> This maintains the sharpness of the real image while eliminating the artifacts introduced by intra-frame and inter-frame prediction at low bit-rates.
• It mitigates the characteristic "blurring" effect of the filter at high bit-rates.
• The filter itself is highly optimized, with very small kernels of simple coefficients.
• The filter operates as a sequential circuit, so we use non-blocking assignments for this filter.
• It can be synthesized using an ASIC flow, and it exceeds 1080p requirements when synthesized to a Xilinx Virtex II FPGA.

IV. DEBLOCKING FILTER ALGORITHM ANALYSIS

A. Format Of Block Used

Fig. 4. Edge positions for a given 4x4 block inside a 16x16 macroblock

• The color format is YCbCr 4:2:0 (chroma-luma).
• The 4x4 blocks are grouped into macroblocks, each a 4x4 matrix of blocks for luma and a 2x2 matrix of blocks for each chroma component.
• Each block edge has to be filtered.
• The deblocking filter is applied to each decoded block of a given macroblock, for luma and chroma samples.
• For each block, four different edges are filtered separately.

B. Filter convention

Fig. 5. Filter Structure

• For each block edge, the filter is applied to the pixel component values perpendicular to that edge.
• Pixel components in both the current (Q) and the previous (P) block can have their values changed.
• Pixels already modified during a filter stage can be modified again in a subsequent filter operation.
• The filtering algorithm is adaptive: it depends on the pixel values, the position of a block inside the macroblock, the type of prediction employed (inter or intra), the motion vectors and the quantization parameter.

C. Boundary Strength Analysis
1. Boundary Strength (BS)

Fig. 6. Oversegmented Image
Fig. 7. Boundary strength

• The Boundary Strength for chroma is the same as for the corresponding luma block.
• The quantization parameter (QP) and the pixel values are taken into account.
• The parameters α and β are QP dependent and set the thresholds for filtering to be applied.
• The BS decision logic is a complex set of sequential calculations.
• The BS calculation needs to be done for every line of pixels (LOP).
• The reconstructed frame is filtered by the deblocking filter and the result is sent to the frame memory.

2. Boundary Strength Level

TABLE II
CONDITION TO DETERMINE VALUE OF BS

Condition and type of blocks                                                   BS
The block is intra coded and its boundary is an external (macroblock) edge     4
The block is intra coded and its boundary is an internal edge                  3
Neither block is intra coded and at least one contains coded coefficients      2
Motion vectors of the blocks differ by at least one luma sample                1
Motion compensation from different reference frames                            1
Otherwise (the edge is not filtered)                                           0

The BS assignment reflects that the strongest blocking artifacts are mainly caused by intra prediction in the encoder, and that they are weaker for inter-coded blocks that use prediction and motion compensation.

3. Sample Adaptability Level

In the filter, it is important to distinguish between the actual contours of the image and the edges generated by quantizing the DCT coefficients. To preserve the sharpness of the image, the real edges must be left unfiltered while the spurious edges need to be reduced or eliminated.
To classify these two cases, the pixels across each boundary must be analyzed and compared. The sample values inside two adjacent 4x4 blocks are denoted p3, p2, p1, p0 and q0, q1, q2, q3.

Fig. 8. Description Of Block Boundaries

For boundaries where BS > 0, a pair of quantization-dependent parameters is used as the reference for deciding whether the set of samples is filtered:

$$\alpha,\ \beta \tag{1}$$

They can be approximated by:

$$\alpha(x) = 0.8\,(2^{x/6} - 1), \qquad \beta(x) = 0.5x - 7 \tag{2}$$

Both values depend on QP and are related to the length of the filter. The thresholds increase with QP because the coding error at block edges, and with it the size of the artifacts, grows as the quantization becomes coarser.
The filter is applied to a line of samples only if the following three conditions are satisfied:

$$|p_0 - q_0| < \alpha(\mathrm{IndexA}) \tag{3}$$

$$|p_1 - p_0| < \beta(\mathrm{IndexB}) \tag{4}$$

$$|q_1 - q_0| < \beta(\mathrm{IndexB}) \tag{5}$$

IndexA and IndexB are calculated from QP and the offset values that control the filter properties at slice level:

$$\mathrm{IndexA} = \min(\max(0,\ QP + \mathrm{OffsetA}),\ 51) \tag{6}$$

$$\mathrm{IndexB} = \min(\max(0,\ QP + \mathrm{OffsetB}),\ 51) \tag{7}$$
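The boundary-strength rules of Table II and the sample-level test of Eqs. (3)-(7) can be sketched together as below. This is a sketch under stated assumptions, not the authors' implementation: the `EdgeInfo` fields and all function names are hypothetical, the sketch follows the H.264 convention that a line is filtered only when all three differences fall below their thresholds, and it evaluates α and β with the closed-form approximations of Eq. (2) (taking the constant in β as 7, as in List et al. [2]), whereas a real decoder reads both from the lookup tables of the standard.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

// Hypothetical summary of the two blocks (P and Q) that share an edge.
struct EdgeInfo {
    bool intra;             // at least one of the two blocks is intra coded
    bool macroblockEdge;    // the edge is an external (macroblock) edge
    bool codedCoeffs;       // at least one block contains coded coefficients
    bool differentRefs;     // the blocks use different reference frames
    int  mvDiffQuarterPel;  // largest motion-vector difference, quarter-sample units
};

// Boundary strength, checked in the row order of Table II.
int boundaryStrength(const EdgeInfo& e) {
    if (e.intra) return e.macroblockEdge ? 4 : 3;
    if (e.codedCoeffs) return 2;
    // Four quarter-sample units correspond to one luma sample.
    if (e.differentRefs || e.mvDiffQuarterPel >= 4) return 1;
    return 0;
}

// Eqs. (6) and (7): clip QP plus the slice-level offset to [0, 51].
int clipIndex(int qp, int offset) {
    return std::min(std::max(0, qp + offset), 51);
}

// Closed-form approximations of Eq. (2); assumption: used instead of tables.
double alphaApprox(int indexA) { return 0.8 * (std::pow(2.0, indexA / 6.0) - 1.0); }
double betaApprox(int indexB) { return 0.5 * indexB - 7.0; }

// Eqs. (3)-(5): decide whether one line of samples across the edge is filtered.
bool shouldFilterLine(int bs, int p1, int p0, int q0, int q1,
                      int qp, int offsetA, int offsetB) {
    if (bs == 0) return false;
    const double alpha = alphaApprox(clipIndex(qp, offsetA));
    const double beta = betaApprox(clipIndex(qp, offsetB));
    return std::abs(p0 - q0) < alpha &&
           std::abs(p1 - p0) < beta &&
           std::abs(q1 - q0) < beta;
}
```

With QP = 30 and zero offsets, the approximations give α ≈ 24.8 and β = 8, so a small step such as 62 | 66 is smoothed while a large step such as 40 | 120 is treated as a real image edge and left untouched.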
The interval 0 to 51 is the valid range of QP.

D. Filter Operation

Fig. 9. Deblocking Filter Algorithm Diagram

Fig. 10. Deblocking Filter Algorithm Diagram (Continued)

For chroma component filtering, only the p0 and q0 values can be modified. These values are filtered in the same way as the luminance component sample values. The c0 cutoff is set to c1+1. In this way, one does not need to evaluate further conditions for boundaries with BS < 4, and therefore does not need to access the sample values p2 and q2.

Fig. 11. Luma and Chroma block enumeration

V. PROPOSED DEBLOCKING FILTER ARCHITECTURE

A. Numbering And Macroblock Convention

Fig. 12. Luma and Chroma block enumeration

Fig. 13. Macroblock color diagram

B. Edge Filter
In the deblocking process, the edge filter is the heart of the design:
• The Edge Filter architecture accepts one LOP per cycle for the Q and P blocks and produces the filtered Q' and P' LOPs.
• An entire block border enters the Edge Filter every four cycles (one block border is four LOPs tall).
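The chroma case described in the Filter Operation subsection is the simplest instance of what the edge filter computes for one LOP, so it makes a compact illustration. The sketch below uses the H.264 chroma formulas as we understand them and is not the authors' datapath: the struct `ChromaLine`, the function names and the parameter `tc0` (the table-derived clipping value that the text calls c1) are assumptions, and the caller is assumed to have already checked the α and β conditions for the line.

```cpp
#include <algorithm>

// One chroma line of pixels across an edge: p1 p0 | q0 q1.
struct ChromaLine {
    int p1, p0, q0, q1;
};

// Clamp v to the range [lo, hi].
inline int clip3(int lo, int hi, int v) {
    return std::min(std::max(v, lo), hi);
}

// Filters one chroma line. Only p0 and q0 are modified, and p2/q2 are
// never read. bs is the boundary strength of the edge (1..4).
// Note: the normal filter relies on an arithmetic right shift for
// negative values, which C++20 guarantees and mainstream compilers provide.
ChromaLine filterChromaLine(ChromaLine s, int bs, int tc0) {
    if (bs == 4) {
        // Strong filter: short low-pass kernel on each side of the edge.
        const int p0 = (2 * s.p1 + s.p0 + s.q1 + 2) >> 2;
        const int q0 = (2 * s.q1 + s.q0 + s.p1 + 2) >> 2;
        s.p0 = p0;
        s.q0 = q0;
    } else {
        // Normal filter: correction clipped to +/- (tc0 + 1), the "c1+1" cutoff.
        const int tc = tc0 + 1;
        const int raw = ((s.q0 - s.p0) * 4 + (s.p1 - s.q1) + 4) >> 3;
        const int delta = clip3(-tc, tc, raw);
        s.p0 = clip3(0, 255, s.p0 + delta);
        s.q0 = clip3(0, 255, s.q0 - delta);
    }
    return s;
}
```

For example, the line 60 64 | 72 74 with BS = 2 and `tc0` = 2 becomes 60 66 | 70 74: the step across the edge shrinks from 8 to 4 while p1 and q1 stay untouched.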
Fig. 14. Edge Filter Architecture

C. Pipelined Architecture Of Filter

The pipelined edge filter can be encapsulated in such a way that only the Q input and the P' output are visible (Filter Encapsulation).
If an Edge Filter architecture with a suitably deep pipeline can be designed, the output Q' can be connected directly to the input P.
This encapsulation makes the control logic simpler, at the cost of a small overhead.

Fig. 15. Edge Filter Architecture

The pipelined architecture was designed so that only one arithmetic or logic operation is done in every stage of the pipeline.
=> Calculating the BS, which is used to select the correct filter, alone takes an 11-stage pipeline, so a 12-stage pipeline would be enough.
If an entire column of luma blocks, consisting of four stacked blocks, is processed before the next one, the first LOP of the first Q block becomes the first LOP of the first P block after 16 LOP cycles.
=> Edge Filter architecture with a 16-stage pipeline.
The chroma Cb and Cr blocks, which are half the height, are stacked so they can be processed like a luma block.
=> This makes the implementation of the control logic much simpler.

Fig. 16. 16 Stage Pipeline Edge Filter

D. Deblocking Filter Architecture
1. Design Block

Fig. 17. Design Block Of Deblocking Filter

• Input Buffer: pixel data are fed into the filter through the input buffer.
• Edge Filter: it is responsible for all filter functionality, including the thresholds, the BS calculation and the filtering itself.
• Edge Filter Encapsulation: this block encapsulates the edge filter, connecting the Q output to the P input, and implements some additional buffers for data control.
• MB Buffer: the MB buffer stores two different types of block (blocks for use in the external vertical border filtering process, and blocks transposed and reordered for the horizontal edge filtering).
• Line Buffer: the line buffer stores the blocks needed for the external horizontal border filtering in the next MB line.
• Transpose: in this part of the filter, data from a 4x4 pixel block are transposed.

Filter Process Result:
A total of 256 clock cycles are needed to process an entire 4:2:0 macroblock (24 blocks):
- If data is not available at the beginning of a 256-cycle operation, a bubble is inserted in the pipeline.
- The stop cycle is also a 256-cycle operation.

2. Filter Module

Fig. 18. Filter Module
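The Transpose block in the design-block list lets the same edge filter datapath serve both edge directions: once a 4x4 block is transposed, its horizontal edges can be read line by line as if they were vertical ones. A minimal software sketch of that operation follows. The type alias `Block4x4` and the function name `transpose` are assumptions for the example, and the hardware block in the paper may be organized differently.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical 4x4 block of 8-bit samples, indexed as [row][column].
using Block4x4 = std::array<std::array<std::uint8_t, 4>, 4>;

// In-place transpose: sample (r, c) trades places with sample (c, r),
// so every column of the original block becomes a row (one LOP).
void transpose(Block4x4& b) {
    for (int r = 0; r < 4; ++r) {
        for (int c = r + 1; c < 4; ++c) {
            std::swap(b[r][c], b[c][r]);
        }
    }
}
```

Applying `transpose` a second time after filtering restores the original orientation of the block.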
Fig. 19. Architecture of 8 pixels parallel-in parallel-out filter

Fig. 20. Datapath Of Strong Filter

Fig. 21. Datapath Of Normal Filter

E. Timing Analysis

Fig. 22. Input Block Sequence In Input Buffer Of The Deblocking Filter

Fig. 23. Deblocking Filter Output Block Sequence

Fig. 24. Vertical Edge Timing Analysis

Fig. 25. Horizontal Edge Timing Analysis

VI. C++ CODE

We implemented the deblocking filter algorithm and applied it to a sample image produced by the previous stages of the H.264 decoder.

Algorithm 1: Vertical Edge Filter

Algorithm 2: Horizontal Edge Filter

Algorithm 3: Deblocking Filter

Algorithm 4: YUV

Algorithm 5: Deblocking Filter of Image (Video)

∗Result:

Fig. 26. Input Image

Fig. 27. Output Image

VII. IMPLEMENTATION AND RESULT

Fig. 28. Waveform of simple deblocking filter

The architecture presented was described in VHDL and simulated with ModelSim. About 4500 lines of code were written. The design behavior was validated by simulation using testbench files and data extracted from the reference software, using public domain video sequences and image sequences.
In the simulation process, the QP parameter is restricted to a set of typical values: QP = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 24, 36, 42, 48, 51.
Threshold parameters: AlphaMB, Alpha8x16/16x8, Alpha8x8, Alpha4x8 and Alpha4x4 define the thresholds for dividing an MB into blocks with variable sizes of 4x4, 4x8, 8x4, 8x8, 8x16, 16x8 or 16x16.
The evaluation results are based on the PSNR (peak signal-to-noise ratio) value of the frames with and without the filter, for two different video files encoded according to the H.264 standard with different quantization parameters.
∗QP = 0, search window size 8, threshold parameters all set to 4, Inter prediction with variable block sizes, and Intra mode for the first frame:
∗QP = 18, search window size 8, threshold parameters all set to 4, Inter prediction with variable block sizes, and Intra mode for the first frame:
∗QP = 48, search window size 8, threshold parameters all set to 4, Inter prediction with variable block sizes, and Intra mode for the first frame:

The validated behavioral design was then synthesized, the post place-and-route result was validated, and performance results were obtained for a Xilinx Artix-7 FPGA. In this project, we synthesize for the chip xc7a100tcsg324-3.

Fig. 29. Synthesis Overview
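The PSNR measure used in the evaluation of this section can be sketched as follows for 8-bit frames. This is the usual textbook definition, 10·log10(255² / MSE), and not necessarily the exact routine used in the experiments. The function name `psnr` and the flat-vector frame layout are assumptions, and both frames are assumed to have the same size.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between a reference frame and a test frame (8-bit samples,
// same number of samples in both). Identical frames give +infinity.
double psnr(const std::vector<std::uint8_t>& ref,
            const std::vector<std::uint8_t>& test) {
    double sse = 0.0;  // sum of squared errors
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(test[i]);
        sse += d * d;
    }
    if (sse == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    const double mse = sse / static_cast<double>(ref.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

Comparing `psnr(original, decodedWithFilter)` against `psnr(original, decodedWithoutFilter)` for each frame and each QP gives the kind of with/without comparison described above.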

Data extracted before and after the deblocking filter process was used to ensure the correctness of the implemented architecture. The table below presents the number of Xilinx LUTs and BRAMs used to synthesize the developed deblocking filter.

Fig. 30. Synthesis Result Of SiteLogic

Fig. 31. Synthesis Result Of IO And GT Specific

Fig. 32. RTL Schematic On Vivado

Our group then tried to improve the deblocking filter design so that it follows the pipelined architecture more closely, and synthesized the design for the chip xc7a200tfbg484-3.

Fig. 33. Synthesis Result Of High Efficiency Of Deblocking Filter

The deblocking filter architecture was designed around a pipelined edge filter. The filter requires eight samples for the processing, so the filter architecture was designed to be four samples wide. Due to data dependencies, as well as the fact that the samples need to go through the edge filter twice (for vertical and horizontal filtering), the processing rate of the deblocking filter is 1.5 samples per clock cycle. Achieving a frequency of 175 MHz, the architecture is able to process up to 227.4 million samples per second.

VIII. ACKNOWLEDGMENT

The project presented the basic concepts, principles and techniques of video compression in the H.264 compression standard.
We successfully implemented a number of key algorithms of the compression standard (DCT and IDCT transforms, quantization, Intra/Inter prediction methods, ...) as the basis for building the main algorithm of this project, the deblocking algorithm for video processing according to the H.264 standard.
The results give us a more intuitive view of several problems in video compression, such as the trade-off between compression ratio and the quality of the compressed video, and the role of filters in improving the quality of compressed video.
We also reviewed a number of requirements that need to be resolved for specific video sequences, as the basis for applying this compression standard in practice.

IX. IMPROVEMENT IN THE FUTURE

In the future, the development and expansion of the project will focus on improving the algorithms of the compression standard:
• Future work will address support for the high profiles and the scalability and multi-view amendments of the H.264 standard, which require small modifications in the BS decision logic and in the data width for pixel values.
• Building full compression and decompression algorithms, and reducing their computational complexity so that the program can be integrated into applications with limited processing capacity.
• Applying the compression standard to build practical applications, such as surveillance cameras integrated in handheld devices (mobile phones, ...) and software for bandwidth-limited applications like internet video. New algorithms will be built and developed on the basis of actual requirements.

REFERENCES
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC) (March 2003).
[2] List, P., Joch, A., Lainema, J., Bjøntegaard, G., Karczewicz, M.: Adaptive deblocking filter. IEEE Trans. Circuits Syst. Video Technol. 13, 614–619 (2003).
[3] Puri, A., Chen, X., Luthra, A.: Video coding using the H.264/MPEG-4 AVC compression standard. Signal Processing: Image Communication 19, 793–849 (2004).
[4] Sima, M., Zhou, Y., Zhang, W.: An efficient architecture for adaptive deblocking filter of H.264/AVC video coding. IEEE Trans. on Consumer Electronics 50(1), 292–296 (2004).
[5] Lin, H.-Y., Yang, J.-J., Liu, B.-D., Yang, J.-F.: Efficient deblocking filter architecture for H.264 video coders. In: 2006 IEEE International Symposium on Circuits and Systems, ISCAS 2006, May 21–24 (2006).
[6] Khurana, G., Kassim, A.A., Chua, T.-P., Mi, M.B.: A pipelined hardware implementation of in-loop deblocking filter in H.264/AVC. IEEE Trans. on Consumer Electronics 52(2), 536–540 (2006).
[7] Agostini, L., Azevedo, A., Rosa, V., Berriel, E., Santos, T., Bampi, S., Susin, A.: Design of a H.264/AVC main profile decoder for HDTV. In: Proceedings of IEEE International Conference on Field Programmable Logic and Applications, Madrid, pp. 501–506 (2006).
[8] Chen, J.-W., Lin, C.-C., Guo, J.-I., Wang, J.-S.: Low complexity architecture design of H.264 predictive pixel compensator for HDTV application. In: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 932–935 (2006).
[9] Huang, C.-T., Tikekar, M., Chandrakasan, A.P.: Memory-hierarchical and mode-adaptive HEVC intra prediction architecture for quad full HD video decoding. IEEE Trans. VLSI Syst. (2014).
[10] Finchelstein, D.F.: Low-power techniques for video decoding. Thesis, Massachusetts Institute of Technology (2009).
