
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, VOL. 66, NO. 3, AUGUST 2020

Hardware-Efficient 2D-DCT/IDCT Architecture for Portable HEVC-Compliant Devices

Ashish Singhadia, Meghan Mamillapalli, and Indrajit Chakrabarti, Member, IEEE

Abstract—Low power hardware acceleration cores for integration into real-time High Efficiency Video Coding (HEVC) codec for smartphones, tablets, camcorders, and televisions are in great demand. This need motivates one for an efficient realization of Discrete Cosine Transform (DCT) and Inverse-DCT (IDCT) for HEVC. This paper presents an algorithm to calculate the required minimum number of low-frequency DCT-output/IDCT-input coefficients for 4, 8, 16, and 32-point DCT/IDCT in HEVC, such that there is a slight decrease in peak-signal-to-noise-ratio (<0.15 decibel) and a minor increment in bitrate (<1.5%) as compared to the reference HEVC-Test-Model (HM) Software. However, the encoding time gets reduced at most by 17.95% for Class-A type sequences, while reporting mean-squared-error and structural-similarity of 1.42 and 0.9913, respectively for 4K ultra-high-definition videos. Moreover, HEVC-compliant computationally efficient architectures are introduced for n-point DCT/IDCT. The presented flexible Transpose Memory architecture uses only sixteen random-access-memories to support all transform-unit sizes in HEVC. The proposed two-dimensional DCT/IDCT architecture can process up to 288@4K frames-per-second, and it consumes the minimum power, energy, and area of 11.23 milliwatts, 2.34 picojoules, and 120 kilo-gate-equivalents, respectively. Such design with low power, area, and energy features can be included in a real-time HEVC codec for HEVC-compliant consumer electronic devices.

Index Terms—HEVC, DCT, IDCT, hardware implementation, low power consumption.

Manuscript received January 4, 2020; revised April 5, 2020 and May 11, 2020; accepted June 28, 2020. Date of publication July 1, 2020; date of current version July 28, 2020. This work was supported by the Ministry of Electronics and Information Technology (MeitY), Government of India, through the Chip to System (C2S) Program. (Corresponding author: Ashish Singhadia.)
Ashish Singhadia and Indrajit Chakrabarti are with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (e-mail: singha-[email protected]; [email protected]).
Meghan Mamillapalli is with the Department of Engineering, Qualcomm India Private Ltd., Bengaluru 560037, India (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCE.2020.3006213
1558-4127 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on January 02,2024 at 07:03:52 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

HIGH Efficiency Video Coding (HEVC) [1], [2] is a video compression standard that can deliver better performance than the previous model, i.e., H.264 [3]. It demonstrates the same video quality at approximately 50% of the bitrate of H.264 [3], with resolutions supporting as high as 8K (7680×4320). Hardware accelerated designs for HEVC codec have become highly essential to increase the efficiency of central processing unit (CPU) in HEVC-compliant consumer electronic (CE) devices [4]–[7]. Without hardware acceleration cores, the CPU of CE devices is supposed to execute trillions of operations in a second for image and video processing applications [8]. This overwhelming usage of the CPU will result in quick exhaustion of battery power, which will not be appreciated by the consumers of portable CE devices. Thus, power efficient hardware accelerators, which are intended to be included into real-time HEVC codec for portable CE devices, are in high demand [2], [8].

Nowadays, CE manufacturers have started integrating next-generation HEVC codec into CE products such as smartphones [4], tablets [5], camcorders [6], and televisions [7], etc. Moreover, there have been several instances of research in developing low-power hardware accelerators for HEVC codecs [9]–[14], HEVC encoders [15], and HEVC decoders [16] for consumer electronics applications. Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) [17] play a very important role in image and video applications. It is employed in many image and video compression standards due to its good energy compaction property [17], [18]. Several fast algorithms introduced in the existing literature [9]–[14], [18]–[24] can considerably decrease the computational requirement of DCT. Scalable architectures for approximate DCT in HEVC have been introduced by Jridi and Meher [19]. Moreover, they also presented a reconfigurable architecture based on MCM algorithm to support DCT/IDCT for the HEVC standard [19]. A reconfigurable 2D-DCT hardware requiring 96 multipliers for HEVC was presented by Zheng et al. [23]. The 2D DCT designs as presented in the state-of-the-art [9]–[11], [25] required two DCT hardware blocks for performing 2D DCT operation, and processed 32 samples for all Transform Unit (TU) sizes in a clock cycle. The DCT/IDCT architecture by Budagavi et al. [26] presented a reduction of 43-45% in area as compared to the separate implementation of forward and inverse core transform. An optimized 32-point DCT architecture for HEVC was proposed by Chatterjee and Sarawadekar [21]. The 2D DCT hardware by Zhu et al. [27] needs multipliers to calculate all DCT coefficients, while the 2D DCT architecture as presented by Zhao et al. [28] computes DCT of larger TUs using smaller ones.

While the HEVC transform [1] of multiple sizes improves the compression performance of video codecs, it also increases the complexity of hardware implementation. With the development of CE devices such as tablets [5], multimedia smartphones supporting real-time video calling and conferencing [4], digital set-top boxes with a feature of recording digital videos, televisions [7] etc., it is strongly desired for
a device to be able to support both capture and playback of the video. Consequently, both DCT and IDCT have to be implemented in the same device. Therefore, the algorithms and architectures that can significantly reduce power, energy and area of the overall hardware are always in high demand [8]. Consequently, low power ASIC design for DCT/IDCT operations for HEVC [1] will certainly lead to improved power efficiency and hardware acceleration of a real-time HEVC codec [29] for HEVC-compliant CE devices [4]–[7].

The main contributions of the work presented in this paper are enumerated as follows.
1) This work presents an algorithm to find the minimum required low-frequency DCT-output/IDCT-input coefficients in n-point DCT/IDCT for the Transform-and-Quantization (TQ) block of HEVC, where n = 4, 8, 16, and 32. It significantly decreases the computational overhead and encoding time (at most by 17.95%) of HEVC-Test-Model (HM) Software [30]. Moreover, the calculated Bjøntegaard Delta (BD) PSNR (BD-PSNR) [31], BD-Rate [31], and Rate-Distortion (RD) comparison demonstrate better coding efficiency of the algorithm as compared to the existing algorithms [9], [10], [12], [19], [24], [32].
2) The proposed approximate integer 1D architectures for 4-, 8-, 16-, 32-, and 4-/8-/16-/32-point DCT/IDCT are hardware efficient. The architectures report a considerable decrease in addition (76-92%) and shift (86-95%) operations for all TUs in HEVC, as compared to the original Separate DCT+IDCT architecture [26]. The proposed designs are HEVC-compliant and use fewer arithmetic operations than the designs reported in the state-of-the-art [9], [13], [14], [21], [23], [26].
3) This paper also proposes a flexible Transpose Memory (TM) architecture to support all TU sizes in HEVC. The proposed TM architecture uses only 16 RAMs to transpose an input of 32 × 32 samples, compared to the 32 RAMs used in the designs of [9]–[11], [21], [23]–[25].
4) The 2D DCT/IDCT architecture introduced in this paper consumes lower power (11.23 mW), area (120 KGates) and energy (2.34 pJ) than the reported state-of-the-art designs [9]–[11], [23]–[26]. It can calculate 2D forward/inverse DCT for all the TU sizes in HEVC. The proposed architecture requires only one DCT/IDCT module, compared to the two DCT hardware blocks reported in the architectures [9]–[11], [25].

The rest of the paper unfolds as follows. Section II discusses the proposed algorithm. The 1D DCT/IDCT architectures are presented in Section III. The flexible TM architecture is discussed in Section IV. The 2D DCT/IDCT architecture with its implementation and hardware comparison are presented in Section V. Finally, the paper is concluded in Section VI.

The proposed hardware-efficient 2D DCT/IDCT architecture consumes low power, low energy, and low area as compared to the relevant state-of-the-art architectures [9]–[11], [13], [14] presented in the realm of consumer electronics. Moreover, unlike the designs shown in the state-of-the-art [9]–[14] for CE applications, the proposed hardware can perform 2D 4-/8-/16-/32-point DCT/IDCT at the minimum hardware requirement. Furthermore, the proposed design reports at least 78% less area consumption than the separate implementation of original DCT and IDCT architectures for HEVC [26]. Additionally, the proposed ASIC design seems to fully satisfy the power and area requirements of the state-of-the-art implementation of standard HEVC chips [15], [16], [29] designed for portable CE devices. Therefore, the ASIC implementation of this featured architecture for HEVC makes it highly relevant to consumer electronics society [2]. Consequently, the implementation of this featured design will enable the consumers to enjoy the multiple benefits of low-area, high-speed, and extended-battery-life portable HEVC-compliant CE products [4]–[6].

II. PROPOSED ALGORITHM

In this section, a new algorithm is presented, namely, Min_Num_Transform_Coeff, to calculate $X_n^{OPT}$. Here, $X_n^{OPT}$ represents the minimum required number of DCT-output/IDCT-input coefficients to compute n-point DCT/IDCT for the Transform-and-Quantization (TQ) block of HEVC, where n = 4, 8, 16, and 32. The algorithm determines the minimum number of low-frequency DCT-output/IDCT-input coefficients needed for the calculation of n-point DCT/IDCT in HEVC-Test-Model (HM) Software [30] so that the decrease in PSNR and the increment in bitrate are less than 0.15 dB and 1.5%, respectively. When the algorithm is extended to HEVC-Test-Model (HM), it significantly decreases the computational overhead of the reference algorithm [1]. Therefore, the total encoding time of HM Software [30] gets reduced at most by 17.95% for Class A type video sequences. In the algorithm, the thresholds for PSNR ($P_{th}$) and bitrate ($R_{th}$) are set to 0.15 dB below $P_{max}$ and 1.015 times $R_{min}$, respectively. Here, $P_{max}$ and $R_{min}$ represent the maximum PSNR and minimum bitrate reported by HM Software [30] for a video sequence (I) concerning the original reference algorithm [1], i.e., $X_n = X_{max}$. Moreover, $P_{av}$ and $R_{av}$ represent the average values of PSNR and bitrate, respectively. Here, $f_P(X_n)$ and $f_R(X_n)$ represent the values of PSNR and bitrate, respectively, calculated for n-point DCT/IDCT at $X = X_n$ by HM Software [30]. The reference quantization parameters (QP) considered for the algorithm are 22, 27, 32 and 37.

In the algorithm, the number of DCT-output/IDCT-input coefficients (X) required for n-point DCT/IDCT is varied in the range of $X_{max}$ to $X_{min}$ in HM Software [30]. While calculating the PSNR and bitrate for 32-point DCT/IDCT, the values of $X_{max}$ and $X_{min}$ are considered to be 32 and zero, respectively. So, the number of DCT-output/IDCT-input coefficients is varied from 32 down to 0 for 32-point DCT/IDCT. The remaining DCT-output/IDCT-input coefficients which are not considered while computing n-point DCT/IDCT are made zero in HM Software [30]. While varying the coefficients from $X_{max}$ to $X_{min}$ for 32-point DCT/IDCT, the remaining algorithms of

4-, 8-, and 16-point DCT/IDCT are kept unaltered. The same technique is also followed while varying the coefficients for 4-, 8-, and 16-point DCT/IDCT. As the input video sequences (I) are varied, it is observed that on an average the minimum number of required DCT-output/IDCT-input coefficients for 4-, 8-, 16-, and 32-point DCT/IDCT are 4, 3, 6, and 7, respectively, i.e., $X_4^{OPT} = 4$, $X_8^{OPT} = 3$, $X_{16}^{OPT} = 6$, and $X_{32}^{OPT} = 7$.

Algorithm 1 Min_Num_Transform_Coeff
Input: $I$, $P_{th}$, $R_{th}$, $R_{min}$, $P_{max}$
Output: $X_4^{OPT}$, $X_8^{OPT}$, $X_{16}^{OPT}$, $X_{32}^{OPT}$
    $I \leftarrow$ Test Video Sequence;
    $P_{th} \leftarrow (P_{max} - 0.15)$;
    $R_{th} \leftarrow 1.015 * R_{min}$;
    Initialize $Init = P_{max} / R_{min}$, $Var = 0$;   /* Init, Var are data variables */
    for ($X_4 = 4$, $X_4 \geq 0$, $X_4--$) do
        for ($X_8 = 8$, $X_8 \geq 0$, $X_8--$) do
            for ($X_{16} = 16$, $X_{16} \geq 0$, $X_{16}--$) do
                for ($X_{32} = 32$, $X_{32} \geq 0$, $X_{32}--$) do
                    $P_{av} \leftarrow$ Average{$f_P(X_4)$, $f_P(X_8)$, $f_P(X_{16})$, $f_P(X_{32})$};
                    $R_{av} \leftarrow$ Average{$f_R(X_4)$, $f_R(X_8)$, $f_R(X_{16})$, $f_R(X_{32})$};
                    $Var \leftarrow P_{av} / R_{av}$;
                    if ($Var < Init$ && $P_{av} \geq P_{th}$ && $R_{av} \leq R_{th}$) then
                        $Init \leftarrow Var$;
                        $X_4^{OPT} \leftarrow X_4$;
                        $X_8^{OPT} \leftarrow X_8$;
                        $X_{16}^{OPT} \leftarrow X_{16}$;
                        $X_{32}^{OPT} \leftarrow X_{32}$;
                    end
                end
            end
        end
    end

III. PROPOSED APPROXIMATE INTEGER 1D-DCT/IDCT ARCHITECTURES FOR HEVC

The HEVC standard [1] makes use of 31 coefficients represented by [C] to create the HEVC Transform matrix [1]. These coefficients are borrowed from [1] and are given as follows.

$$[C] = \{a, b, c, \cdots, x, y, z, A, B, \cdots, E\} \tag{1}$$

The numerical values of the HEVC coefficients in (1) are presented in (2). Hardware realization of these HEVC coefficients requires only addition and shift operations.

$$[C] = \{64, 83, 36, 89, 75, 50, 18, 90, 87, 80, 70, 57, 43, 25, 9, 90, 90, 88, 85, 82, 78, 73, 67, 61, 54, 46, 38, 31, 22, 13, 4\} \tag{2}$$

A. Approximate Integer 4-Point DCT/IDCT Architecture

The proposed approximate integer 4-point DCT/IDCT architecture is presented in Fig. 1. The architecture can compute 4-point DCT/IDCT depending upon the select line (T) of multiplexers (MUXes) as depicted in Fig. 1. It requires twelve addition and six shift operations to perform DCT/IDCT. The MUXes, which are presented in Fig. 1, are also used and followed in Figures 2-5. The red dashed lines in Fig. 1 show multiplication with −1. Figures 2-5 also follow the same conventions as used in Fig. 1.

Fig. 1. Approximate integer 4-pt DCT/IDCT architecture.

B. Approximate Integer 8-Point DCT/IDCT Architecture

The proposed approximate integer 8-point DCT/IDCT architecture is presented in Fig. 2. The architecture can compute 1D 8-point DCT/IDCT depending upon the select line (T) of MUXes as depicted in Fig. 2. It requires 50 addition operations and 26 hardwired shift operations with HEVC coefficients given by {a − g} to perform DCT/IDCT. In Fig. 2, the term $(g, \beta \cdot f, e, \beta \cdot d)$ indicates multiplication of the coefficients $g$, $\beta \cdot f$, $e$, and $\beta \cdot d$ to the first, the second, the third, and the fourth line to the adder, respectively. Figures 3-5 also follow this convention. The variables c(3) − c(7) as depicted in Fig. 2 can be obtained as in (3).

$$c(3) = f \cdot x(1); \quad c(4) = c \cdot x(2); \quad c(5) = e \cdot x(1); \quad c(6) = a \cdot x(0); \quad c(7) = d \cdot x(1); \tag{3}$$

Here, the numerical values of HEVC coefficients (a, c, d, e, f) are given in (2).

Fig. 2. Approximate integer 8-pt DCT/IDCT architecture.

Fig. 3. Approximate integer 16-point DCT/IDCT architecture.
C. Approximate Integer 16-Point DCT/IDCT Architecture

The proposed approximate integer 16-point DCT/IDCT architecture is presented in Fig. 3. Here, the architecture can compute 16-point DCT/IDCT depending upon the select line (T) of MUXes as depicted in Fig. 3. It requires 183 addition and 128 shift operations to perform DCT/IDCT. The coefficients c(6)-c(15) as depicted in Fig. 3 can be obtained as in (4). The shift operations presented here do not need extra hardware and are hardwired.

$$
\begin{aligned}
c(6) &= f \cdot x(2); \quad c(8) = a \cdot x(0); \quad c(10) = e \cdot x(2); \\
c(12) &= c \cdot x(4); \quad c(14) = d \cdot x(2); \\
c(7) &= l \cdot x(1) - j \cdot x(3) - n \cdot x(5); \\
c(9) &= k \cdot x(1) - m \cdot x(3) - i \cdot x(5); \\
c(11) &= j \cdot x(1) + o \cdot x(3) - k \cdot x(5); \\
c(13) &= i \cdot x(1) + l \cdot x(3) + o \cdot x(5); \\
c(15) &= h \cdot x(1) + i \cdot x(3) + j \cdot x(5);
\end{aligned} \tag{4}
$$

Here, the numerical values of HEVC coefficients (a, . . . , o) are given in (2).

D. Approximate Integer 32-Point DCT/IDCT Architecture

The proposed approximate integer 32-point DCT/IDCT architecture is presented in Fig. 4. The architecture can compute 32-point DCT/IDCT depending upon the select line (T) of MUXes as depicted in Fig. 4. It requires 415 addition and 214 shift operations to perform DCT/IDCT. The shift operations presented here do not need extra hardware and are hardwired. The variables (c(10), c(12), . . . , c(28), c(30)) as depicted in Fig. 4 can be obtained as in (5).

Fig. 4. Approximate integer 32-point DCT/IDCT architecture.

$$
\begin{aligned}
c(16) &= a \cdot x(0); \quad c(12) = f \cdot x(4); \\
c(20) &= e \cdot x(4); \quad c(28) = d \cdot x(4); \\
c(10) &= m \cdot x(2) - h \cdot x(6); \quad c(14) = l \cdot x(2) - j \cdot x(6); \\
c(18) &= k \cdot x(2) - m \cdot x(6); \quad c(22) = j \cdot x(2) + o \cdot x(6); \\
c(26) &= i \cdot x(2) + l \cdot x(6); \quad c(30) = h \cdot x(2) + i \cdot x(6);
\end{aligned} \tag{5}
$$

The variables (c(7), c(9), . . . , c(29), c(31)) as depicted in Fig. 4 can be obtained as in (6).

$$
\begin{aligned}
c(7) &= B \cdot x(1) - u \cdot x(3) - q \cdot x(5); \\
c(9) &= A \cdot x(1) - r \cdot x(3) + v \cdot x(5); \\
c(11) &= z \cdot x(1) - p \cdot x(3) + A \cdot x(5); \\
c(13) &= y \cdot x(1) - s \cdot x(3) - E \cdot x(5); \\
c(15) &= x \cdot x(1) - v \cdot x(3) - z \cdot x(5); \\
c(17) &= w \cdot x(1) - y \cdot x(3) - u \cdot x(5); \\
c(19) &= v \cdot x(1) - B \cdot x(3) - p \cdot x(5); \\
c(21) &= u \cdot x(1) - E \cdot x(3) - t \cdot x(5); \\
c(23) &= t \cdot x(1) + C \cdot x(3) - y \cdot x(5); \\
c(25) &= s \cdot x(1) + z \cdot x(3) - D \cdot x(5); \\
c(27) &= r \cdot x(1) + w \cdot x(3) + B \cdot x(5); \\
c(29) &= q \cdot x(1) + t \cdot x(3) + w \cdot x(5); \\
c(31) &= p \cdot x(1) + q \cdot x(3) + r \cdot x(5);
\end{aligned} \tag{6}
$$

In (5) and (6), the numerical values of HEVC coefficients which are denoted by (a − E) are presented in (2).

E. Approximate Integer 4-/8-/16-/32-Point DCT/IDCT Architecture

The proposed approximate integer 4-/8-/16-/32-point DCT/IDCT architecture is presented in Fig. 5. The architecture can compute 4-/8-/16-/32-point DCT/IDCT depending upon the select line (T) of MUXes as depicted in Fig. 5. It requires 459 addition and 239 shift operations with HEVC coefficients (a-E) to perform the DCT/IDCT. The shift operations presented here do not need extra hardware and are realized by rewiring. The variables (c(12), c(20), c(28)), (c(7), c(9), . . . , c(29), c(31)), and (c(14), c(18), c(22), c(26), c(30)) as depicted in Fig. 5 can be obtained as given in (5), (6), and (7), respectively.

Fig. 5. Approximate integer 4-/8-/16-/32-point DCT/IDCT architecture.

$$
\begin{aligned}
c(14) &= l \cdot x(2) - j \cdot x(6) - n \cdot x(10); \\
c(18) &= k \cdot x(2) - m \cdot x(6) - i \cdot x(10); \\
c(22) &= j \cdot x(2) + o \cdot x(6) - k \cdot x(10); \\
c(26) &= i \cdot x(2) + l \cdot x(6) + o \cdot x(10); \\
c(30) &= h \cdot x(2) + i \cdot x(6) - j \cdot x(10);
\end{aligned} \tag{7}
$$

In Fig. 5, the numerical values of HEVC coefficients denoted by (a − E) are presented in (2).
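One property that makes a single 4-/8-/16-/32-point datapath plausible is that the smaller HEVC core transform is nested inside the larger one. The sketch below illustrates this with the standard 4-point and 8-point HEVC core transform matrices, written here from the standard rather than from this paper (their entries are the coefficients a to g of (2)). It also shows the effect of keeping only the first few low-frequency outputs, as Section II does with $X_8^{OPT} = 3$. The function names are hypothetical, and the code models the exact integer transform, not the approximate architecture of Fig. 5.

```python
# Standard HEVC 4-point and 8-point core transform matrices (assumed from the
# HEVC standard; the paper itself does not print them).
M4 = [[64, 64, 64, 64],
      [83, 36, -36, -83],
      [64, -64, -64, 64],
      [36, -83, 83, -36]]

M8 = [[64, 64, 64, 64, 64, 64, 64, 64],
      [89, 75, 50, 18, -18, -50, -75, -89],
      [83, 36, -36, -83, -83, -36, 36, 83],
      [75, -18, -89, -50, 50, 89, 18, -75],
      [64, -64, -64, 64, 64, -64, -64, 64],
      [50, -89, 18, 75, -75, -18, 89, -50],
      [36, -83, 83, -36, -36, 83, -83, 36],
      [18, -50, 75, -89, 89, -75, 50, -18]]


def forward(matrix, x):
    """1D forward transform: one dot product per output coefficient."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def even_rows_left_half(matrix):
    """Even-indexed rows restricted to the first half of the columns."""
    half = len(matrix) // 2
    return [row[:half] for row in matrix[::2]]


def fold(x):
    """Butterfly stage: add each sample to its mirror image."""
    n = len(x)
    return [x[i] + x[n - 1 - i] for i in range(n // 2)]


def keep_low_frequency(coeffs, keep):
    """Zero every coefficient after the first `keep` low-frequency ones."""
    return coeffs[:keep] + [0] * (len(coeffs) - keep)


if __name__ == "__main__":
    x = [10, 12, 15, 19, 24, 30, 37, 45]
    y8 = forward(M8, x)
    print(even_rows_left_half(M8) == M4)      # nesting of the 4-point matrix
    print(forward(M4, fold(x)) == y8[::2])    # even outputs reuse the 4-point core
    print(keep_low_frequency(y8, 3))          # truncation with X8_OPT = 3
```

The second check is the reason a shared datapath can reuse the 4-point core for the even-indexed outputs of the 8-point transform; the same folding argument applies at 16 and 32 points.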

TABLE I
REQUIRED ARITHMETIC OPERATIONS FOR ALL TU SIZES

TABLE II
RESULTS OF HM SOFTWARE UNDER LOW DELAY P, ALL INTRA MAIN, AND RANDOM ACCESS CONFIGURATIONS

F. Comparison of Proposed 1D DCT/IDCT Architectures

Table I shows a comparison of the proposed approximate integer 1D DCT/IDCT architectures as presented in Figures 1-5 regarding addition and shifting operations for all TU sizes, i.e., 4×4, 8×8, 16×16, and 32×32, as specified in HEVC. The arithmetic operations required for all the TUs with reference to the original DCT and IDCT algorithms [17] were presented by Kalali et al. [9], [10]. The computational requirement of the original IDCT algorithm for all the TU sizes in HEVC were shown by Budagavi et al. [26]. As depicted in Table I, the architecture with Separate DCT and IDCT modules [26] requires sum of the arithmetic operations of original DCT and IDCT algorithm, which were presented in the research publications of Kalali et al. [9], [10]. Table I presents the requirements of the proposed architectures concerning addition and shifting operations. It also shows the reduction (%) of arithmetic operations in comparison to that of the Separate DCT+IDCT architecture [26]. The shift operations required for all TU sizes in the proposed DCT/IDCT designs as shown in Table I do not need extra hardware and are realized by rewiring.

G. Results of HEVC-Test-Model (HM) Software

The proposed 1D DCT/IDCT architectures as presented in Figures 1-5 are implemented in HM Software [30]. The results of the proposed algorithm under Low-Delay-P, All-Intra-Main, and Random Access configurations are presented in Table II. Here, twelve video sequences of different classes are considered to check the conformity of the proposed designs to HEVC. Due to the decreased computational overhead of HEVC transform, Encoding Time (ET) of HM Software gets reduced at most by 17.95% for Class A type sequences. The reference QP values considered here are 22, 27, 32, and 37. In Table II, BD-Rate [31] and BD-PSNR [31] show average increase in bitrate (%) and difference in PSNR (dB), respectively at the four reference QPs concerning original algorithm [33]. Moreover, Average represents average change in Encoding Time (ET) in % for the four reference QPs. It is calculated as follows.

$$\text{Average} = \frac{\text{Final Value} - \text{Reference Value}}{\text{Reference Value}} \times 100\% \tag{8}$$

In Table II, it is worthwhile to note that the proposed designs report the maximum reduction of 17.95% in encoding time (Avg. ET) for Traffic video sequence under Low-Delay-P configuration as compared to the reference algorithm [33] in HM software [30].

The results presented in Table II are also compared with the technique [12] for twelve video sequences under Low-Delay-P, All-Intra-Main, and Random-Access configurations. In Table II, it can be seen that the proposed methodology reports higher savings in encoding-time, while reporting comparable BD-Rate and BD-PSNR as compared to those of the technique reported in the work [12]. Moreover, the proposed

algorithm reports a lower increase in bitrate percentage as compared to the existing state-of-the-art techniques [9], [10], [19], [24], [32]. Furthermore, BD-Rate and BD-PSNR reported by Jridi et al. [19], [32] indicate a higher loss in coding efficiency when compared to our methodology. Additionally, the methods reported by Kalali et al. [9], [10] show a higher increase in BD-Rate for Class B and Class A type sequences. Therefore, it is worthwhile to note that the proposed algorithm results in better coding efficiency than the state-of-the-art techniques [9], [10], [12], [19], [24], [32].

Rate-Distortion (RD) performance comparison curves for Low-Delay-P configuration for sequences of four different classes are shown in Fig. 6. The curves for the proposed algorithm are comparable to HM software [30]. A comparison of frames for Jockey sequence of resolution 4K (3840 × 2160) for All-Intra-Main configuration is shown in Fig. 7 concerning the reference and proposed algorithm for the four reference QPs. For All-Intra-Main Configuration at QP = 27, the proposed algorithm reports Mean Square Error (MSE), Structural Similarity (SSIM), and 2D Correlation Coefficient (Corr2) equal to 1.42, 0.9913, and 0.9993, respectively.

Fig. 6. Comparison of RD Curve for multiple Class Sequences with HM Software [30].

Fig. 7. Comparison of frames for Jockey 4K (3840 × 2160) Sequence.

IV. TRANSPOSE MEMORY

The proposed Transpose Memory (TM) architecture is presented in Fig. 8. It has the flexibility to support 4 × 4, 8 × 8, 16 × 16, and 32 × 32 input coefficient matrices. Moreover, it can transpose 32 × 32 input samples by using only 16 dual-port RAMs of size 64 × 16 bits each. It needs 2, 4, and 8 dual-port RAMs to transpose 4 × 4, 8 × 8, and 16 × 16 input samples, respectively. In Fig. 8, the select lines E8, E16, and E32 of MUXes determine the size of TU under process. With their support, the TM architecture can produce output in 4, 8, 16, and 32 clock cycles for 4 × 4, 8 × 8, 16 × 16, and 32 × 32 TU sizes, respectively. The select lines $S_{id}0$–$S_{id}31$, $S_{im}0$–$S_{im}31$, $S_{od}0$–$S_{od}31$, and $S_{om}0$–$S_{om}31$ help the architecture to transpose the input samples of all TU sizes in HEVC. The sequences generated for the selection lines of the MUXes/DEMUXes by Pattern Generator block with the help of MOD 4, 8, 16, and 32 counter depend on the select line (S), which determines the size of TU under operation. The TM architecture requires only sixteen RAMs as opposed to the use of 32 RAMs in the state-of-the-art architectures [9]–[11], [21], [23]–[25].

Fig. 8. Transpose Memory architecture of 2D 4-/8-/16-/32-point DCT/IDCT.

The dual-port RAM module, as shown in Fig. 8, is conceptually divided into two single port RAMs as Memory Block-A/B. The presented TM architecture stores the samples diagonally into the RAMs. After performing the row-wise transform operation, the 16 RAMs store coefficients diagonally in the first 32 clock cycles, and then the samples are retrieved from the RAMs to reconfigure the inputs of 1D DCT/IDCT processor to conduct the column-wise operation. The MUXes and the DEMUXes attached at the input and output ports of TM architecture help the 16 RAMs to store input coefficients diagonally at the desired locations, and generate coefficients in transposed form of the input coefficient arrangement.

The method to transpose input coefficients is explained by an example of 8×8 coefficient input matrix as given in Fig. 9(a). The input samples (0 − 63) as presented in Fig. 9(a) are processed column-wise. These samples are stored diagonally in the four dual port RAMs or eight Memory Blocks (B0-B7) of size 8 × 16 bits each as depicted in Fig. 9(b). Moreover, the arrangement of 8 × 8 input coefficients at the desired locations in the eight Memory Blocks is depicted in Fig. 9(b).
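The idea of diagonal storage can be sketched in a few lines of Python. The skew used below, bank index (row + column) mod N with the row number as the address, is an assumption chosen because it is a common conflict-free scheme; the exact bank and address pattern of the proposed TM is the one drawn in Fig. 9(b), which is not reproduced here. The names `make_banks`, `write_line`, and `read_transposed_line` are hypothetical.

```python
def make_banks(n):
    """n memory blocks, each with n addressable words."""
    return [[None] * n for _ in range(n)]


def write_line(banks, r, line):
    """Store line r diagonally: element c goes to bank (r + c) mod n, address r.

    Every element of one line lands in a different bank, so a whole line
    can be written in a single cycle.
    """
    n = len(line)
    for c, value in enumerate(line):
        banks[(r + c) % n][r] = value


def read_transposed_line(banks, c):
    """Read line c of the transposed matrix.

    The n requested words again sit in n different banks, so the read is
    also conflict-free.
    """
    n = len(banks)
    return [banks[(r + c) % n][r] for r in range(n)]


if __name__ == "__main__":
    n = 8
    matrix = [[r * n + c for c in range(n)] for r in range(n)]  # samples 0..63
    banks = make_banks(n)
    for r, line in enumerate(matrix):
        write_line(banks, r, line)
    for c in range(n):
        print(read_transposed_line(banks, c))
```

With this layout, eight memory blocks of depth eight are enough for the 8 × 8 case, matching the block count and size quoted above; the MUX/DEMUX select sequences in the paper play the role of the modular index arithmetic in this sketch.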

Figure 9(c) shows the generated sequences for the select lines ($S_{id}0$–$S_{id}7$) of input DEMUXes. The generated sequences for the select lines ($S_{im}0$–$S_{im}7$) of input MUXes are given in Fig. 9(d), which make way for the input samples to be stored diagonally in the Memory Blocks. Figure 10(a) displays the reading addresses (RD0-RD7) generated by a MOD-8 counter for the eight memory blocks. The sequences for the select lines ($S_{od}0$–$S_{od}7$) of DEMUXes at the output end of TM architecture are presented in Fig. 10(b). The sequences for the select lines ($S_{om}0$–$S_{om}7$) of MUXes at the output end of TM are shown in Fig. 10(c). The final transposed output coefficients of the TM architecture for the 8 × 8 input samples are shown in Fig. 10(d).

Fig. 9. Addressing logic for input section of TM.

Fig. 10. Addressing logic for output section of TM.

V. PROPOSED APPROXIMATE INTEGER 2D 4-/8-/16-/32-POINT DCT/IDCT ARCHITECTURE

The proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture is shown in Fig. 11. It requires only one DCT/IDCT module to compute 2D DCT/IDCT for all TU sizes while compared to two DCT processors as presented in the state-of-the-art designs [9]–[11], [25]. The DCT/IDCT processor generates 32 DCT/IDCT coefficients depending on the select lines (S and T) of MUXes. The lines ‘S’ and ‘T’ allow the architecture to select the size (4-/8-/16-/32-point) and type (DCT/IDCT) of the transform, respectively. This convention is also followed in the Input and Output sequencing architectures as presented in Figures 12 and 13, respectively. The select lines (CP and RP) of MUXes and DEMUXes allow the 2D architecture to perform column and row processing of the input samples as depicted in Fig. 11. The Row/Column Clipper clips input samples to 16-bits each.

Fig. 11. Proposed architecture of approximate integer 2D 4-/8-/16-/32-point DCT/IDCT.

The Input Sequencing architecture is presented in Fig. 12. It allows the applied input samples to be sent to the correct input terminals of the DCT/IDCT processor. In Fig. 12, $D_m(N)$ represents the N-th output line from the m-th DEMUX ($D_m$). Therefore, $D_8(1)$ represents that the input is received from the first output line of the eighth DEMUX ($D_8$). Here, the select lines ‘S’ and ‘T’ define the size of TU, i.e., 4×4, 8×8, 16×16, and 32 × 32, and type, i.e., DCT/IDCT, of the transform respectively. The Output sequencing architecture is presented in Fig. 13. It allows the output samples of DCT/IDCT processor to be sent to the desired output lines of the proposed 2D DCT/IDCT architecture. To keep the input and output coefficients in sequence, the Input- and Output-Sequencing architecture as presented in Figures 12 and 13 play a vital role for 2D DCT/IDCT. The specification of the proposed

Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on January 02,2024 at 07:03:52 UTC from IEEE Xplore. Restrictions apply.
210 IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, VOL. 66, NO. 3, AUGUST 2020

TABLE IV
C OMPARISON ON FPGA P LATFORM

TABLE V
ASIC I MPLEMENTATION R ESULTS

Fig. 12. Input sequencing architecture of 4-/8-/16-/32-point DCT/IDCT.

TABLE VI
R EDUCTION IN A REA C ONSUMPTION

Fig. 13. Output sequencing architecture of 4-/8-/16-/32-point DCT/IDCT.

TABLE III
S PECIFICATIONS OF THE P ROPOSED 2D 4-/8-/16-/32-P OINT
DCT/IDCT A RCHITECTURE

HDL. Moreover, RTL implementation result matches with

that of the HEVC HM Software [30]. For prototype valida-
tion of the proposed 2D DCT/IDCT hardware, the design is
implemented and tested on an FPGA in 40 nm CMOS technol-
ogy. Table IV shows the FPGA implementation results of the
proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture. The
proposed 2D design reports significant reduction (>50%) in
FPGA resource utilization and Block RAMs (BRAMs) while
compared to the separate implementation of original DCT and
IDCT modules [26].
For implementation on ASIC platform, the Verilog codes of
2D DCT/IDCT architecture, which is shown in Fig. 11, is the proposed architectures have been synthesized by using a
presented in Table III. Here, x(4)-x(31)=0 conveys the fact 90 nm standard digital cell library. Moreover, ASIC imple-
that the input terminals x(4)-x(31) are made zero for the cal- mentation results of the proposed 1D and 2D DCT/IDCT
culation of 2D 4-point DCT/IDCT. Moreover, the desired 2D architectures for HEVC are shown in Table V. Here, the power
4-point DCT/IDCT output coefficients will be available at the and area are estimated at a frequency constraint of 145 MHz.
corresponding output ports y(0)-y(3) of the architecture. The hardware implementation of the proposed 2D 4-/8-/16-
/32-point DCT/IDCT architecture requires only 16 dual port
A. Results and Comparison of Hardware Implementation RAMs of size 64 × 16 bits each and reports maximum operat-
The proposed 2D 4-/8-/16-/32-point DCT/IDCT architec- ing frequency of 149.35 MHz, power dissipation of 11.23 mW,
ture as presented in Fig. 11 can calculate 2D DCT/IDCT for area of 120 kGates (Logic+16-SRAMs area), and energy-per-
all the TU sizes as mentioned in the HEVC standard [1]. It output-coefficient (EoC) of 2.34 pJ. Furthermore, Table VI
also includes MCM algorithm and clock gating to compute shows that the proposed 2D DCT/IDCT design reports 78%
the 2D DCT/IDCT output coefficients. Hardware implementa- reduction in area consumption while compared to the separate
tion of the proposed architectures is carried out in Verilog implementation of DCT and IDCT modules [26]. Therefore,
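As a rough cross-check of the figures above, the energy-per-output-coefficient can be reproduced from the reported power and maximum operating frequency. The sketch below is illustrative only. It assumes that the architecture delivers 32 output coefficients per clock cycle, that a 2D transform takes two 1D passes through the single DCT/IDCT module, and that a 4K frame is a 3840 × 2160 luma plane. None of these three assumptions is stated in this section, and the function and constant names are hypothetical.

```python
# Figures quoted in the text (ASIC implementation, 90 nm cell library).
POWER_W = 11.23e-3      # power dissipation: 11.23 mW
F_MAX_HZ = 149.35e6     # maximum operating frequency: 149.35 MHz

# Assumptions made for this sketch only (not stated in the text).
OUTPUTS_PER_CYCLE = 32          # one 32-point 1D result per clock cycle
PASSES_PER_2D = 2               # row pass and column pass share one 1D module
FRAME_W, FRAME_H = 3840, 2160   # 4K UHD luma plane


def energy_per_output_coefficient(power_w, f_hz, outputs_per_cycle):
    """Return the energy in joules spent per output coefficient."""
    return power_w / (f_hz * outputs_per_cycle)


def frames_per_second(f_hz, outputs_per_cycle, passes, width, height):
    """Return luma frames per second when each sample needs `passes` 1D passes."""
    samples_per_second = f_hz * outputs_per_cycle / passes
    return samples_per_second / (width * height)


eoc_pj = energy_per_output_coefficient(POWER_W, F_MAX_HZ, OUTPUTS_PER_CYCLE) * 1e12
fps = frames_per_second(F_MAX_HZ, OUTPUTS_PER_CYCLE, PASSES_PER_2D, FRAME_W, FRAME_H)

print(f"EoC = {eoc_pj:.2f} pJ")
print(f"frame rate = {fps:.1f} fps")
```

Under these assumptions the script gives an EoC of about 2.35 pJ, close to the reported 2.34 pJ, and a luma frame rate of about 288 frames per second, which agrees with the throughput quoted in the comparison with prior work.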

The proposed design can therefore be integrated into a real-time HEVC codec for portable CE devices.

Comparison of hardware of the proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture, as presented in Table VII, indicates that the proposed design reports lower power and smaller area as compared to the architectures [9]–[11], [23]–[26]. Moreover, the architectures [9], [11], [21], [23]–[25] can perform only forward DCT and require at least 32 RAMs to perform 2D transform. Thus, these designs will require some additional hardware for performing 2D-IDCT. The architectures [21], [26] can perform only 1D transform operation, and therefore these designs will require additional hardware for implementation of 2D transform. The architecture [12] has not been designed to perform 2D IDCT. Therefore, it has a slightly lower hardware consumption than the proposed design in this work. Unlike the architectures presented in the state-of-the-art [9]–[12], [21], [23]–[25], the proposed hardware in this work can compute 2D DCT/IDCT operation for all TU sizes in HEVC at the minimum hardware consumption while supporting up to 288 frames per second at 4K resolution. The results show that the design is suitable for integration into a typical real-time HEVC codec [15], [16], [29] for portable CE devices [4]–[6].

TABLE VII
HARDWARE COMPARISON OF THE PROPOSED 2D DCT/IDCT ARCHITECTURE ON ASIC PLATFORM

VI. CONCLUSION
This paper presents an algorithm to find the minimum required DCT-output/IDCT-input low-frequency coefficients for n-point DCT/IDCT in HM Software, where n = 4, 8, 16, and 32. The algorithm reduces the encoding time of HM Software by up to 17.95% for Class A type video sequences. When compared with the designs reported in the state-of-the-art, the proposed approximate integer 1D 4-, 8-, 16-, 32-, and 4-/8-/16-/32-point DCT/IDCT architectures require relatively fewer computations and conform to the HEVC standard. When tested with HM Software, the proposed architectures deliver RD performance close to the reference algorithm, and report a considerably lower MSE of 1.42, SSIM index up to 0.9913, and 2D correlation coefficient up to 0.9993 for 4K (3820 × 4196) video sequences. The Transpose Memory architecture presented here requires only 16 RAMs to transpose 32 × 32 input samples, and supports all the TU sizes in HEVC. The proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture needs only one DCT/IDCT module to compute 2D 4-/8-/16-/32-point DCT/IDCT and consumes lower energy (2.34 pJ), area (120 kGates), and power (11.23 mW) than those of the architectures reported in the state-of-the-art. The proposed 2D DCT/IDCT architecture conforms to the area and power requirements of a typical HEVC codec, and it is endowed with low energy, area, and power consumption. Thus, this featured design can be integrated in a real-time HEVC codec for power restrained HEVC-compliant portable CE devices such as smartphones, tablets, and camcorders.

REFERENCES
[1] V. Sze, M. Budagavi, and G. J. Sullivan, High Efficiency Video Coding: Algorithms and Architectures. Cham, Switzerland: Springer, 2014.
[2] K. Glasman, “CE society TV: High efficiency video coding (HEVC) [society news],” IEEE Consum. Electron. Mag., vol. 6, no. 1, pp. 19–22, Jan. 2017.
[3] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[4] K. Wiens and P. Corcoran, “Repairability smackdown II: iPhone versus iPhone,” IEEE Consum. Electron. Mag., vol. 3, no. 1, pp. 19–24, Jan. 2014.
[5] S. Kaur, “The revolution of tablet computers and apps: A look at emerging trends,” IEEE Consum. Electron. Mag., vol. 2, no. 1, pp. 36–41, Jan. 2013.
[6] Y. Takemura, “The development of video-camera technologies: Many innovations behind video cameras are used for digital cameras and smartphones,” IEEE Consum. Electron. Mag., vol. 8, no. 4, pp. 10–16, Jul. 2019.
[7] B. Markwalter, “ATSC 3.0 begins commercial broadcasting: First IP-based ultrahigh-definition TV broadcasts on the air [CTA insights],” IEEE Consum. Electron. Mag., vol. 7, no. 1, pp. 125–126, Jan. 2018.
[8] G. Delagi, “Harnessing technology to advance the next-generation mobile user-experience,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), San Francisco, CA, USA, Feb. 2010, pp. 18–24.
[9] E. Kalali, A. C. Mert, and I. Hamzaoglu, “A computation and energy reduction technique for HEVC discrete cosine transform,” IEEE Trans. Consum. Electron., vol. 62, no. 2, pp. 166–174, May 2016.
[10] E. Kalali, E. Ozcan, O. M. Yalcinkaya, and I. Hamzaoglu, “A low energy HEVC inverse transform hardware,” IEEE Trans. Consum. Electron., vol. 60, no. 4, pp. 754–761, Nov. 2014.
[11] A. C. Mert, E. Kalali, and I. Hamzaoglu, “High performance 2D transform hardware for future video coding,” IEEE Trans. Consum. Electron., vol. 63, no. 2, pp. 117–125, May 2017.
[12] A. Singhadia, P. Bante, and I. Chakrabarti, “A novel algorithmic approach for efficient realization of 2-D-DCT architecture for HEVC,” IEEE Trans. Consum. Electron., vol. 65, no. 3, pp. 264–273, Aug. 2019.


[13] M. J. Garrido, F. Pescador, M. Chavarrias, P. J. Lobo, and C. Sanz, “A high performance FPGA-based architecture for the future video coding adaptive multiple core transform,” IEEE Trans. Consum. Electron., vol. 64, no. 1, pp. 53–60, Feb. 2018.
[14] M. Garrido, F. Pescador, M. Chavarrías, P. Lobo, and C. Sanz, “A 2-D multiple transform processor for the versatile video coding standard,” IEEE Trans. Consum. Electron., vol. 65, no. 3, pp. 274–283, Aug. 2019.
[15] T.-M. Liu et al., “A 0.76 mm² 0.22 nJ/pixel DL-assisted 4K video encoder LSI for quality-of-experience over smartphones,” IEEE Solid-State Circuits Lett., vol. 1, no. 12, pp. 221–224, Dec. 2018.
[16] D. Zhou et al., “An 8K H.265/HEVC video decoder chip with a new system pipeline design,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 113–126, Jan. 2017.
[17] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations, 1st ed. San Diego, CA, USA: Elsevier, 2007.
[18] R. J. Cintra, F. M. Bayer, V. A. Coutinho, S. Kulasekera, A. Madanayake, and A. Leite, “Energy-efficient 8-point DCT approximations: Theory and hardware architectures,” Circuits Syst. Signal Process., vol. 35, no. 11, pp. 4009–4029, Nov. 2016.
[19] M. Jridi and P. K. Meher, “Scalable approximate DCT architectures for efficient HEVC-compliant video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 8, pp. 1815–1825, Aug. 2017.
[20] M. M. A. Basiri and S. N. Mahammad, “High performance integer DCT architectures for HEVC,” in Proc. 30th Int. Conf. VLSI Design 16th Int. Conf. Embedded Syst. (VLSID), Hyderabad, India, Jan. 2017, pp. 121–126.
[21] S. Chatterjee and K. Sarawadekar, “An optimized architecture of HEVC core transform using real-valued DCT coefficients,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 12, pp. 2052–2056, Dec. 2018.
[22] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Edirisuriya, “Improved 8-point approximate DCT for image and video compression requiring only 14 additions,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 6, pp. 1727–1740, Jun. 2014.
[23] M. Zheng, J. Zheng, Z. Chen, L. Wu, X. Yang, and N. Ling, “A reconfigurable architecture for discrete cosine transform in video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 3, pp. 810–821, Mar. 2020.
[24] M. Masera, M. Martina, and G. Masera, “Adaptive approximated DCT architectures for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 12, pp. 2714–2725, Dec. 2017.
[25] P. K. Meher, S. Park, B. Mohanty, K. Lim, and C. Yeo, “Efficient integer DCT architectures for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 1, pp. 168–178, Jan. 2014.
[26] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, “Core transform design in the high efficiency video coding (HEVC) standard,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1029–1041, Dec. 2013.
[27] J. Zhu, Z. Liu, and D. Wang, “Fully pipelined DCT/IDCT/Hadamard unified transform architecture for HEVC codec,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 677–680.
[28] W. Zhao, T. Onoye, and T. Song, “High-performance multiplierless transform architecture for HEVC,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 1668–1671.
[29] C. C. Ju et al., “A 0.5 nJ/pixel 4 K H.265/HEVC codec LSI for multi-format smartphone applications,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 56–67, Jan. 2016.
[30] F. Bossen, D. Flynn, K. Sharman, and K. Sühring. (Jan. 2018). HM Software Manual. [Online]. Available: https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-16.18/doc/software-manual.pdf
[31] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” presented at the 13th Video Coding Experts Group Meeting, Austin, TX, USA, Apr. 2001.
[32] M. Jridi, A. Alfalou, and P. K. Meher, “A generalized algorithm and reconfigurable architecture for efficient and scalable orthogonal approximation of DCT,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 2, pp. 449–457, Feb. 2015.
[33] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[34] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multiplication,” ACM Trans. Algorithms (TALG), vol. 3, no. 2, p. 11, May 2007.

Ashish Singhadia received the B.E. degree from RGTU, Bhopal, India, in 2008, and the M.Tech. degree in electronics and communication engineering from IIT Kharagpur, Kharagpur, India, in 2013, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Electrical Communication Engineering. He was an Assistant Professor with VIT, Bhopal, from 2014 to 2017. He has served as a Lecturer with T.I.E.,Tech, Jabalpur, India, from 2008 to 2010. His current research interests include VLSI architectures for image and video processing.

Meghan Mamillapalli received the B.Tech. degree in electronics and communication engineering from MNNIT Allahabad, India, in 2017, and the M.Tech. degree in microelectronics and VLSI design from IIT Kharagpur, India, in 2019. He is currently working as a Design Engineer with Qualcomm India Private Ltd. His research interests include low power hardware design for digital video processing and coding.

Indrajit Chakrabarti (Member, IEEE) received the B.E. and M.E. degrees in electronics and telecommunication engineering from Jadavpur University, India, in 1987 and 1990, respectively, and the Ph.D. degree from IIT Kharagpur, India, in 1997, where he is currently a Professor with the Department of Electronics and Electrical Communication Engineering. He has published more than 120 papers in peer-reviewed journals and conferences. His research interests include VLSI architectures for image and video processing, digital signal processing, error control coding, and wireless communication.
