Hardware-Efficient 2D-DCT/IDCT Architecture for Portable HEVC-Compliant Devices
Abstract—Low-power hardware acceleration cores for integration into real-time High Efficiency Video Coding (HEVC) codecs for smartphones, tablets, camcorders, and televisions are in great demand. This need motivates an efficient realization of the Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) for HEVC. This paper presents an algorithm to calculate the minimum required number of low-frequency DCT-output/IDCT-input coefficients for 4-, 8-, 16-, and 32-point DCT/IDCT in HEVC, such that there is only a slight decrease in peak signal-to-noise ratio (<0.15 decibel) and a minor increase in bitrate (<1.5%) compared with the reference HEVC Test Model (HM) Software. Meanwhile, the encoding time is reduced by up to 17.95% for Class-A sequences, while mean-squared-error and structural-similarity values of 1.42 and 0.9913, respectively, are reported for 4K ultra-high-definition videos. Moreover, HEVC-compliant, computationally efficient architectures are introduced for n-point DCT/IDCT. The presented flexible Transpose Memory architecture uses only sixteen random-access memories to support all transform-unit sizes in HEVC. The proposed two-dimensional DCT/IDCT architecture can process up to 288 frames per second at 4K resolution, and it consumes a minimum power, energy, and area of 11.23 milliwatts, 2.34 picojoules, and 120 kilo-gate-equivalents, respectively. Such a design, with low power, area, and energy, can be included in a real-time HEVC codec for HEVC-compliant consumer electronic devices.

Index Terms—HEVC, DCT, IDCT, hardware implementation, low power consumption.

Manuscript received January 4, 2020; revised April 5, 2020 and May 11, 2020; accepted June 28, 2020. Date of publication July 1, 2020; date of current version July 28, 2020. This work was supported by the Ministry of Electronics and Information Technology (MeitY), Government of India, through the Chip to System (C2S) Program. (Corresponding author: Ashish Singhadia.)
Ashish Singhadia and Indrajit Chakrabarti are with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (e-mail: singha[email protected]; [email protected]).
Meghan Mamillapalli is with the Department of Engineering, Qualcomm India Private Ltd., Bengaluru 560037, India (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCE.2020.3006213
1558-4127 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

High Efficiency Video Coding (HEVC) [1], [2] is a video compression standard that delivers better performance than its predecessor, H.264 [3]. It achieves the same video quality at approximately 50% of the bitrate of H.264 [3] and supports resolutions as high as 8K (7680×4320). Hardware-accelerated designs for HEVC codecs have become essential to improve the efficiency of the central processing unit (CPU) in HEVC-compliant consumer electronic (CE) devices [4]–[7]. Without hardware acceleration cores, the CPU of a CE device would have to execute trillions of operations per second for image and video processing applications [8]. Such heavy CPU usage quickly exhausts the battery, which is undesirable for users of portable CE devices. Thus, power-efficient hardware accelerators intended for real-time HEVC codecs in portable CE devices are in high demand [2], [8].

Nowadays, CE manufacturers have started integrating next-generation HEVC codecs into CE products such as smartphones [4], tablets [5], camcorders [6], and televisions [7]. Moreover, there have been several research efforts to develop low-power hardware accelerators for HEVC codecs [9]–[14], HEVC encoders [15], and HEVC decoders [16] for consumer electronics applications. The Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) [17] play a very important role in image and video applications. They are employed in many image and video compression standards because of the good energy-compaction property of the DCT [17], [18]. Several fast algorithms introduced in the existing literature [9]–[14], [18]–[24] can considerably decrease the computational requirement of the DCT. Scalable architectures for approximate DCT in HEVC have been introduced by Jridi and Meher [19], who also presented a reconfigurable architecture based on the multiple constant multiplication (MCM) algorithm to support DCT/IDCT for the HEVC standard [19]. A reconfigurable 2D-DCT hardware requiring 96 multipliers for HEVC was presented by Zheng et al. [23]. The 2D DCT designs presented in the state-of-the-art [9]–[11], [25] required two DCT hardware blocks to perform the 2D DCT operation and processed 32 samples per clock cycle for all Transform Unit (TU) sizes. The DCT/IDCT architecture by Budagavi et al. [26] achieved a 43–45% reduction in area compared with separate implementations of the forward and inverse core transforms. An optimized 32-point DCT architecture for HEVC was proposed by Chatterjee and Sarawadekar [21]. The 2D DCT hardware by Zhu et al. [27] needs multipliers to calculate all DCT coefficients, while the 2D DCT architecture presented by Zhao et al. [28] computes the DCT of larger TUs using smaller ones.

While the multiple transform sizes of HEVC [1] improve the compression performance of video codecs, they also increase the complexity of hardware implementation. With the development of CE devices such as tablets [5], multimedia smartphones supporting real-time video calling and conferencing [4], digital set-top boxes with digital video recording, and televisions [7], it is strongly desired for
a device to be able to support both capture and playback of the video. Consequently, both DCT and IDCT have to be implemented in the same device. Therefore, algorithms and architectures that can significantly reduce the power, energy, and area of the overall hardware are always in high demand [8]. Accordingly, a low-power ASIC design for DCT/IDCT operations in HEVC [1] will lead to improved power efficiency and hardware acceleration of a real-time HEVC codec [29] for HEVC-compliant CE devices [4]–[7].

204 IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, VOL. 66, NO. 3, AUGUST 2020

The main contributions of the work presented in this paper are as follows.
1) This work presents an algorithm to find the minimum required number of low-frequency DCT-output/IDCT-input coefficients in n-point DCT/IDCT for the Transform-and-Quantization (TQ) block of HEVC, where n = 4, 8, 16, and 32. It significantly decreases the computational overhead and encoding time (by up to 17.95%) of the HEVC Test Model (HM) Software [30]. Moreover, the calculated Bjøntegaard Delta PSNR (BD-PSNR) [31], BD-Rate [31], and Rate-Distortion (RD) comparisons demonstrate better coding efficiency of the algorithm compared with the existing algorithms [9], [10], [12], [19], [24], [32].
2) The proposed approximate integer 1D architectures for 4-, 8-, 16-, 32-, and 4-/8-/16-/32-point DCT/IDCT are hardware efficient. They achieve a considerable decrease in addition (76–92%) and shift (86–95%) operations for all TUs in HEVC compared with the original Separate DCT+IDCT architecture [26]. The proposed designs are HEVC-compliant and use fewer arithmetic operations than the designs reported in the state-of-the-art [9], [13], [14], [21], [23], [26].
3) This paper also proposes a flexible Transpose Memory (TM) architecture to support all TU sizes in HEVC. The proposed TM architecture uses only 16 RAMs to transpose a 32 × 32 input block, compared with the 32 RAMs used in the designs of [9]–[11], [21], [23]–[25].
4) The 2D DCT/IDCT architecture introduced in this paper consumes lower power (11.23 mW), area (120 KGates), and energy (2.34 pJ) than the reported state-of-the-art designs [9]–[11], [23]–[26]. It can calculate the 2D forward/inverse DCT for all TU sizes in HEVC. The proposed architecture requires only one DCT/IDCT module, compared with the two DCT hardware blocks reported in the architectures [9]–[11], [25].

The rest of the paper is organized as follows. Section II discusses the proposed algorithm. The 1D DCT/IDCT architectures are presented in Section III. The flexible TM architecture is discussed in Section IV. The 2D DCT/IDCT architecture, with its implementation and hardware comparison, is presented in Section V. Finally, the paper is concluded in Section VI.

The proposed hardware-efficient 2D DCT/IDCT architecture consumes low power, low energy, and low area compared with the relevant state-of-the-art architectures [9]–[11], [13], [14] presented in the realm of consumer electronics. Moreover, unlike the designs shown in the state-of-the-art [9]–[14] for CE applications, the proposed hardware can perform 2D 4-/8-/16-/32-point DCT/IDCT with minimal hardware. Furthermore, the proposed design reports at least 78% less area than the separate implementation of the original DCT and IDCT architectures for HEVC [26]. Additionally, the proposed ASIC design appears to fully satisfy the power and area requirements of state-of-the-art standard HEVC chips [15], [16], [29] designed for portable CE devices. Therefore, the ASIC implementation of this architecture for HEVC makes it highly relevant to the consumer electronics society [2]. Consequently, this design will enable consumers to enjoy the combined benefits of low-area, high-speed, and extended-battery-life portable HEVC-compliant CE products [4]–[6].

II. PROPOSED ALGORITHM

In this section, a new algorithm, namely Min_Num_Transform_Coeff, is presented to calculate $X_n^{OPT}$, which represents the minimum required number of DCT-output/IDCT-input coefficients to compute n-point DCT/IDCT for the Transform-and-Quantization (TQ) block of HEVC, where n = 4, 8, 16, and 32. The algorithm determines the minimum number of low-frequency DCT-output/IDCT-input coefficients needed for n-point DCT/IDCT in the HEVC Test Model (HM) Software [30] such that the decrease in PSNR and the increase in bitrate are less than 0.15 dB and 1.5%, respectively. When the algorithm is applied in the HM, it significantly decreases the computational overhead of the reference algorithm [1]. As a result, the total encoding time of the HM Software [30] is reduced by up to 17.95% for Class A video sequences. In the algorithm, the thresholds for PSNR ($P_{th}$) and bitrate ($R_{th}$) are set to 0.15 dB below $P_{max}$ and 1.015 times $R_{min}$, respectively, i.e., $P_{th} = P_{max} - 0.15$ and $R_{th} = 1.015\,R_{min}$. Here, $P_{max}$ and $R_{min}$ represent the maximum PSNR and minimum bitrate reported by the HM Software [30] for a video sequence (I) with the original reference algorithm [1], i.e., $X_n = X_{max}$. Moreover, $P_{av}$ and $R_{av}$ represent the average values of PSNR and bitrate, respectively, and $f_P(X_n)$ and $f_R(X_n)$ represent the PSNR and bitrate, respectively, calculated by the HM Software [30] for n-point DCT/IDCT at $X = X_n$. The reference quantization parameters (QPs) considered for the algorithm are 22, 27, 32, and 37.

In the algorithm, the number of DCT-output/IDCT-input coefficients (X) used for n-point DCT/IDCT is varied simultaneously over the range $X_{max}$ to $X_{min}$ in the HM Software [30]. When calculating the PSNR and bitrate for 32-point DCT/IDCT, $X_{max}$ and $X_{min}$ are taken as 32 and zero, respectively, so the number of DCT-output/IDCT-input coefficients is varied from 32 down to 0. The remaining DCT-output/IDCT-input coefficients, which are not considered when computing the n-point DCT/IDCT, are set to zero in the HM Software [30]. While the coefficients are varied from $X_{max}$ to $X_{min}$ for 32-point DCT/IDCT, the remaining algorithms of
4-, 8-, and 16-point DCT/IDCT are kept unaltered. The same technique is followed when varying the coefficients for 4-, 8-, and 16-point DCT/IDCT. As the input video sequences (I) are varied, it is observed that, on average, the minimum numbers of required DCT-output/IDCT-input coefficients for 4-, 8-, 16-, and 32-point DCT/IDCT are 4, 3, 6, and 7, respectively, i.e., $X_4^{OPT} = 4$, $X_8^{OPT} = 3$, $X_{16}^{OPT} = 6$, and $X_{32}^{OPT} = 7$.

SINGHADIA et al.: HARDWARE-EFFICIENT 2D-DCT/IDCT ARCHITECTURE FOR PORTABLE HEVC-COMPLIANT DEVICES 205

Algorithm 1 Min_Num_Transform_Coeff
Input: $I$, $P_{th}$, $R_{th}$, $R_{min}$, $P_{max}$
Output: $X_4^{OPT}$, $X_8^{OPT}$, $X_{16}^{OPT}$, $X_{32}^{OPT}$
  $I \leftarrow$ test video sequence;
  $P_{th} \leftarrow P_{max} - 0.15$;
  $R_{th} \leftarrow 1.015 \times R_{min}$;
  Initialize $Init \leftarrow P_{max}/R_{min}$, $Var \leftarrow 0$;   /* Init and Var are data variables */
  for ($X_4 = 4$; $X_4 \geq 0$; $X_4$−−) do
    for ($X_8 = 8$; $X_8 \geq 0$; $X_8$−−) do
      for ($X_{16} = 16$; $X_{16} \geq 0$; $X_{16}$−−) do
        for ($X_{32} = 32$; $X_{32} \geq 0$; $X_{32}$−−) do
          $P_{av} \leftarrow \mathrm{Average}\{f_P(X_4), f_P(X_8), f_P(X_{16}), f_P(X_{32})\}$;
          $R_{av} \leftarrow \mathrm{Average}\{f_R(X_4), f_R(X_8), f_R(X_{16}), f_R(X_{32})\}$;
          $Var \leftarrow P_{av}/R_{av}$;
          if ($Var < Init$ && $P_{av} \geq P_{th}$ && $R_{av} \leq R_{th}$) then
            $Init \leftarrow Var$;
            $X_4^{OPT} \leftarrow X_4$; $X_8^{OPT} \leftarrow X_8$; $X_{16}^{OPT} \leftarrow X_{16}$; $X_{32}^{OPT} \leftarrow X_{32}$;
          end
        end
      end
    end
  end

Fig. 1. Approximate integer 4-point DCT/IDCT architecture.
Fig. 2. Approximate integer 8-point DCT/IDCT architecture.
Fig. 3. Approximate integer 16-point DCT/IDCT architecture.

4-point DCT/IDCT depending on the select line (T) of the multiplexers (MUXes), as depicted in Fig. 1. It requires twelve additions and six shift operations to perform DCT/IDCT. The MUXes shown in Fig. 1 are also used in Figs. 2–5. The red dashed lines in Fig. 1 denote multiplication by −1, and Figs. 2–5 follow the same conventions as Fig. 1.
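The shift-and-add datapath of the approximate 4-point design is given only in Fig. 1, which is not reproduced here. As a hedged stand-in, the Python sketch below shows the exact HEVC 4-point integer core transform (basis constants 64, 83, and 36) computed with an even/odd butterfly and multiplierless shift-and-add constant multipliers; it is not the authors' approximation. The `inverse` flag is a hypothetical software analogue of the select line T, reusing the same constant multipliers for the forward and inverse directions. The function names are illustrative, and the rounding right-shifts that HEVC applies between transform stages are omitted.

```python
from __future__ import annotations

# Sketch only: exact HEVC 4-point integer core transform (constants 64, 83, 36)
# built from shift-and-add constant multipliers. This is NOT the approximate
# datapath of Fig. 1, and HEVC's inter-stage rounding shifts are omitted.


def mul64(v: int) -> int:
    return v << 6                              # 64 = 2^6


def mul83(v: int) -> int:
    return (v << 6) + (v << 4) + (v << 1) + v  # 83 = 64 + 16 + 2 + 1


def mul36(v: int) -> int:
    return (v << 5) + (v << 2)                 # 36 = 32 + 4


def transform4(v: list[int], inverse: bool = False) -> list[int]:
    """Forward 4-point DCT (inverse=False) or its transpose, the IDCT (inverse=True).

    'inverse' is a hypothetical software analogue of a select line: both
    directions reuse the same constant multipliers, wired in a different order.
    """
    if not inverse:
        # Even/odd butterfly on the input samples.
        e0, e1 = v[0] + v[3], v[1] + v[2]
        o0, o1 = v[0] - v[3], v[1] - v[2]
        return [mul64(e0 + e1),
                mul83(o0) + mul36(o1),
                mul64(e0 - e1),
                mul36(o0) - mul83(o1)]
    # Inverse: x = C^T y, combining even and odd parts at the output.
    e0, e1 = mul64(v[0] + v[2]), mul64(v[0] - v[2])
    o0 = mul83(v[1]) + mul36(v[3])
    o1 = mul36(v[1]) - mul83(v[3])
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


if __name__ == "__main__":
    block = [10, -3, 7, 2]
    coeffs = transform4(block)
    print(coeffs)
    print(transform4(coeffs, inverse=True))  # roughly 2**14 times the input
```

Because the HEVC basis is only nearly orthogonal, applying the inverse to the forward output returns approximately $2^{14}$ times the input; HEVC compensates for this growth with right shifts between stages, which this sketch leaves out.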
TABLE I
REQUIRED ARITHMETIC OPERATIONS FOR ALL TU SIZES

TABLE II
RESULTS OF HM SOFTWARE UNDER LOW DELAY P, ALL INTRA MAIN, AND RANDOM ACCESS CONFIGURATIONS
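The exhaustive search performed by Algorithm 1 (Section II) can be sketched in Python as follows, assuming the PSNR and bitrate values have already been measured. In the paper these values come from HM Software runs; here they are represented by hypothetical caller-supplied functions `f_p(n, x)` and `f_r(n, x)`, and the hypothetical helper `keep_low_freq` shows what setting every coefficient beyond the first X to zero means. As in Algorithm 1, Init starts at $P_{max}/R_{min}$, Var is $P_{av}/R_{av}$, and the thresholds are $P_{th} = P_{max} - 0.15$ and $R_{th} = 1.015\,R_{min}$.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Hypothetical metric signature: f(n, x) returns the PSNR (dB) or bitrate measured
# for the n-point transform when only its x lowest-frequency coefficients are kept.
# In practice these values would come from HM runs; here they are caller-supplied.
Metric = Callable[[int, int], float]
SIZES = (4, 8, 16, 32)


def keep_low_freq(coeffs: List[int], x: int) -> List[int]:
    """Zero every coefficient beyond the first x (low-frequency) ones."""
    return [c if k < x else 0 for k, c in enumerate(coeffs)]


def min_num_transform_coeff(f_p: Metric, f_r: Metric, p_max: float,
                            r_min: float) -> Optional[Tuple[int, ...]]:
    """Return (X4_OPT, X8_OPT, X16_OPT, X32_OPT), or None if no setting qualifies."""
    p_th = p_max - 0.15          # PSNR may drop by at most 0.15 dB
    r_th = 1.015 * r_min         # bitrate may grow by at most 1.5 %
    init = p_max / r_min         # PSNR-to-bitrate ratio of the reference run
    best = None
    # Nested sweep X4 = 4..0, X8 = 8..0, X16 = 16..0, X32 = 32..0 (X32 innermost).
    for xs in product(*(range(n, -1, -1) for n in SIZES)):
        p_av = sum(f_p(n, x) for n, x in zip(SIZES, xs)) / len(SIZES)
        r_av = sum(f_r(n, x) for n, x in zip(SIZES, xs)) / len(SIZES)
        var = p_av / r_av
        if var < init and p_av >= p_th and r_av <= r_th:
            init = var           # keep the lowest qualifying PSNR/bitrate ratio
            best = xs
    return best


if __name__ == "__main__":
    # Toy stand-ins, NOT measured data: PSNR falls 0.035 dB per dropped
    # coefficient and the bitrate stays at 10 Mbps.
    def toy_psnr(n: int, x: int) -> float:
        return 40.0 - 0.035 * (n - x)

    def toy_rate(n: int, x: int) -> float:
        return 10.0

    print(min_num_transform_coeff(toy_psnr, toy_rate, p_max=40.0, r_min=10.0))
    print(keep_low_freq([90, -41, 12, 5, -3, 2, 1, 0], 3))
```

With the toy model, the search keeps discarding coefficients until the average PSNR would fall below $P_{th}$; real use would replace the toy functions with values measured by the HM Software for each QP.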
F. Comparison of Proposed 1D DCT/IDCT Architectures
Table I compares the proposed approximate integer 1D DCT/IDCT architectures shown in Figs. 1–5 in terms of addition and shift operations for all TU sizes specified in HEVC, i.e., 4×4, 8×8, 16×16, and 32×32. The arithmetic operations required for all TUs by the original DCT and IDCT algorithms [17] were presented by Kalali et al. [9], [10]. The computational requirement of the original IDCT algorithm for all TU sizes in HEVC was reported by Budagavi et al. [26]. As depicted in Table I, the architecture with separate DCT and IDCT modules [26] requires the sum of the arithmetic operations of the original DCT and IDCT algorithms, as reported by Kalali et al. [9], [10]. Table I also lists the addition and shift operations required by the proposed architectures and their reduction (%) relative to the Separate DCT+IDCT architecture [26]. The shift operations required for all TU sizes in the proposed DCT/IDCT designs, as shown in Table I, need no extra hardware because they are realized by rewiring.

G. Results of HEVC Test Model (HM) Software
The proposed 1D DCT/IDCT architectures shown in Figs. 1–5 are implemented in the HM Software [30]. The results of the proposed algorithm under the Low-Delay-P, All-Intra-Main, and Random-Access configurations are presented in Table II. Here, twelve video sequences of different classes are considered to check the conformity of the proposed designs to HEVC. Owing to the decreased computational overhead of the HEVC transform, the Encoding Time (ET) of the HM Software is reduced by up to 17.95% for Class A sequences. The reference QP values considered here are 22, 27, 32, and 37.

In Table II, BD-Rate [31] and BD-PSNR [31] give the average increase in bitrate (%) and the average difference in PSNR (dB), respectively, over the four reference QPs with respect to the original algorithm [33]. Moreover, Average denotes the average change in Encoding Time (ET), in %, over the four reference QPs. It is calculated as follows:

$$\text{Average} = \frac{\text{Final Value} - \text{Reference Value}}{\text{Reference Value}} \times 100\% \qquad (8)$$

In Table II, it is worth noting that the proposed designs achieve the maximum reduction of 17.95% in encoding time (Avg. ET) for the Traffic video sequence under the Low-Delay-P configuration, compared with the reference algorithm [33] in the HM Software [30].

The results presented in Table II are also compared with the technique of [12] for twelve video sequences under the Low-Delay-P, All-Intra-Main, and Random-Access configurations. Table II shows that the proposed methodology achieves higher savings in encoding time while reporting BD-Rate and BD-PSNR comparable to those of the technique reported in [12]. Moreover, the proposed
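The sketch below illustrates Eq. (8) and one common way of computing BD-PSNR [31]: each RD curve is fitted with a cubic polynomial of PSNR versus log10(bitrate), and the area between the two fits over their overlapping rate range is divided by the width of that range. With the four QPs used here (22, 27, 32, and 37), the cubic passes exactly through the four RD points. This is an illustration under those assumptions, not necessarily the tool used to produce Table II; the function names and sample numbers are hypothetical.

```python
import math
from typing import List, Sequence


def et_change_percent(final: float, reference: float) -> float:
    """Eq. (8): change in encoding time relative to the reference, in percent."""
    return (final - reference) / reference * 100.0


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_area(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    """Integrate, over [lo, hi], the cubic passing through four (x, y) points."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("expects exactly four RD points (one per QP)")
    s = sum(xs) / 4.0                  # centre x to keep the system well conditioned
    coef = _solve([[(x - s) ** k for k in range(4)] for x in xs], list(ys))
    a, b = lo - s, hi - s
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(coef))


def bd_psnr(rate_ref: Sequence[float], psnr_ref: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Average PSNR gap (dB) of 'test' over 'ref' on the shared log10-rate interval."""
    lr_ref = [math.log10(r) for r in rate_ref]
    lr_test = [math.log10(r) for r in rate_test]
    lo = max(min(lr_ref), min(lr_test))
    hi = min(max(lr_ref), max(lr_test))
    gap = _cubic_area(lr_test, psnr_test, lo, hi) - _cubic_area(lr_ref, psnr_ref, lo, hi)
    return gap / (hi - lo)


if __name__ == "__main__":
    rates = [1000.0, 1800.0, 3200.0, 6000.0]   # made-up kbps values, one per QP
    psnr = [34.0, 36.5, 38.9, 41.0]            # made-up PSNR values in dB
    print(bd_psnr(rates, psnr, rates, [p - 0.1 for p in psnr]))
    print(et_change_percent(final=82.05, reference=100.0))
```

A negative BD-PSNR means the test configuration loses PSNR at equal bitrate, and a negative value from (8) means the encoding time decreased, so a 17.95% reduction corresponds to an Average of about −17.95%.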
Fig. 9. Addressing logic for input section of TM.
Fig. 10. Addressing logic for output section of TM.
TABLE III
SPECIFICATIONS OF THE PROPOSED 2D 4-/8-/16-/32-POINT DCT/IDCT ARCHITECTURE

TABLE IV
COMPARISON ON FPGA PLATFORM

TABLE V
ASIC IMPLEMENTATION RESULTS

TABLE VI
REDUCTION IN AREA CONSUMPTION
TABLE VII
HARDWARE COMPARISON OF THE PROPOSED 2D DCT/IDCT ARCHITECTURE ON ASIC PLATFORM
the proposed design can be integrated into a real-time HEVC codec for portable CE devices.

The hardware comparison of the proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture in Table VII indicates that the proposed design consumes lower power and smaller area than the architectures [9]–[11], [23]–[26]. Moreover, the architectures [9], [11], [21], [23]–[25] can perform only the forward DCT and require at least 32 RAMs to perform the 2D transform. Thus, these designs would require additional hardware to perform the 2D IDCT. The architectures [21], [26] can perform only the 1D transform operation and therefore would require additional hardware to implement the 2D transform. The architecture [12] has not been designed to perform the 2D IDCT; therefore, it has slightly lower hardware consumption than the design proposed in this work. Unlike the architectures presented in the state-of-the-art [9]–[12], [21], [23]–[25], the proposed hardware can compute the 2D DCT/IDCT for all TU sizes in HEVC with minimal hardware while supporting up to 288 frames per second at 4K. The results show that the design is suitable for integration in a typical real-time HEVC codec [15], [16], [29] for portable CE devices [4]–[6].

VI. CONCLUSION

This paper presents an algorithm to find the minimum required number of low-frequency DCT-output/IDCT-input coefficients for n-point DCT/IDCT in the HM Software, where n = 4, 8, 16, and 32. The algorithm decreases the encoding time of the HM Software by up to 17.95% for Class A video sequences. Compared with the designs reported in the state-of-the-art, the proposed approximate integer 1D 4-, 8-, 16-, 32-, and 4-/8-/16-/32-point DCT/IDCT architectures require fewer computations and conform to the HEVC standard. When tested with the HM Software, the proposed architectures deliver RD performance close to that of the reference algorithm and report a considerably lower MSE of 1.42, an SSIM index of up to 0.9913, and a 2D correlation coefficient of up to 0.9993 for 4K (3820 × 4196) video sequences. The Transpose Memory architecture presented here requires only 16 RAMs to transpose 32 × 32 input samples and supports all TU sizes in HEVC. The proposed 2D 4-/8-/16-/32-point DCT/IDCT architecture needs only one DCT/IDCT module to compute 2D 4-/8-/16-/32-point DCT/IDCT and consumes lower energy (2.34 pJ), area (120 KGates), and power (11.23 mW) than the architectures reported in the state-of-the-art. The proposed 2D DCT/IDCT architecture conforms to the area and power requirements of a typical HEVC codec and offers low energy, area, and power consumption. Thus, this design can be integrated into a real-time HEVC codec for power-constrained HEVC-compliant portable CE devices such as smartphones, tablets, and camcorders.

REFERENCES

[1] V. Sze, M. Budagavi, and G. J. Sullivan, High Efficiency Video Coding: Algorithms and Architectures. Cham, Switzerland: Springer, 2014.
[2] K. Glasman, “CE society TV: High efficiency video coding (HEVC) [society news],” IEEE Consum. Electron. Mag., vol. 6, no. 1, pp. 19–22, Jan. 2017.
[3] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[4] K. Wiens and P. Corcoran, “Repairability smackdown II: iPhone versus iPhone,” IEEE Consum. Electron. Mag., vol. 3, no. 1, pp. 19–24, Jan. 2014.
[5] S. Kaur, “The revolution of tablet computers and apps: A look at emerging trends,” IEEE Consum. Electron. Mag., vol. 2, no. 1, pp. 36–41, Jan. 2013.
[6] Y. Takemura, “The development of video-camera technologies: Many innovations behind video cameras are used for digital cameras and smartphones,” IEEE Consum. Electron. Mag., vol. 8, no. 4, pp. 10–16, Jul. 2019.
[7] B. Markwalter, “ATSC 3.0 begins commercial broadcasting: First IP-based ultrahigh-definition TV broadcasts on the air [CTA insights],” IEEE Consum. Electron. Mag., vol. 7, no. 1, pp. 125–126, Jan. 2018.
[8] G. Delagi, “Harnessing technology to advance the next-generation mobile user-experience,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), San Francisco, CA, USA, Feb. 2010, pp. 18–24.
[9] E. Kalali, A. C. Mert, and I. Hamzaoglu, “A computation and energy reduction technique for HEVC discrete cosine transform,” IEEE Trans. Consum. Electron., vol. 62, no. 2, pp. 166–174, May 2016.
[10] E. Kalali, E. Ozcan, O. M. Yalcinkaya, and I. Hamzaoglu, “A low energy HEVC inverse transform hardware,” IEEE Trans. Consum. Electron., vol. 60, no. 4, pp. 754–761, Nov. 2014.
[11] A. C. Mert, E. Kalali, and I. Hamzaoglu, “High performance 2D transform hardware for future video coding,” IEEE Trans. Consum. Electron., vol. 63, no. 2, pp. 117–125, May 2017.
[12] A. Singhadia, P. Bante, and I. Chakrabarti, “A novel algorithmic approach for efficient realization of 2-D-DCT architecture for HEVC,” IEEE Trans. Consum. Electron., vol. 65, no. 3, pp. 264–273, Aug. 2019.
[13] M. J. Garrido, F. Pescador, M. Chavarrias, P. J. Lobo, and C. Sanz, “A high performance FPGA-based architecture for the future video coding adaptive multiple core transform,” IEEE Trans. Consum. Electron., vol. 64, no. 1, pp. 53–60, Feb. 2018.
[14] M. Garrido, F. Pescador, M. Chavarrías, P. Lobo, and C. Sanz, “A 2-D multiple transform processor for the versatile video coding standard,” IEEE Trans. Consum. Electron., vol. 65, no. 3, pp. 274–283, Aug. 2019.
[15] T.-M. Liu et al., “A 0.76 mm² 0.22 nJ/pixel DL-assisted 4K video encoder LSI for quality-of-experience over smartphones,” IEEE Solid-State Circuits Lett., vol. 1, no. 12, pp. 221–224, Dec. 2018.
[16] D. Zhou et al., “An 8K H.265/HEVC video decoder chip with a new system pipeline design,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 113–126, Jan. 2017.
[17] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations, 1st ed. San Diego, CA, USA: Elsevier, 2007.
[18] R. J. Cintra, F. M. Bayer, V. A. Coutinho, S. Kulasekera, A. Madanayake, and A. Leite, “Energy-efficient 8-point DCT approximations: Theory and hardware architectures,” Circuits Syst. Signal Process., vol. 35, no. 11, pp. 4009–4029, Nov. 2016.
[19] M. Jridi and P. K. Meher, “Scalable approximate DCT architectures for efficient HEVC-compliant video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 8, pp. 1815–1825, Aug. 2017.
[20] M. M. A. Basiri and S. N. Mahammad, “High performance integer DCT architectures for HEVC,” in Proc. 30th Int. Conf. VLSI Design 16th Int. Conf. Embedded Syst. (VLSID), Hyderabad, India, Jan. 2017, pp. 121–126.
[21] S. Chatterjee and K. Sarawadekar, “An optimized architecture of HEVC core transform using real-valued DCT coefficients,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 12, pp. 2052–2056, Dec. 2018.
[22] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Edirisuriya, “Improved 8-point approximate DCT for image and video compression requiring only 14 additions,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 6, pp. 1727–1740, Jun. 2014.
[23] M. Zheng, J. Zheng, Z. Chen, L. Wu, X. Yang, and N. Ling, “A reconfigurable architecture for discrete cosine transform in video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 3, pp. 810–821, Mar. 2020.
[24] M. Masera, M. Martina, and G. Masera, “Adaptive approximated DCT architectures for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 12, pp. 2714–2725, Dec. 2017.
[25] P. K. Meher, S. Park, B. Mohanty, K. Lim, and C. Yeo, “Efficient integer DCT architectures for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 1, pp. 168–178, Jan. 2014.
[26] M. Budagavi, A. Fuldseth, G. Bjontegaard, V. Sze, and M. Sadafale, “Core transform design in the high efficiency video coding (HEVC) standard,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1029–1041, Dec. 2013.
[27] J. Zhu, Z. Liu, and D. Wang, “Fully pipelined DCT/IDCT/Hadamard unified transform architecture for HEVC codec,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 677–680.
[28] W. Zhao, T. Onoye, and T. Song, “High-performance multiplierless transform architecture for HEVC,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 1668–1671.
[29] C. C. Ju et al., “A 0.5 nJ/pixel 4 K H.265/HEVC codec LSI for multi-format smartphone applications,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 56–67, Jan. 2016.
[30] F. Bossen, D. Flynn, K. Sharman, and K. Sühring. (Jan. 2018). HM Software Manual. [Online]. Available: https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-16.18/doc/software-manual.pdf
[31] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” presented at the 13th Video Coding Experts Group Meeting, Austin, TX, USA, Apr. 2001.
[32] M. Jridi, A. Alfalou, and P. K. Meher, “A generalized algorithm and reconfigurable architecture for efficient and scalable orthogonal approximation of DCT,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 2, pp. 449–457, Feb. 2015.
[33] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[34] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multiplication,” ACM Trans. Algorithms (TALG), vol. 3, no. 2, p. 11, May 2007.

Ashish Singhadia received the B.E. degree from RGTU, Bhopal, India, in 2008, and the M.Tech. degree in electronics and communication engineering from IIT Kharagpur, Kharagpur, India, in 2013, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Electrical Communication Engineering. He was an Assistant Professor with VIT, Bhopal, from 2014 to 2017. He has served as a Lecturer with T.I.E.,Tech, Jabalpur, India, from 2008 to 2010. His current research interests include VLSI architectures for image and video processing.

Meghan Mamillapalli received the B.Tech. degree in electronics and communication engineering from MNNIT Allahabad, India, in 2017, and the M.Tech. degree in microelectronics and VLSI design from IIT Kharagpur, India, in 2019. He is currently working as a Design Engineer with Qualcomm India Private Ltd. His research interests include low power hardware design for digital video processing and coding.

Indrajit Chakrabarti (Member, IEEE) received the B.E. and M.E. degrees in electronics and telecommunication engineering from Jadavpur University, India, in 1987 and 1990, respectively, and the Ph.D. degree from IIT Kharagpur, India, in 1997, where he is currently a Professor with the Department of Electronics and Electrical Communication Engineering. He has published more than 120 papers in peer-reviewed journals and conferences. His research interests include VLSI architectures for image and video processing, digital signal processing, error control coding, and wireless communication.