High Performance Integer DCT Architectures For Hevc: Mohamed Asan Basiri M, Noor Mahammad SK


2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems

High Performance Integer DCT Architectures for HEVC

Mohamed Asan Basiri M and Noor Mahammad Sk
Department of Computer Science and Engineering, IIITD&M Kancheepuram, Chennai

Abstract—This paper proposes an efficient VLSI architecture for the integer discrete cosine transform (integer DCT) used in real-time high efficiency video coding (HEVC) applications. The proposed N-point 1D-Integer DCT architecture consists of a signed configurable carry save adder tree based multiplier unit, so the depth of the architecture falls within the bounds of O(log2 N). The proposed 1D architecture can perform one N-point or multiple N/2-, N/4-, ..., 2-point Integer DCTs in parallel, and it is used to build 2D folded and parallel designs. The performance results show that the proposed architecture performs better than existing architectures using the 45 nm CMOS TSMC library. The proposed 32 × 32-point parallel Integer DCT achieves a 59.1% improvement in worst path delay compared with the odd-even decomposition [3] based architecture.

Index Terms—DCT, DSP, Integer DCT, and HEVC

Fig. 1. Example for row and column process of 4 × 4-point 2D-Integer DCT

2380-6923/16 $31.00 © 2016 IEEE
DOI 10.1109/VLSID.2017.68

I. INTRODUCTION
Digital signal processors (DSPs) are essential for the real-time processing of digitized real-world data, performing the high-speed numeric calculations needed in a broad range of applications, from basic consumer electronics to sophisticated industrial instrumentation. A discrete transform [1] changes the representation of a signal from one domain to another in order to reduce the complexity of a particular digital signal processing application. The discrete cosine transform (DCT) is a very powerful transformation used in image compression. The circuit complexity of the DCT is greater than that of the integer DCT because the DCT uses floating-point arithmetic, whereas the integer DCT uses fixed-point arithmetic. HEVC [2], which incorporates the integer DCT [3], is now widely used in multimedia applications.

The 1D and 2D discrete transformations are represented as (1) and (2), respectively, where O is the output matrix, X is the input signal matrix, and C is the coefficient matrix. The 4-point integer DCT coefficient matrix is shown in (3).

$$\begin{bmatrix} o_{11} \\ o_{12} \\ o_{13} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \\ x_{13} \end{bmatrix} \qquad (1)$$

$$\begin{bmatrix} o_{11} & o_{12} & o_{13} \\ o_{21} & o_{22} & o_{23} \\ o_{31} & o_{32} & o_{33} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}^{T} \qquad (2)$$

$$C^{4\times 4}_{\text{Integer DCT}} = \begin{bmatrix} 64 & 64 & 64 & 64 \\ 83 & 36 & -36 & -83 \\ 64 & -64 & -64 & 64 \\ 36 & -83 & 83 & -36 \end{bmatrix} \qquad (3)$$

Fig. 1 shows the 4 × 4-point 2D-integer DCT. During the row process, each row of the 4 × 4 input matrix is 1D transformed and the results are stored in the corresponding row of a 4 × 4 buffer. During the column process, each column of the 4 × 4 buffer matrix is 1D transformed, and the results are the required 2D transformed values. Fig. 2(a) shows the separable folded 2D-Integer DCT architecture, where one 1D-Integer DCT unit performs both the row and the column processes: if sel = 0, the row process is performed; otherwise, the column process is performed. Fig. 2(b) shows the separable parallel 2D-Integer DCT architecture, where two 1D-Integer DCT units perform the row and column processes. In both cases, the transpose buffer stores the results of the row process, from which the column process values are computed.

Fig. 2. Basic architecture for 2D-Integer DCT (a) Folded (b) Parallel

The odd-even decomposition based N-point Integer DCT is shown in [3], where the N/2 even-ordered input
signal sample values are sent to an N/2-point Integer DCT unit. The configurable Integer DCT in [4] designs the multiplier so that it can perform N-, N/2-, or N/4-point Integer DCTs. HEVC architectures based on the 8-point integer transform are shown in [5], [6], and [7]. Accumulator-based N-point Integer DCT architectures are shown in [8] and [9], where N accumulators produce the N outputs of a 1D-DCT in N cycles. All of the above existing architectures use add-shift network based multipliers, which involve a larger number of carry look-ahead adders (CLAs) and therefore increase the worst path delay.

A. Contribution of this paper
The multiplier unit in the latest N-point Integer DCT architectures takes the form of an add-shift network, whereas the proposed architecture uses a signed configurable carry save adder tree [11]. Therefore, the depth of the architecture falls within the bounds of O(log2 N). The proposed 1D architecture can perform one N-point or multiple N/2-, N/4-, ..., 2-point Integer DCTs in parallel. The performance results show that the proposed architecture performs better than existing architectures using the 45 nm CMOS TSMC library.

The rest of the paper is organized as follows. Section II elaborates the proposed architecture for Integer DCT. Design modeling, implementation, and results are presented in Section III, followed by the conclusion in Section IV.

II. THE PROPOSED ARCHITECTURE FOR INTEGER DCT
Fig. 3 shows the proposed block architecture used for the 32-point 1D-Integer DCT, in which the coefficient matrix is of size 32 × 32. The input signal sample values must be multiplied by the coefficients, which forms a matrix-vector multiplier. All the existing architectures use an add-shift network based multiplier, so the multiplier delay depends on the number of adders in the add-shift network. The proposed architecture instead uses a configurable carry save adder (CSA) tree based multiplier. Fig. 3(a) shows the series of multiplexers used for configurable carry save addition based multiplication in the proposed architecture. In the configurable carry save addition based 32-point Integer DCT, the maximum number of values to be added is log2 N = log2 32 = 5; this equals the largest number of nonzero bits in any coefficient magnitude of the 32-point HEVC matrix. For example, the product of the coefficient 87 and the input signal sample value x_i is 87x_i = 64x_i + 16x_i + 4x_i + 2x_i + x_i. The minimum number of values to be added is 1; for example, the product of the coefficient 4 and x_i is 4x_i = 4x_i + 0x_i + 0x_i + 0x_i + 0x_i. The corresponding left-shifted (power-of-two) input signal values are therefore sent as inputs to the series of multiplexers in Fig. 3(a), which is named a Cell. At most five Cells are needed to obtain one multiplication result, so five Cells are used in Fig. 3(b), and the configurable CSA tree has at most ⌈log2 5⌉ = 3 levels. The Sum and Carry from the final carry save adder are added with a carry look-ahead adder (CLA), which produces the multiplication result o_i. The corresponding resultant sign bit (o_is) is obtained from Fig. 3(c), where a series of multiplexers stores the XOR-ed sign bits of the input signal sample values (x_is) and the coefficient values (c_ijs); i and j vary from 0 to 31 for a 32-point Integer DCT. Here, s32, s16, s8, s4, and s2 (all initially 0) are incremented in each cycle by 5-, 4-, 3-, 2-, and 1-bit up counters, respectively, so one of the operands of the proposed multiplier is reconfigured in each cycle. Fig. 3(a), (b), and (c) are together named a Block. The critical path depth of the proposed Block architecture, $T^{\text{mul,pro}}_{\text{delay}}$, is given by (4); it equals the critical path depth of the proposed multiplier in the N-point Integer DCT. The total number of CSA levels used for the proposed N-point Integer DCT is log2 log2 N, rounded up to an integer. Here, T(csa) and T(cla) are the critical path depths of the carry save adder and the carry look-ahead adder, respectively. If se = 0, 1, 2, 3, and 4, then 32-, 16-, 8-, 4-, and 2-point Integer DCTs are performed, respectively. The output of the Block is {o_is, o_i}. Therefore, 32 Blocks are required to obtain one output of the 1D-Integer DCT.

Fig. 3. The proposed block architecture (Block) used for 32-point 1D-Integer DCT with (a) Series of multiplexers used for configurable carry save addition based multiplication (Cell) (b) configurable carry save adder tree based multiplication unit (c) Series of multiplexers used to find the resultant sign bits for the multiplication.

Fig. 4(a) shows the overall architecture of the proposed 32-point 1D-Integer DCT, whose inputs come from 32 Blocks such as the one in Fig. 3. Therefore, log2 32 = 5 levels of signed fixed-point adders are used, and the critical path depth of the signed adder tree, $T^{\text{add,pro}}_{\text{delay}}$, in the proposed N-point Integer DCT architecture is (log2 N)T(add), as given by (5). Here, T(add) is the critical path depth of the signed adder. The proposed 32-point 1D architecture can perform one 32-point, two 16-point, four 8-point, eight 4-point, or sixteen 2-point Integer DCTs in parallel. The 32-point Integer DCT output is {ou32s, ou32}. The 16-point Integer DCT outputs are {ou16_0s, ou16_0} and {ou16_1s, ou16_1}. The 8-point Integer DCT outputs are {ou8_0s, ou8_0}, {ou8_1s, ou8_1}, {ou8_2s, ou8_2}, and {ou8_3s, ou8_3}. The 4-point Integer DCT outputs are {ou4_0s, ou4_0}, {ou4_1s, ou4_1}, {ou4_2s, ou4_2}, {ou4_3s, ou4_3}, {ou4_4s, ou4_4}, {ou4_5s, ou4_5}, {ou4_6s, ou4_6}, and {ou4_7s, ou4_7}. The 2-point Integer DCT outputs are {ou2_0s, ou2_0}, {ou2_1s, ou2_1}, ..., {ou2_15s, ou2_15}.

Fig. 4(b) shows the 32 × 32-Buffer architecture, which consists of thirty-two 1 × 32-Buffers. The inputs of the 1 × 32-Buffers are the outputs of the column of 5-to-1 multiplexers with select line se, where se = 0, 1, 2, 3, and 4 for 32-, 16-, 8-, 4-, and 2-point Integer DCTs, respectively. Each 1 × 32-Buffer is made up of 32 registers and 2-to-1 multiplexers with a common select line. The select lines of 1 × 32-Buffers 0, 1, ..., 30, and 31 are en_0, en_1, ..., en_30, and en_31, respectively. The output of Fig. 4(a) can be stored in one particular 1 × 32-Buffer by setting its select line to 1. The 1 × 32-Buffer architecture is shown in Fig. 5. The outputs of the ith 1 × 32-Buffer are b32_i, b16_i, b8_i, b4_i, and b2_i, which are the results of the 32-, 16-, 8-, 4-, and 2-point Integer DCTs, respectively. Here, en_i = 0 retains the 32 values stored in the buffer, and en_i = 1 loads a new value when it arrives from the input.

$$T^{\text{Integer DCT,pro}}_{\text{delay}} = T(\text{mux}) + T^{\text{mul,pro}}_{\text{delay}} + T^{\text{add,pro}}_{\text{delay}} \qquad (6)$$

TABLE I
THEORETICAL ANALYSIS OF VARIOUS ARCHITECTURES FOR INTEGER DCT

| Architecture | N = 32 | N = 16 | N = 8 | N = 4 | N = 2 | Critical path depth | No. of cycles |
|---|---|---|---|---|---|---|---|
| N-point 1D Odd even [3] | YES | YES | YES | YES | NO | (1 + log2(N/2))T(add) + T(add-shift) + T(mux) | 1 |
| N-point 1D [4] | YES | YES | YES | YES | NO | (log2 N)T(add) + T(add-shift) + T(mux) | N |
| N-point 1D [5] | NO | NO | YES | NO | NO | (log2 N)T(add) + T(add-shift) + T(mux) | 1 |
| N-point 1D [6] | NO | NO | YES | NO | NO | (log2 N)T(add) + T(add-shift) + T(mux) | N |
| N-point 1D [7] | NO | NO | YES | NO | NO | (log2 N)T(add) + T(add-shift) + T(mux) | N |
| N-point 1D [8] | YES | YES | YES | YES | NO | T(add-shift) + T(mux) + T(add) | N |
| N-point 1D [10] | YES | YES | YES | YES | NO | (log2 N)T(add) + T(add-shift) | 1 |
| N-point 1D Proposed | YES | YES | YES | YES | YES | (log2 N)T(add) + T(cla) + T(mux) + (log2 log2 N)T(csa) | N |
| N × N-point 2D Folded/Parallel [3] | YES | YES | YES | YES | NO | (1 + log2(N/2))T(add) + T(add-shift) + T(mux) | 2N |
| N × N-point 2D Folded/Parallel [4] | YES | YES | YES | YES | NO | (log2 N)T(add) + T(add-shift) + T(mux) | 2N^2 |
| N × N-point 2D Parallel [5] | NO | NO | YES | NO | NO | (log2 N)T(add) + T(add-shift) + T(mux) | 2N |
| N × N-point 2D Folded/Parallel [8] | YES | YES | YES | YES | NO | T(add-shift) + T(mux) + T(add) | 2N^2 |
| N × N-point 2D Parallel [9] | YES | YES | YES | YES | NO | T(add-shift) + T(add) + T(mux) | 2N^2 |
| N × N-point 2D Parallel [10] | YES | YES | YES | YES | NO | (log2 N)T(add) + T(add-shift) + T(mux) | 2N |
| N × N-point 2D Folded/Parallel Proposed | YES | YES | YES | YES | YES | (log2 N)T(add) + T(cla) + T(mux) + (log2 log2 N)T(csa) | 2N^2 |

T(add), T(mux), T(cla), T(csa), and T(add-shift) are the critical path depths of the signed fixed-point adder, multiplexer, recursive doubling based carry look-ahead adder, carry save adder, and add-shift network based multiplier, respectively.
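The multiplication scheme of Section II and the row and column process of Section I can be illustrated with a small behavioral model. The Python sketch below is a hedged illustration under stated assumptions, not the authors' Verilog design. The helper names (`shifted_terms`, `csa`, `csa_tree_multiply`, `int_dct_1d`, `int_dct_2d`) and the constant `NUM_CELLS` are hypothetical. Each Cell is modeled as one left-shifted copy of |x_i| per set bit of the coefficient magnitude. The final Sum and Carry addition is an ordinary integer addition standing in for the CLA, and the sign is applied afterwards from the XOR of the operand signs, in the spirit of Fig. 3(c). The sketch uses the 4-point matrix of (3) rather than the 32-point matrix, and it omits any intermediate scaling or rounding between the row and column stages.

```python
# Behavioral sketch only (hypothetical helper names, not the authors' Verilog):
# a CSA-tree constant multiplier in the style of Section II, used inside a
# 4-point integer DCT built from the coefficient matrix in (3).

NUM_CELLS = 5  # assumption: five Cells per Block, as described for Fig. 3(b)

# 4-point HEVC integer DCT coefficient matrix, equation (3).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def shifted_terms(coeff_mag, x_mag):
    """Left-shifted copies of x_mag, one per set bit of coeff_mag (Cell inputs)."""
    return [x_mag << b for b in range(coeff_mag.bit_length()) if (coeff_mag >> b) & 1]


def csa(a, b, c):
    """3:2 carry save adder on non-negative integers; a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def csa_tree_multiply(coeff, x):
    """Return (coeff * x, number of CSA levels) via sign-magnitude CSA reduction."""
    terms = shifted_terms(abs(coeff), abs(x))
    if len(terms) > NUM_CELLS:
        raise ValueError("coefficient needs more shifted terms than there are Cells")
    operands = terms + [0] * (NUM_CELLS - len(terms))  # unused Cells contribute 0
    levels = 0
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3
        reduced = []
        for i in range(0, full, 3):  # compress groups of three operands
            reduced.extend(csa(*operands[i:i + 3]))
        operands = reduced + operands[full:]  # leftovers pass to the next level
        levels += 1
    magnitude = operands[0] + operands[1]  # final Sum + Carry addition (CLA role)
    negative = (coeff < 0) != (x < 0)  # XOR of the sign bits, cf. Fig. 3(c)
    return (-magnitude if negative else magnitude), levels


def int_dct_1d(x, C=C4):
    """1D integer DCT o = C x: CSA-tree products, then summed (adder tree role)."""
    return [sum(csa_tree_multiply(c, xi)[0] for c, xi in zip(row, x)) for row in C]


def int_dct_2d(X, C=C4):
    """Row process into a buffer, then column process, giving O = C X C^T."""
    buffer = [int_dct_1d(row, C) for row in X]                 # row process
    cols = [int_dct_1d(list(col), C) for col in zip(*buffer)]  # column process
    return [list(r) for r in zip(*cols)]                       # transpose back


if __name__ == "__main__":
    print(csa_tree_multiply(87, 5))  # (435, 3): 64x + 16x + 4x + 2x + x
    print(int_dct_1d([1, 1, 1, 1]))  # [256, 0, 0, 0]
```

Padding unused Cells with zeros keeps the reduction at a fixed ⌈log2 5⌉ = 3 CSA levels for every coefficient (5 → 4 → 3 → 2 operands), which mirrors the fixed-depth Block described above. The cycle-by-cycle schedule, the counters, and the buffers are not modeled.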
In the Buffer architecture, the shaded boxes represent clocked registers. The critical path depth of the proposed N-point Integer DCT, $T^{\text{Integer DCT,pro}}_{\text{delay}}$, is given by (6). Equation (7) gives the number of N/2^k-point Integer DCTs ($M^{\text{pro}}_{N,\,N/2^k}$) and the number of (N/2^k × N/2^k)-point Integer DCTs ($M^{\text{pro}}_{(N\times N),\,(N/2^k\times N/2^k)}$) performed by the proposed N-point 1D and N × N-point 2D architectures, respectively. Here, T(mux) is the critical path depth of the multiplexers used in the proposed architecture. The proposed N-point 1D and N × N-point 2D Integer DCTs require N and 2N^2 cycles, respectively, to complete the operation (32 and 2048 cycles for N = 32); in the 2D case, the row and column processes take N^2 cycles each.

$$T^{\text{mul,pro}}_{\text{delay}} = T(\text{cla}) + (\log_2 \log_2 N)\,T(\text{csa}) \qquad (4)$$

$$T^{\text{add,pro}}_{\text{delay}} = (\log_2 N)\,T(\text{add}) \qquad (5)$$

$$M^{\text{pro}}_{N,\,N/2^k} = M^{\text{pro}}_{(N\times N),\,(N/2^k\times N/2^k)} = 2^k, \qquad k = 0, 1, 2, \ldots, (\log_2 N) - 1 \qquad (7)$$

III. DESIGN MODELING, IMPLEMENTATION, AND RESULTS
All the existing and proposed designs are modeled in Verilog HDL. These Verilog HDL models are simulated and verified using the Xilinx ISE simulator. The timing, area, total number of cells, and power analysis of this implementation are done with the Cadence 6.1 ASIC design tool. All the designs are implemented in 45 nm technology using the library tcbn45gsbwpbc088 ccs.lib, at an operating voltage of 0.88 V. In general, the performance of a circuit depends on circuit delay, circuit area, and power dissipation. The worst path delay is defined as the delay of the input-to-output path with the largest delay in the circuit. The
careful optimization of these parameters ensures the highest performance.

Fig. 4. VLSI architectures for (a) proposed 32-point 1D-Integer DCT (b) 32 × 32-Buffer

Fig. 5. 1 × 32-Buffer architecture

Table I shows the theoretical analysis of various Integer DCT architectures: add-shift network based multipliers together with adders form the critical path in the existing designs, whereas CSA-based multipliers together with adders form the critical path in the proposed designs. Table I also shows the supported lengths (32-, 16-, 8-, 4-, or 2-point), critical path depth, and number of cycles of various N- and N × N-point Integer DCTs.

Table II compares the worst path delay, total area, net power, and power delay product (PDP), or energy per operation (EOP) [12], of various 1D and 2D Integer DCT architectures. The PDP represents the average energy consumed per switching event, as is apparent from its units (W·s = J). It is calculated by multiplying the worst path delay by the sum of the switching and leakage powers. For example, for the proposed 32-point 1D design, 1399.7 ps × (2218746.4 nW + 3333070.4 nW) ≈ 7770.9 fJ, which matches the 7770.8 fJ listed in Table II to within rounding.

TABLE II
PERFORMANCE ANALYSIS OF DIFFERENT ARCHITECTURES FOR INTEGER DCT WITH 8-BIT-WIDE INPUT SIGNAL SAMPLE VALUES IN 45 nm CMOS TECHNOLOGY

| 1D/2D Integer DCT architecture | Worst path delay (ps) | Frequency (MHz) | Total area (µm²) | Total no. of cells | Net power (nW) | Switching power (nW) | Leakage power (nW) | EOP (fJ) |
|---|---|---|---|---|---|---|---|---|
| 32-point 1D Odd even [3] | 3026.2 | 330.4 | 83051.3 | 64868 | 1623391.2 | 5274320.3 | 3515339.8 | 26599.2 |
| 32-point 1D [4] | 1560.9 | 640.6 | 67379.3 | 57839 | 731229.9 | 2929816.4 | 4470877.4 | 11551.7 |
| 8-point 1D [5] | 1768.4 | 565.6 | 36795.2 | 35579 | 499123.2 | 1781991.2 | 2567233.2 | 7691.1 |
| 8-point 1D [6] | 1167.1 | 856.8 | 30685.1 | 21569 | 461001.2 | 1311001.1 | 1142243.1 | 2863.1 |
| 8-point 1D [7] | 1682.2 | 594.5 | 33588.2 | 31168 | 485291.5 | 1671071.7 | 2340745.8 | 6748.6 |
| 32-point 1D [8] | 1587.4 | 630.1 | 81836.2 | 52111 | 853460.1 | 2796111.6 | 4384529.6 | 11398.5 |
| 32-point 1D [10] | 1889.4 | 529.2 | 89845.3 | 66789 | 1832311.1 | 5424219.3 | 3835311.8 | 17494.9 |
| 32-point 1D Proposed | 1399.7 | 714.4 | 42810.2 | 42578 | 517698.2 | 2218746.4 | 3333070.4 | 7770.8 |
| 32 × 32-point 2D Folded [3] | 3967.8 | 252.0 | 361980.2 | 211072 | 3140026.7 | 11773025.1 | 17121276.8 | 114646.8 |
| 32 × 32-point 2D Folded [4] | 1568.8 | 637.7 | 265778.1 | 65140 | 889125.9 | 7893432.2 | 10009573.4 | 28086.2 |
| 32 × 32-point 2D Folded [8] | 1773.9 | 564.0 | 321985.1 | 172032 | 2054512.9 | 8343453.9 | 14794677.3 | 41044.7 |
| 32 × 32-point 2D Folded Proposed | 1755.1 | 569.8 | 164754.3 | 57839 | 731227.5 | 3620249.4 | 6767937.4 | 18232.3 |
| 32 × 32-point 2D Parallel [3] | 3835.0 | 260.7 | 441948.4 | 223040 | 3194824.4 | 13092679.2 | 18148370.2 | 119809.4 |
| 32 × 32-point 2D Parallel [4] | 1568.1 | 637.7 | 367075.3 | 156717 | 5918420.1 | 9003839.2 | 10125521.3 | 29996.7 |
| 8 × 8-point 2D Parallel [5] | 1762.9 | 567.2 | 170122.1 | 93829 | 1454342.1 | 4731477.9 | 9007501.5 | 24220.4 |
| 32 × 32-point 2D Parallel [8] | 1589.0 | 629.3 | 401226.1 | 218432 | 2612410.6 | 10386085.2 | 19721696.1 | 47841.2 |
| 32 × 32-point 2D Parallel [9] | 2256.3 | 443.2 | 385511.7 | 219539 | 2706847.8 | 10496871.8 | 18285502.7 | 64941.6 |
| 32 × 32-point 2D Parallel [10] | 1899.5 | 526.4 | 467981.2 | 237872 | 3314227.7 | 14312679.2 | 20226511.2 | 65607.1 |
| 32 × 32-point 2D Parallel Proposed | 1569.2 | 637.3 | 269967.8 | 131798 | 1315835.7 | 6982468.5 | 11017179.3 | 28245.0 |

Compared with the odd-even decomposition [3] based architecture, the proposed 32 × 32-point parallel Integer DCT reduces the worst path delay by 59.1% (from 3835.0 ps to 1569.2 ps in Table II) because regular adders are used in [3], whereas in the proposed technique CSA-based
adders are used. The architectures in [5], [6], and [7] require less area than the proposed design because these existing techniques support only the 8-point Integer DCT. The parallel 2D architectures of [4] and [8] reach worst path delays comparable to that of the proposed parallel design (1568.1 ps and 1589.0 ps versus 1569.2 ps in Table II), but their areas are greater than that of the proposed design because of their parallel refinement units and accumulators, respectively. Since the critical path of [8] includes only one accumulator, its theoretical critical path depth in Table I, T(add-shift) + T(mux) + T(add), is among the shortest of the existing designs. Fig. 6 shows the chip layout diagram of the proposed folded 32 × 32-point 2D-Integer DCT architecture in 45 nm technology.

Fig. 6. Chip layout diagram for proposed 32 × 32-point 2D-Integer DCT using folded architecture with core area as 181229.7 µm², die space around core as 60 µm, and total chip area as 235904.49 µm² using 45 nm technology.

The main difference between the proposed parallel and folded architectures lies in the number of clock cycles and the area: the parallel architecture occupies a greater total area than the folded one, whereas the folded architecture needs more clock cycles than the parallel one. Therefore, the parallel architecture suits applications where time optimization (high throughput) is the primary goal (e.g., supercomputers), and the folded architecture suits applications where area optimization is the primary goal (e.g., handheld devices).

IV. CONCLUSION
In this paper, a high performance VLSI architecture for the integer discrete cosine transform (DCT) used in real-time high efficiency video coding (HEVC) applications is proposed. The multiplier is designed with a configurable carry save adder tree, and hence the depth of the circuit is within the bounds of O(log2 N). The proposed 1D Integer DCT can perform one N-point or multiple N/2-, N/4-, ..., 2-point transformations in parallel, and it is used to build 2D folded and parallel designs. The performance results show that the proposed architecture gives a good improvement over existing architectures using the 45 nm CMOS TSMC library. The proposed 32 × 32-point parallel Integer DCT achieves a 59.1% improvement in worst path delay compared with the odd-even decomposition [3] based architecture.

REFERENCES
[1] Mohamed Asan Basiri M and Noor Mahammad Sk, "Multimode Parallel and Folded VLSI Architectures for 1D-Fast Fourier Transform", Integration, the VLSI Journal, Elsevier, vol. 55, pp. 43-56, Sept. 2016.
[2] Fei Liang, Xiulian Peng, and Jizheng Xu, "A light-weight HEVC encoder for image coding", IEEE International Conference on Visual Communications and Image Processing (VCIP), pp. 1-5, Nov. 2013.
[3] Pramod Kumar Meher, Sang Yoon Park, Basant Kumar Mohanty, Khoon Seong Lim, and Chuohao Yeo, "Efficient Integer DCT Architectures for HEVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 1, pp. 168-178, Jan. 2014.
[4] Pai-Tse Chiang and Tian Sheuan Chang, "A Reconfigurable Inverse Transform Architecture Design for HEVC Decoder", IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1006-1009, May 2013.
[5] Honggang Qi, Qingming Huang, and Wen Gao, "A Low-Cost Very Large Scale Integration Architecture for Multi Standard Inverse Transform", IEEE Transactions on Circuits and Systems - II, Express Briefs, vol. 57, no. 7, pp. 551-555, July 2010.
[6] Khan Wahid, Muhammad Martuza, Mousumi Das, and Carl McCrosky, "Resource Shared Architecture of Multiple Transforms for Multiple Video Codecs", IEEE International Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 947-950, May 2011.
[7] Kanwen Wang, Jialin Chen, Wei Cao, Ying Wang, Lingli Wang, and Jiarong Tong, "A Reconfigurable Multi-Transform VLSI Architecture Supporting Video Codec Design", IEEE Transactions on Circuits and Systems - II, Express Briefs, vol. 58, no. 7, pp. 432-436, July 2011.
[8] Yao Ziyou, He Weifeng, Hong Liang, He Guanghui, and Mao Zhigang, "Area and Throughput Efficient IDCT/IDST Architecture for HEVC Standard", IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2511-2514, June 2014.
[9] Hong Liang, He Weifeng, Zhu Hu, and Mao Zhigang, "A Cost Effective 2-D Adaptive Block Size IDCT Architecture for HEVC Standard", IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1290-1293, Aug. 2013.
[10] Wenjun Zhao, Takao Onoye, and Tian Song, "High-Performance Multiplierless Transform Architecture for HEVC", IEEE International Symposium on Circuits and Systems, pp. 1668-1671, May 2013.
[11] Mohamed Asan Basiri M and Noor Mahammad Sk, "An Efficient VLSI Architecture for Discrete Hadamard Transform", IEEE International VLSI Design Conference, pp. 140-145, Jan. 2016.
[12] Ricardo Gonzalez, Benjamin M. Gordon, and Mark A. Horowitz, "Supply and Threshold Voltage Scaling for Low Power CMOS", IEEE Journal of Solid State Circuits, vol. 32, no. 8, pp. 1210-1216, Aug. 1997.
