A High Speed and Low Power Flip-Flop Design Using Topologically Compressed Technique
Abstract—An extremely low-power flip-flop (FF) named topologically-compressed flip-flop (TCFF) is proposed. As compared with conventional FFs, the FF reduces power dissipation by 75%. This power reduction ratio is the highest among FFs that have been reported so far. The reduction is achieved by applying a topological compression method, the merger of logically equivalent transistors, to an unconventional latch structure. The very small number of transistors connected to the clock signal reduces the power drastically, and the smaller transistor count keeps the cell area no larger than that of conventional FFs. In addition, fully static full-swing operation makes the cell tolerant of supply voltage and input slew variation. An experimental chip design with 40 nm CMOS technology shows that almost all conventional FFs are replaceable with the proposed FF while preserving system performance with little layout-area overhead.
I. INTRODUCTION
Finally, in Section VII, we show the way to apply the proposed FF effectively to various systems in view of power and performance.
Fig. 1. Conventional transmission-gate flip-flop (TGFF).
www.ijltemas.in Page 90
Volume III, Issue XI, November 2014 IJLTEMAS ISSN 2278 - 2540
Fig. 3. Conditional-clocking flip-flop (CCFF).
Fig. 4. Cross-charge control flip-flop (XCFF).
Fig. 5. Adaptive-coupling flip-flop (ACFF).
And mainly due to this size issue, it becomes hard to use if the logic area is relatively large in the chip.
Fig. 4 shows the circuit of the cross-charge control FF (XCFF) [7]. The feature of this circuit is that it drives the output transistors separately in order to reduce the charged and discharged gate capacitance. However, in actual operation, some of the internal nodes are pre-set with the clock signal when the data is high, and this operation dissipates extra power to charge and discharge internal nodes. As a result, the effect of the power reduction decreases. Circuits including pre-set operation have the same problem [8].
The adaptive-coupling type FF (ACFF) [9], shown in Fig. 5, is based on a 6-transistor memory cell. In this circuit, instead of the commonly used double-channel transmission-gate, a single-channel transmission-gate with an additional dynamic circuit is used for the data line in order to reduce the clock-related transistor count. However, in this circuit, delay is easily affected by input clock slew variation because different types of single-channel transmission-gates are used in the same data line and connected to the same clock signal. Moreover, the characteristics of single-channel transmission-gate circuits and dynamic circuits are strongly affected by process variation. Thus, their optimization is relatively difficult, and performance degradation across various process corners is a concern.
Let us summarize the analysis of previously reported low-power FFs. For DiffFF [1] and XCFF [7], pre-charge operation is a concern, especially at lower data activity. As regards CCFF [4], its cell area becomes a bottleneck to use. And for ACFF
In order to reduce the power of the FF while keeping competitive performance and similar cell area, we tried to reduce the transistor count, especially of those operating with clock signals, without introducing any dynamic or pre-charge circuit. The power of the FF is mostly dissipated in the operation of clock-related transistors, and reducing the transistor count is effective in avoiding a cell area increase and in reducing the load capacitance of internal nodes. In the conventional FF shown in Fig. 1, there are 12 clock-related transistors. Reducing the clock-related transistor count directly from this circuit is quite difficult. One reason is that transmission-gates need a 2-phase clock signal, thus the clock driver cannot be eliminated. Another reason is that transmission-gates should be constructed from both PMOS and NMOS transistors to avoid the degradation of data transfer characteristics caused by single-channel MOS usage. Therefore, instead of a transmission-gate type circuit, we start with a combinational type circuit as shown in Fig. 6. To reduce the transistor count based on logical equivalence, we consider a method consisting of the following two steps. As the first step, we plan to have a circuit with two or more logically equivalent AND or OR logic parts which have the same input signal combination, especially including the clock signal as one of the input signals. Then, as the second step, we merge those parts at the transistor level.
IV. PROPOSED TOPOLOGICALLY-COMPRESSED FLIP-FLOP
A. Proposed FF and Transistor Level Compression
After investigating many kinds of latch circuits, we have set up an unconventionally structured FF, shown in Fig. 7. This FF consists of different types of latches in the master and the slave parts. The slave latch is a well-known Reset-Set (RS) type, but the master latch is an asymmetrical single data-input type. The feature of this circuit is that it operates with a single-phase clock, and it has two sets of logically
equivalent input AND logic, X1 and Y1, and X2 and Y2.
Fig. 8 shows the transistor-level schematic of Fig. 7. Based on this schematic, logically equivalent transistors are merged as follows. For the PMOS side, two transistor pairs in the M1 and S1 blocks in Fig. 8 can be shared as shown in Fig. 9. When either N3 or CP is Low, the shared common node is at the VDD voltage level, and the N2 and N5 nodes are controlled individually by the PMOS transistors gated by N1 and N4. When both N3 and CP are High, both the N2 and N5 nodes are pulled down to VSS by the NMOS transistors gated by N3 and CP. As well as M1 and S1 blocks, two PMOS
transistor pairs in M2
both nodes are at the VDD voltage level, and either N2 or N3
is ON. When CP is High, each node is at an independent
voltage level, as shown in Fig. 12. In consideration of this
behavior, the CP-input transistors are shared and
connected as shown in Fig. 11. The shared CP-input transistor
works as a switch that connects S1 and S2.
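The PMOS-side merge described for the M1 and S1 blocks can be sanity-checked with a small switch-level sketch. It is only an illustration under stated assumptions, not the schematic of Fig. 8 or Fig. 9: each block is assumed to behave as an AND-OR-invert gate, N2 = NOT(N1 OR (N3 AND CP)) and N5 = NOT(N4 OR (N3 AND CP)), and the NMOS pull-downs gated by N1 and N4 are assumed as the static-CMOS dual. The function names and the node name `COM` are hypothetical.

```python
from itertools import product


def reference_levels(n1, n3, cp, n4):
    """Logic levels of N2 and N5 for two separate gates (assumed AOI form)."""
    n2 = not (n1 or (n3 and cp))
    n5 = not (n4 or (n3 and cp))
    return n2, n5


def shared_levels(n1, n3, cp, n4):
    """Switch-level levels of N2 and N5 when the parallel PMOS pair is shared."""
    # Each entry: (switch is conducting, terminal A, terminal B).
    switches = [
        (not n3, "VDD", "COM"),     # shared PMOS gated by N3
        (not cp, "VDD", "COM"),     # shared PMOS gated by CP
        (not n1, "COM", "N2"),      # individual PMOS gated by N1
        (not n4, "COM", "N5"),      # individual PMOS gated by N4
        (n3 and cp, "N2", "VSS"),   # series NMOS gated by N3 and CP
        (n3 and cp, "N5", "VSS"),   # series NMOS gated by N3 and CP
        (n1, "N2", "VSS"),          # assumed NMOS gated by N1
        (n4, "N5", "VSS"),          # assumed NMOS gated by N4
    ]
    parent = {n: n for n in ("VDD", "VSS", "COM", "N2", "N5")}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Merge the terminals of every conducting switch into one electrical net.
    for conducting, a, b in switches:
        if conducting:
            parent[find(a)] = find(b)

    if find("VDD") == find("VSS"):
        raise ValueError("supply short")
    levels = []
    for node in ("N2", "N5"):
        if find(node) not in (find("VDD"), find("VSS")):
            raise ValueError("floating node: " + node)
        levels.append(find(node) == find("VDD"))
    return tuple(levels)


def merge_preserves_logic():
    """Compare shared and unshared versions over all 16 input combinations."""
    return all(
        shared_levels(*bits) == reference_levels(*bits)
        for bits in product([False, True], repeat=4)
    )


print(merge_preserves_logic())
```

Under these assumptions the shared pull-up never shorts the supplies and never leaves N2 or N5 floating. When N3 and CP are both High the common node is disconnected from VDD, but both outputs are already pulled to VSS, so the path through the common node connects nodes at the same level.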
This process leads to the circuit shown in Fig. 13. This circuit consists of seven fewer transistors than the original circuit shown in Fig. 8. The number of clock-related transistors is only three. Note that there is no dynamic circuit or pre-charge circuit; thus, no extra power dissipation emerges. We call this reduction method the Topological Compression (TC) method. The FF to which the TC method is applied is called the Topologically-Compressed Flip-Flop (TCFF).
B. Cell Operation
Fig. 14 shows simulation waveforms of the circuit shown in Fig. 13. In Fig. 13, when CP is low, the PMOS transistor connected to CP turns on and the master latch enters the data input mode. Both VD1 and VD2 are pulled up to the power-supply level, and the input data from D is stored in the master latch. When CP is high, the PMOS transistor connected to CP turns off, the NMOS transistor connected to CP turns on, and the slave latch enters the data output mode. In this condition, the data in the master latch is transferred to the slave latch, and then output to Q. In this operation, all nodes are fully static and full-swing. The current from the power supply does not flow into the master and the slave latch simultaneously because the master latch and the slave latch become active alternately. Therefore, timing degradation of the cell performance is small even though many transistors are shared with no increase in transistor size.
C. Cell Variation
LSI designs require FFs having additional functions like scan, reset, and set. The performance and cell area of these cells are also important. TCFF easily realizes these cells with a lower transistor count than conventional FFs. The circuit diagrams of TCFF with scan, reset, and set are shown in Figs. 15–17. Each circuit can be designed with a similar structure, and these FFs also have three transistors connected to CP, so the power dissipation is nearly the same as that of TCFF. Detailed characteristics are shown in Section V.
Fig. 15. TCFF with scan type.
Fig. 16. TCFF with reset type.
Fig. 18. Power simulation results of TCFF and other FFs.
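The cell operation described in Section IV-B can be paraphrased as a small behavioral model. This is a sketch only: it captures the alternating master/slave activity (master takes D while CP is low, slave drives Q while CP is high) and ignores all timing, the internal nodes VD1 and VD2, and the transistor-level structure. The class name `BehavioralFF` and its `step` method are hypothetical.

```python
class BehavioralFF:
    """Idealized master-slave flip-flop with a single-phase clock CP."""

    def __init__(self):
        self.master = 0  # value held in the master latch
        self.q = 0       # value held in the slave latch (output Q)

    def step(self, cp, d):
        """Apply one (CP, D) input pair and return the resulting Q."""
        if cp == 0:
            # CP low: master latch is in data input mode, slave holds Q.
            self.master = d
        else:
            # CP high: master holds, slave transfers the master data to Q.
            self.q = self.master
        return self.q


ff = BehavioralFF()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
print([ff.step(cp, d) for cp, d in stimulus])  # prints [0, 1, 1, 1, 0]
```

In the third stimulus pair D falls while CP is still high, and Q does not follow it. The model therefore behaves as a rising-edge-triggered FF, matching the description that the two latches are active alternately.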
V. PERFORMANCE SIMULATION
TABLE II
PERFORMANCE COMPARISONS OF VARIOUS TCFFS
and cell area are explained. In the next section, how effectively TCFF is applied to actual chip design is shown by placement and routing experiment.
TABLE III
EXPERIMENTAL CHIP LAYOUT DESIGN SUMMARY
executed independently and those results are compared. In logic synthesis, the power-reduction option is heavily applied.
Fig. 24 and Table III show the logic synthesis and P&R results at 250 MHz clock frequency. After applying the library including TCFF, 98% of the TGFFs, which make up 44% of the random logic, are replaced by TCFFs. Almost all TCFFs meet the timing constraints despite TCFF having a larger setup time of 105 ps instead of the 38 ps of TGFF, as shown in Table I. As regards area, the increase is only 1.3% with the TCFF-included library even though TCFF uses one more metal layer than TGFF in cell layout. This shows TCFF has no disadvantage in the P&R process.
various systems especially to a higher speed case in terms of power and performance. Fig. 26 shows the replacement rate of TGFF by TCFF at various clock frequencies. The same front-end cell libraries and netlists as in Section VI are used, and only the clock cycle condition is set from 200 MHz to 333
Fig. 27. Power dissipation for TCFF and the resized TCFF.
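To give a feel for why the larger setup time is usually tolerable but matters more at higher speed, the setup-time difference can be expressed as a fraction of the clock period. This is a back-of-the-envelope sketch using only the 105 ps and 38 ps setup times and the 200, 250, and 333 MHz clock frequencies quoted in the text. It ignores clock-to-Q delay, skew, and path slack, and the function name is hypothetical.

```python
SETUP_TGFF_PS = 38.0    # TGFF setup time quoted from Table I
SETUP_TCFF_PS = 105.0   # TCFF setup time quoted from Table I


def setup_overhead_fraction(clock_mhz):
    """Extra setup time of TCFF over TGFF as a fraction of the clock period."""
    period_ps = 1e6 / clock_mhz  # 1 MHz corresponds to a 1e6 ps period
    return (SETUP_TCFF_PS - SETUP_TGFF_PS) / period_ps


for clock_mhz in (200, 250, 333):
    share = 100.0 * setup_overhead_fraction(clock_mhz)
    print(f"{clock_mhz} MHz: extra setup is {share:.1f}% of the period")
```

The extra 67 ps is roughly 1.3% of the period at 200 MHz, 1.7% at 250 MHz, and 2.2% at 333 MHz, which is consistent with fewer FFs being replaceable as the clock frequency rises.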
clock-related transistors, by changing the size of those transistors, performance can be changed. Changing only three of the 21 transistors of a TCFF circuit does not affect cell area much. Table IV shows the performance of TGFF, TCFF, and the resized TCFF. In the resized TCFF, only the three clock-related transistors are doubled in size. Fig. 27 shows the normalized power dissipation for TCFF and the resized TCFF compared to TGFF. Compared to the original TCFF, delay and setup time are improved by 5% and 21%, respectively, in the resized TCFF. Power dissipation increases by 39%, but is still 53% lower than that of TGFF. Fig. 28 shows the result of replacement at 333 MHz clock frequency including the resized TCFF in addition to TGFF and the original TCFF. The total replacement rate is as much as 95%: 88% is replaced by the original TCFF and 7% is replaced by the resized TCFF. In summary, by including a variety of clock-related transistor sizes, TCFF can be applied to systems of various speeds, and it can reduce whole-chip power more effectively.
VIII. CONCLUSION
An extremely low-power FF, TCFF, is proposed with a topological compression design methodology. TCFF has the lowest power dissipation over almost the entire range of data activity compared with other low-power FFs. The power dissipation of TCFF is 75% lower than that of TGFF at 0% data activity without area overhead. The topology of TCFF is easily expandable to various kinds of FFs without performance penalty. Applied to a 250 MHz experimental chip design with 40 nm CMOS technology, 98% of conventional FFs are replaced by TCFFs. In a whole chip, a 17% power reduction is estimated with little overhead in area and timing performance.
ACKNOWLEDGMENT
We would like to thank Mr. K. G. Parthiban, ASP & HoD/ECE, for the support.
REFERENCES
[1] H. Kawaguchi and T. Sakurai, “A reduced clock-swing flip-flop (RCSFF) for 63% power reduction,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 807–811, May 1998.
[2] J.-C. Kim, S.-H. Lee, and H.-J. Park, “A low-power half-swing clocking scheme for flip-flop with complementary gate and source drive,” IEICE Trans. Electronics, vol. E82-C, no. 9, pp. 1777–1779, Sep. 1999.
[3] M. Matsui, H. Hara, Y. Uetani, L. Kim, T. Nagamatsu, Y. Watanabe, A. Chiba, K. Matsuda, and T. Sakurai, “A 200 MHz 13 mm² 2-D DCT macrocell using sense-amplifying pipeline flip-flop scheme,” IEEE J. Solid-State Circuits, vol. 29, no. 12, pp. 1482–1490, Dec. 1994.
[4] M. Hamada, H. Hara, T. Fujita, C.-K. Teh, T. Shimazawa, N. Kawabe, T. Kitahara, Y. Kikuchi, T. Nishikawa, M. Takahashi, and Y. Oowaki, “A conditional clocking flip-flop for low power H.264/MPEG-4 audio/visual codec LSI,” in Proc. IEEE CICC, 2005, pp. 527–530.
[5] Y. Ueda, H. Yamauchi, M. Mukuno, S. Furuichi, M. Fujisawa, F. Qiao, and H. Yang, “6.33 mW MPEG audio decoding on a multimedia processor,” in IEEE ISSCC Dig. Tech. Papers, 2006, pp. 1636–1637.
[6] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, “Conditional-capture flip-flop for statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
[7] A. Hirata, K. Nakanishi, M. Nozoe, and A. Miyoshi, “The cross charge control flip-flop: A low-power and high-speed flip-flop suitable for mobile application SoCs,” in Symp. VLSI Circuits Dig. Tech. Papers, 2005, pp. 306–307.
[8] K. Absel, L. Manuel, and R. K. Kavitha, “Low-power dual dynamic node pulsed hybrid flip-flop featuring efficient embedded logic,” IEEE Trans. VLSI Syst., vol. 21, pp. 1693–1704, Sep. 2013.
[9] C.-K. Teh, T. Fujita, H. Hara, and M. Hamada, “A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2011, pp. 338–340.
[10] J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. SC-24, no. 1, pp. 62–70, Feb. 1989.
[11] H. Kojima, S. Tanaka, and K. Sasaki, “Half-swing clocking scheme for 75% power saving in clocking circuitry,” IEEE J. Solid-State Circuits, vol. 30, no. 4, pp. 432–435, Apr. 1995.
[12] H. Partovi, R. Burd, U. Salim, F. Weber, L. Digregorio, and D. Draper, “Flow-through latch and edge triggered flip-flop hybrid elements,” in IEEE ISSCC Dig. Tech. Papers, 1996, pp. 138–139.
[13] F. Klass, “Semi-dynamic and dynamic flip-flops with embedded logic,” in Symp. VLSI Circuits Dig. Tech. Papers, 1998, pp. 108–109.
[14] V. Stojanovic and V.-G. Oklobdzija, “Comparative analysis of master slave latches and flip-flops for high-performance and low-power systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
[15] N. Nedovic and V.-G. Oklobdzija, “Hybrid latch flip-flop with improved power efficiency,” in Proc. Symp. Integr. Circuits Syst. Design, 2000, pp. 211–215.
[16] S. Nomura, F. Tachibana, T. Fujita, C.-K. Teh, H. Usui, F. Yamane, Y. Miyamoto, C. Kumtornkittikul, H. Hara, T. Yamashita, J. Tanabe, M. Uchiyama, Y. Tsuboi, T. Miyamori, T. Kitahara, H. Sato, Y. Homma, S. Matsumoto, K. Seki, Y. Watanabe, M. Hamada, and M. Takahashi, “A 9.7 mW AAC-decoding, 620 mW H.264 720p 60 fps decoding, 8-core media processor with embedded forward-body-biasing and power-gating circuit in 65 nm CMOS technology,” in IEEE ISSCC Dig. Tech. Papers, 2008, pp. 262–263.
R. Mohan completed his M.E. in Applied Electronics at Karunya Institute of Technology and his B.E. in Electrical and Electronics Engineering at Government College of Engineering, Coimbatore, and has 11 years of teaching experience. He is now working as an associate professor in the ECE department at M.P. Nachimuthu M. Jaganathan Engineering College, Erode.
K. Nanthakumar is pursuing an M.E. in VLSI Design and completed his B.E. in Electronics and Communication Engineering at M.P. Nachimuthu M. Jaganathan Engineering College, Erode, Tamil Nadu.