Implementation of Bit-Serial Adders Using Robust D

This document summarizes a paper that implements two bit-serial carry save adders using a differential logic style. The adders use a novel flip-flop structure that significantly reduces the number of clocked transistors compared to previous designs. The logic style is robust and suitable for high-speed, low-power operation in both bit-serial and bit-parallel implementations. The adders achieved a maximum clock frequency of 300MHz in a 0.8um process.

IMPLEMENTATION OF BIT-SERIAL ADDERS

USING ROBUST DIFFERENTIAL LOGIC


Magnus Karlsson, Mark Vesterbacka, Lars Wanhammar

Department of Electrical Engineering, Linköping University, Sweden


Telephone: +46 (0) 13-28 40 59
Fax: +46 (0) 13-13 92 82
E-mail: [email protected], [email protected], [email protected].

In this paper two bit-serial carry save adders are implemented using a recently proposed
differential logic style. The clocking scheme uses a single clock phase with non-precharged
stages of logic that may be merged with the latches or the flip-flops. A novel flip-flop structure
is used in one of the adders, which significantly lowers the number of clocked transistors. The
logic style used in the adder realizations suits high speed and low power operation in both bit-
serial and bit-parallel implementations, since all logic nets are purely in NMOS. The logic style
is also robust against slow clock slopes and yields a data noise margin equal to Vdd/2. The adders reached
a maximal clock frequency of 300 MHz in a 0.8 µm process with a 3.0 V power supply
voltage.

1. INTRODUCTION
In recent years, several differential logic styles have been proposed. In [1] the circuits use only
NMOS transistors in the logic trees but require precharging and a large number of clocked
transistors (four per latch). In [2] the circuit is precharged and combined with a set/reset NAND
pair and the number of clocked transistors is large. A similar circuit proposed in [3] uses
precharging, both PMOS and NMOS transistors in the logic nets, and a large number of
clocked transistors. The recently proposed Single Transistor Clocked (STC) latches [4] are not
precharged and have a minimum clock load consisting of a single clocked transistor per latch, which
makes them interesting. The latches use a novel flip-flop concept, i.e., the non-transparent input
state of the N latch shown in Fig. 1a is used. The outputs of the P latch in Fig. 1b are allowed to
go low in its latched state. This leads to only one low-to-high transition in its evaluation phase,
which results in high speed. However, in these latches it is difficult to incorporate the logic due
to problems with charge sharing, but they serve well as flip-flops. Charge sharing in the STC N
latch is due to the common node A of the two NMOS branches. A charged internal node in one
branch can discharge to the other branch through the common node A above the clocked
transistor. The STC P latch shown in Fig. 1b has the same problem if the two branches of logic
share transistors, e.g., a simplified XOR net that shares the nodes between both NMOS
branches. When a charged internal node of the logic nets affects the output nodes, charge
sharing may ruin the low output state. The charge sharing problems can cause static power
dissipation in the next logic stage, making the circuit unsuitable for low power
implementations and unreliable due to possible faults in the logic evaluation. Another problem
with the STC latches, due to the shared node A, has been commented on by Blair [5].

In another recently proposed logic style, presented in [6], the bottleneck is the P latch, which
has PMOS transistors in the logic as shown in Fig. 2a. Since the logic nets are connected to a
cross-coupled NMOS transistor pair this realization suffers from a severe ratio problem. To
gain high speed, the PMOS transistors must be much larger than the pull-down NMOS
transistors, which results in a large load and high power consumption even for weak undersized
NMOS transistors. The large PMOS clock transistor imposes a large clock load. The N latch
shown in Fig. 2b is also ratio sensitive but this is not so severe since the logic is implemented
using the NMOS transistors, which can easily be made stronger than the two cross-coupled
pull-up PMOS transistors. Charge sharing is not a problem in these latches since there is no
direct path from the internal nodes in the logic to the outputs. Precharging is not required in
these latches due to the use of two cross-coupled pull-up or pull-down transistors.

Fig. 1a. Single Transistor Clocked N latch.

Fig. 1b. Single Transistor Clocked P latch.

Fig. 2a. Robust P latch with merged P logic.

Fig. 2b. Robust N latch with merged N logic.

The restrictions on the clock slope are mild for these latches [6], i.e., the latching is not
dependent on the clock slope. When the clock signal makes a high-to-low transition for the N
latch or a low-to-high transition for the P latch, the data is already latched by the cross-coupled
transistor pair and will be kept after the clock transition. This makes these latches robust against
slow clock transitions and enables a decrease in area and power consumption by using a smaller
clock driver. This logic style will be referred to as the P-N logic style.

In the following it is shown how a new P latch without the ratio problem can replace the P latch
in the P-N logic style. The new structure for the P latch is further combined with the N latch to
form a new differential flip-flop with a low transistor count. A bit-serial carry save adder is
used in implementation examples for both the original logic style and the new flip-flop
structure.

2. THE DIFFERENTIAL LOGIC STYLE


The logic style used to implement the adders in this work aims at overcoming the bottlenecks in
[1, 2, 3, 4, 6]. The logic and latches should be merged without charge sharing problems. This
increases the speed significantly and decreases the power consumption due to fewer switching
nodes and reduced glitching. In bit-serial implementations, which are important in
algorithm-specific DSP applications, most of the logic can be merged with the latches. In common
bit-parallel implementations, parts of the logic can be merged. Only the faster NMOS transistors
should be used to implement the logic. The NMOS transistors can be made small, yielding a
reduced load and thereby a reduced power consumption. It also allows the use of more complex
logic inside a single latch, which enables most of the logic to be merged with the latches.
Precharging should be avoided since it yields higher switching activity than non-precharged
logic. By the use of complementary outputs no additional inverters are required, which
decreases the number of gate delays and switching nodes. The number of gates to design is also
smaller (AND/NAND or OR/NOR is the same circuit, with only the inputs or outputs switched).
XOR functions, which are extensively used in full adders for implementation of arithmetic, do
clearly benefit from the availability of complementary input signals. The latches should be clock
slope insensitive, hence lower power consumption can be obtained by the use of a smaller clock
driver. Not only the number of clocked transistors but also their sizes should be minimized. The
new logic style is a combination of the N latch shown in Fig. 2b and the novel P latch presented
below. It is also possible to use a static variant of the N latch [6], where the cross-coupled
PMOS transistors are replaced by two cross-coupled inverters. The static N latch and the novel
P latch form a semi-static logic style, which is an important feature in low power circuits. Hence,
the clock can be idled at the low clock phase when the circuit is not used. The P latch may also
be merged with the N latch forming a new flip-flop that requires use of fewer transistors.

2.1. The Novel P Latch


The novel P latch is constructed from an ordinary Cascode Voltage Switch Logic (CVSL) gate as a
base, shown in Fig. 3a. The CVSL gate consists of two complementary NMOS switch structures
connected to a pair of cross-coupled PMOS pull-up transistors. By the use of two
cross-coupled PMOS transistors as pull-up transistors, precharging is not needed. When the inputs
switch, either node b or its complementary node is pulled low. Positive feedback applied to the
PMOS pull-up transistors causes the gate to switch.

The order of input switching is important. First both complementary inputs must go low,
otherwise both node b and its complementary node are pulled low, which results in a short circuit.
After that the complementary input is allowed to go high. The outputs of the N latch in Fig. 2b switch in
the right order. The logic trees may be further minimized from the full differential form, e.g., a
two input XOR gate may be minimized to only 6 NMOS transistors from the full differential
form with 8 transistors in the complementary NMOS nets.
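
As an illustration of the complementary NMOS nets, the following Python sketch models the full differential 2-input XOR as two switch networks of four transistors each. The function names and the boolean switch model are assumptions made for illustration and are not taken from the paper; the minimized 6-transistor form with shared transistors is not modelled. The sketch checks that exactly one net conducts for valid complementary inputs, and that both nets conduct when an input and its complement are high at the same time, which is the short-circuit case described above.

```python
from itertools import product


def xor_net(a, a_n, b, b_n):
    """Switch network that conducts when a XOR b is true (two series pairs in parallel)."""
    return bool((a and b_n) or (a_n and b))


def xnor_net(a, a_n, b, b_n):
    """Complementary switch network that conducts when a XNOR b is true."""
    return bool((a and b) or (a_n and b_n))


def exactly_one_net_conducts():
    """Check every valid combination of complementary inputs."""
    for a, b in product((0, 1), repeat=2):
        rails = (a, 1 - a, b, 1 - b)
        if xor_net(*rails) == xnor_net(*rails):
            return False
    return True


# Invalid overlap: a and its complement are both high, so both nets conduct.
both_conduct = xor_net(1, 1, 1, 0) and xnor_net(1, 1, 1, 0)
```

Each of the two functions uses four switches, which matches the count of 8 transistors quoted for the full differential form.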

To form a P latch, which is latching when the clock is high and evaluating when the clock is
low, two clocked PMOS transistors are added to the CVSL gate as shown in Fig. 3b. The two
clocked PMOS transistors prevent the output Q or its complement from switching low-to-high before the
clock φ switches from high to low. This also solves the charge sharing problem, since a
low-to-high transition cannot occur at the outputs before the clock goes low. Charged internal nodes in
the logic make the transition even faster when the internal nodes in the NMOS nets discharge to
the cross-coupled PMOS transistors. The restrictions on the clock slope for the P latch become
mild, since the latching works in the same fashion as for the N latch.

A problem with the gate in Fig. 3b is the threshold voltage Vt loss during pull-down. This
problem is solved by adding two NMOS transistors as shown in Fig. 4a.

Fig. 3a. Cascode Voltage Switch Logic (CVSL) gate.

Fig. 3b. DCVSL gate with clocked PMOS transistors.

When the inputs cause the CVSL gate to switch, the PMOS transistors prevent the output Q or
its complement from switching low-to-high, while the added NMOS transistor causes one output to
switch high-to-low, i.e., the non-transparent input state of the following N latch is used [4]. Hence, it is
required that the proceeding N latch is sufficiently fast to have pulled down the output node
before the input is turned off. This is easily solved by implementing both the P and N latch with
equal rise and fall time. The complete P latch in Fig. 4a is redrawn in Fig. 4b to show the novel
concept of connecting NMOS logic nets between the PMOS transistors.

In the novel P latch the ratio problem is not so severe since the logic is in NMOS and the pull-
up PMOS transistors can be kept at minimum size. The two clocked transistors can also be kept
at minimum size, yielding a small clock load. In fact, all the transistors in the latch can be kept
at minimum size. Only the logic net must be sized if stacked NMOS transistors are required.

Simulations show that it is possible to combine the new logic style with the STC latches, with
only one restriction: the novel P latch will not work well with an STC N latch as load due to the
coupling capacitance (gate-source capacitance) between the gate and the common node A of the
STC N latch. When the clock switches low-to-high, the common node A is discharged to ground
and the gate-source capacitance makes the output of the novel P latch follow node A
and drop. This is also observed by simulation of a flip-flop constructed with the STC latches
[5].

Fig. 4a. The novel robust P latch with merged N logic.

Fig. 4b. Redrawn P latch.

2.2. The Novel Flip-Flop


The novel flip-flop shown in Fig. 5 can be constructed by adding two clocked NMOS
transistors to the novel P latch in Fig. 4a. The flip-flop becomes negative edge-triggered and the
logic nets are merged with the flip-flop in the same manner as in the case with the latches. The
construction of the flip-flop can also be viewed as a merging of the P and N latch by simply
sharing the cross-coupled PMOS transistors. The resulting flip-flop is similar to the flip-flop in
[4], but the flip-flop in [4] suffers from the same problems as the STC N latch. With the novel
flip-flop these problems are removed.

Fig. 5. Novel flip-flop with merged N logic.


3. ROBUSTNESS
Noise appears at the data and the clock inputs, yielding two different noise margins, data noise
margin and clock noise margin. Assuming a supply voltage of 3.0 V, the noise margins at the
data inputs are 2.0 V and 1.8 V for the P latch and N latch, respectively. This is a larger noise
margin than Vdd/2 due to hysteresis, which shows that the NMOS nets are somewhat weak.
The noise margins for the clock are 0.85 V and 1.8 V for the P latch and N latch, respectively.
The clock noise margin for the N latch is similar to the data noise margin since the clocked
transistors have to compete with the pull-up PMOS transistors in the same way as the logic. The
P latch clock noise margin is equal to Vt, because the clock transistor only works as a switch in
this case. Hence, the data noise margin is significantly improved compared to dynamic
single-rail logic. For the novel flip-flop, the data noise margin is equal to the data noise margin of the N
latch, since the function is the same. The clock noise margin is equal to Vt when the clock is high,
and equal to the data noise margin when the clock is low.

For examining the clock slope sensitivity, a divide-by-two counter has been implemented. The
counter consisted of one P latch and one N latch in series, where the outputs from the N latch
are fed back to the P latch inputs. The circuit was simulated with HSPICE with a triangular
clock. In Fig. 6 the simulation result is shown. Both outputs of the P and N latch are shown.
When the clock slope is 25 ns, the output from the N latch deviates 10 percent from the high
level during a falling clock. This is due to the rising output of the P latch that rises before the N
latch has finished the evaluation phase. This problem may be solved by making the P latch
slower.

Fig. 6. Divide-by-two circuit.
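
The logical behaviour of the divide-by-two counter can be sketched with an idealized, purely digital model in Python. This is a minimal sketch under stated assumptions: the Latch class, the initial states, and the inverted feedback taken from the complementary N latch output are illustrative choices, and the model ignores clock slope, hysteresis, and all analog effects seen in the HSPICE simulation.

```python
class Latch:
    """Idealized level-sensitive latch: transparent at one clock level, holding at the other."""

    def __init__(self, transparent_level, q=0):
        self.transparent_level = transparent_level
        self.q = q

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d
        return self.q


def divide_by_two(n_cycles):
    """Return a list of (clk, p_out, n_out) samples, two per clock cycle."""
    p_latch = Latch(transparent_level=0)  # P latch evaluates while the clock is low
    n_latch = Latch(transparent_level=1)  # N latch evaluates while the clock is high
    trace = []
    for _ in range(n_cycles):
        for clk in (0, 1):
            # Feedback uses the complementary N latch output (assumed inversion).
            p_latch.update(clk, 1 - n_latch.q)
            n_latch.update(clk, p_latch.q)
            trace.append((clk, p_latch.q, n_latch.q))
    return trace


trace = divide_by_two(4)
n_outputs_at_high_clock = [n_out for clk, _, n_out in trace if clk == 1]
```

In this model the N latch output toggles once per clock cycle, so its period is two clock periods.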

4. BIT-SERIAL ADDER IMPLEMENTATIONS


A bit-serial carry save adder has been used as an example circuit in a comparison between the
structure consisting of the novel P latch and the N latch and the novel flip-flop. In Fig. 7 the
solution with the separate P and N latches is shown. This solution also uses an STC P latch in
one place in order to reduce the number of clocked transistors. This bit-serial adder required 9
clocked transistors and 38 non-clocked transistors of which only 11 are PMOS transistors. The
highest number of stacked NMOS transistors is 3. In this adder efficient realizations have been
used for the 2-input XOR gates and the 2-input multiplexer, where transistors in the logic nets
are shared. Only 6 transistors are required in the logic nets for both the XOR gates and the
multiplexer. In the bit-serial adder, the carry loop is cleared before the start of a new
computation by raising the clr input to the AND gate.

Fig. 7. Bit-serial adder realized with separate latches.
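
The function of a bit-serial carry save adder, independent of the circuit realization, can be sketched behaviourally in Python. The class and helper names are hypothetical, and the clr timing is an assumption: here clr is raised during the last bit of each word so that the stored carry is zero when the next word starts, whereas the paper only states that the carry loop is cleared by raising clr. Bits are processed least significant bit first.

```python
class BitSerialAdder:
    """Behavioural model: one sum bit per clock cycle, carry kept in a flip-flop."""

    def __init__(self):
        self.carry = 0

    def step(self, a, b, clr=0):
        sum_bit = a ^ b ^ self.carry
        carry_next = (a & b) | (a & self.carry) | (b & self.carry)
        # AND gate in the carry loop: a raised clr forces the stored carry to zero.
        self.carry = carry_next & (1 - clr)
        return sum_bit


def to_bits(value, width):
    """LSB-first bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add_words(adder, x, y, width):
    """Add two words serially; the result is taken modulo 2**width."""
    out = []
    for i, (a, b) in enumerate(zip(to_bits(x, width), to_bits(y, width))):
        clr = 1 if i == width - 1 else 0  # assumed timing: clear on the last bit
        out.append(adder.step(a, b, clr))
    return from_bits(out)
```

Reusing one adder instance for consecutive words shows the purpose of clr: a carry generated in the most significant bit of one word does not leak into the next word.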

The bit-serial carry save adder based on the novel flip-flop is shown in Fig. 8. This solution
used 8 clocked transistors. The number of non-clocked transistors is 30 of which only 4 are
PMOS. In this adder the number of stacked NMOS transistors is 4. Also in this adder an
efficient realization of the XOR gate has been used where a 3-input XOR gate has been realized
with sharing of its complementary nets. This realization required only 10 transistors in the logic
net. The low number of stacked transistors in the carry net allowed an AND function to be
integrated with the carry function, allowing for the carry loop to be cleared before the start of a
new computation. With this solution the number of switching output nodes is reduced to only 4
compared to the adder with separate latches that contained 10 switching output nodes.

Fig. 8. Bit-serial adder using the novel flip-flop.
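
The transistor counts quoted for the two adders can be tallied as follows. The numbers are those stated in the text; the AdderCounts record and its field names are hypothetical and serve only to keep the bookkeeping explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdderCounts:
    clocked: int
    non_clocked: int
    non_clocked_pmos: int
    max_stacked_nmos: int
    switching_output_nodes: int

    @property
    def total(self):
        """Total transistor count, clocked plus non-clocked."""
        return self.clocked + self.non_clocked


separate_latches = AdderCounts(
    clocked=9, non_clocked=38, non_clocked_pmos=11,
    max_stacked_nmos=3, switching_output_nodes=10,
)
flip_flop = AdderCounts(
    clocked=8, non_clocked=30, non_clocked_pmos=4,
    max_stacked_nmos=4, switching_output_nodes=4,
)

transistors_saved = separate_latches.total - flip_flop.total
```

The flip-flop version trades a deeper NMOS stack (4 instead of 3) for fewer transistors and fewer switching output nodes.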

5. COMPARISON
A power delay product (PDP) comparison of the adders in Fig. 7 and Fig. 8 has been
performed using simulations on layouts in a 0.8 µm AMS process. For the simulations
HSPICE [7] was used. In Table 1 the simulation results are shown together with the area
measure, power consumption, and the maximum clock frequency.

The adder with the new flip-flop structure required 30 percent less area than the adder with
separate latches. This reduction in area is caused by the smaller number of PMOS transistors in
the flip-flop structure and the smaller number of gates, which reduces the internal routing. The
maximum clock frequency is 300 MHz for both implementations. This result is due to different
effects. The clock frequency in the adder using the flip-flop structure is limited by the larger
number of stacked transistors in the logic nets while for the adder with separate latches the
larger number of cascaded gates limits the clock frequency. The power consumption for the
clock driver in the adder with the flip-flop structure is reduced by 10 percent due to the
reduced number of clocked transistors. The power consumption for the logic is reduced by 37
percent, which is due to the reduced number of switching output nodes and smaller internal
routing. The power delay product is thereby reduced by about 30 percent for the adder with
the new flip-flop structure.

Bit-serial adder    Area [µm²]   Fmax [MHz]   Power, clock [mW]   Power, logic [mW]   PDP [pJ]
Separate latches    3700         300          0.31                0.57                1.5
Flip-flop           2600         300          0.28                0.36                1.1

Table 1. Comparison of the bit-serial adders.
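
The relative savings quoted in the text can be recomputed from Table 1 with a few lines of Python. The dictionary keys and function names are hypothetical. The PDP relation used in pdp_estimate_pj (total power multiplied by half a clock period at Fmax) is an assumption inferred from the tabulated numbers; the paper does not state how the PDP was calculated.

```python
# Values copied from Table 1.
TABLE_1 = {
    "separate_latches": {
        "area": 3700, "fmax_mhz": 300,
        "p_clock_mw": 0.31, "p_logic_mw": 0.57, "pdp_pj": 1.5,
    },
    "flip_flop": {
        "area": 2600, "fmax_mhz": 300,
        "p_clock_mw": 0.28, "p_logic_mw": 0.36, "pdp_pj": 1.1,
    },
}


def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


def pdp_estimate_pj(row):
    """Assumed relation: total power times half a clock period (mW * ns = pJ)."""
    half_period_ns = 1000.0 / (2.0 * row["fmax_mhz"])
    return (row["p_clock_mw"] + row["p_logic_mw"]) * half_period_ns


old, new = TABLE_1["separate_latches"], TABLE_1["flip_flop"]
for key in ("area", "p_clock_mw", "p_logic_mw", "pdp_pj"):
    print(key, round(reduction_percent(old[key], new[key]), 1))
```

Under that assumption the estimate rounds to the tabulated PDP values for both adders, and the area, clock power, and logic power reductions round to the 30, 10, and 37 percent given in the text.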


6. CONCLUSION
It was shown how efficient bit-serial carry save adders can be implemented using a recently
proposed differential logic style. For the XOR gates and multiplexers used in the adders it was
possible to share parts of the complementary logic nets, yielding savings in transistor count. A
new flip-flop was also presented, which was further used in the realization of an adder. The
layouts of the two adders were simulated using HSPICE and the results were compared. The adder
that used the novel flip-flop structure required 30 percent less area and had 30 percent
lower power consumption than the adder realized with separate latches.

REFERENCES
[1] Hong-Yi Huang: True-single-phase All-N-Logic Differential Logic (TADL) for Very High
Speed Complex VLSI, Proc. IEEE ISCAS-96, Vol. 4, pp. 296-299, Atlanta, USA, 1996.

[2] Huang C. G.: Implementation of true single-phase clock D flipflops, IEE Electronics
Letters, Vol. 30, pp. 1373-1374, Aug., 1994.

[3] Renshaw D. and Choon How Lau.: Race-free clocking of CMOS pipelines using a single
global clock, IEEE J. Solid-State Circuits, Vol. SC-25, pp. 766-769, June, 1996.

[4] Yuan J. and Svensson C.: New Single-Clock CMOS Latches and Flipflops with Improved
Speed and Power Savings, IEEE J. Solid-State Circuits, vol. 32, no. 1, pp. 62-69, Jan.
1997.

[5] Blair G. M.: Comment on new differential flipflops from Yuan and Svensson, IEE,
Electronics Letters Vol. 32, No 23, pp. 2125-2126, Nov., 1996.

[6] Afghahi M.: A Robust Single Phase Clocking for Low Power, High Speed VLSI
Applications, IEEE J. Solid-State Circuits, Vol. SC-31, No. 2, pp. 247-254, Feb. 1996.

[7] Karlsson M., Vesterbacka M., and Wanhammar L.: A Robust Differential Logic Style with
NMOS Logic Nets, Proc. of IEE IWSSIP, pp. 61-64, Poland, May, 1997.
