Implementation of Bit-Serial Adders Using Robust D
In this paper, two bit-serial carry-save adders are implemented using a recently proposed
differential logic style. The clocking scheme uses a single clock phase with non-precharged
stages of logic that may be merged with the latches or the flip-flops. One of the adders uses a
novel flip-flop structure that significantly lowers the number of clocked transistors. Because
all logic nets are purely NMOS, the logic style suits high-speed and low-power operation in both
bit-serial and bit-parallel implementations. The logic style is also robust against slow clock
slopes and yields a data noise margin equal to Vdd/2. The adders reached a maximum clock
frequency of 300 MHz in a 0.8 µm process with a 3.0 V supply voltage.
1. INTRODUCTION
In recent years, several differential logic styles have been proposed. In [1] the circuits use only
NMOS transistors in the logic trees but require precharging and a large number of clocked
transistors (four per latch). In [2] the circuit is precharged and combined with a set/reset NAND
pair, and the number of clocked transistors is large. A similar circuit, proposed in [3], uses
precharging, both PMOS and NMOS transistors in the logic nets, and a large number of
clocked transistors. The recently proposed Single-Transistor-Clocked (STC) latches [4] are
interesting because they are not precharged and have a minimal clock load of a single clocked
transistor per latch. The latches use a novel flip-flop concept in which the non-transparent input
state of the N latch shown in Fig. 1a is exploited. The outputs of the P latch in Fig. 1b are
allowed to go low in its latched state. This leads to only one low-to-high transition in its
evaluation phase, which results in high speed. However, it is difficult to incorporate logic into
these latches because of charge sharing, although they serve well as flip-flops. Charge sharing
in the STC N latch is caused by the node A that is common to the two NMOS branches. A
charged internal node in one branch can discharge into the other branch through the common
node A above the clocked transistor. The STC P latch shown in Fig. 1b has the same problem if
the two branches of logic share transistors, e.g., a simplified XOR net that shares nodes between
both NMOS branches. When a charged internal node of the logic nets affects the output nodes,
charge sharing may ruin the low output state. The charge sharing problems can cause static
power dissipation in the next logic stage, making the circuit unsuitable for low-power
implementations and unreliable due to possible faults in the logic evaluation. Another problem
with the STC latches, caused by the shared node A, has been pointed out by Blair [5].
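The effect on a low output can be estimated with a first-order charge-conservation argument. As a sketch, assume the low output node has capacitance $C_{out}$ at 0 V and is momentarily undriven, and that it becomes connected to an internal node of capacitance $C_x$ charged to $V_x$. These symbols are introduced here for illustration and do not come from the paper's figures. The shared voltage is then

$$V_{out} = \frac{C_x V_x}{C_{out} + C_x}.$$

For example, with the assumed values $C_x = C_{out}$ and $V_x = V_{dd} - V_{tn}$ (a node charged through an NMOS transistor reaches at most one threshold below the supply), the output rises to $(V_{dd} - V_{tn})/2$. This is enough to partially turn on the NMOS transistors that this node drives in the next stage, which is the source of the static dissipation and possible evaluation faults mentioned above.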
In another recently proposed logic style [6], the bottleneck is the P latch, which has PMOS
transistors in the logic, as shown in Fig. 2a. Since the logic nets are connected to a
cross-coupled NMOS transistor pair, this realization suffers from a severe ratio problem. To
achieve high speed, the PMOS transistors must be much larger than the pull-down NMOS
transistors, which results in a large load and high power consumption even when the NMOS
transistors are weak and undersized. The large PMOS clock transistor also imposes a large
clock load. The N latch shown in Fig. 2b is also ratio sensitive, but less severely, since its logic
is implemented with NMOS transistors, which can easily be made stronger than the two
cross-coupled pull-up PMOS transistors. Charge sharing is not a problem in these latches since
there is no direct path from the internal nodes in the logic to the outputs. Precharging is not
required in these latches because of the two cross-coupled pull-up or pull-down transistors.
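The ratio problem can be made concrete with a first-order square-law estimate. Assume both devices are in saturation with comparable gate overdrive, which is a simplification that ignores velocity saturation and body effect. For the output to switch, the PMOS logic net must supply more current than the cross-coupled NMOS transistor that holds the node low:

$$\mu_p \left(\frac{W}{L}\right)_p > \mu_n \left(\frac{W}{L}\right)_n \quad\Longrightarrow\quad \frac{(W/L)_p}{(W/L)_n} > \frac{\mu_n}{\mu_p}.$$

Here $\mu_n/\mu_p$ is typically around 2 to 3 in bulk CMOS. A series stack of PMOS transistors in the logic net needs roughly proportionally wider devices again, which is why the PMOS logic and clock transistors become large.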
[Fig. 1: STC latches [4], (a) N latch with common node A, (b) P latch. Fig. 2: latches of [6], (a) P latch with PMOS logic, (b) N latch.]
The restrictions on the clock slope are mild for these latches [6], i.e., the latching is not
dependent on the clock slope. When the clock signal makes a high-to-low transition for the N
latch or a low-to-high transition for the P latch, the data is already latched by the cross-coupled
transistor pair and will be kept after the clock transition. This makes these latches robust against
slow clock transitions and enables a decrease in area and power consumption by using a smaller
clock driver. This logic style will be referred to as the P-N logic style.
In the following it is shown how a new P latch without the ratio problem can replace the P latch
in the P-N logic style. The new structure for the P latch is further combined with the N latch to
form a new differential flip-flop with a low transistor count. A bit-serial carry save adder is
used in implementation examples for both the original logic style and the new flip-flop
structure.
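As a point of reference for these implementation examples, the function of a bit-serial carry-save adder can be captured in a few lines of Python. The model consists of a full adder whose carry output is stored in a flip-flop and fed back in the next clock cycle, with operands applied LSB first. This is a behavioral sketch only. The names `full_adder` and `bit_serial_add`, the word length, and the clear timing are assumptions made for illustration. In particular, the model assumes a `clr` pulse during the first bit of every word that gates the carry loop. None of these details are taken from the circuits.

```python
def full_adder(a, b, c):
    """One-bit full adder: the sum is a 3-input XOR and the carry is the majority."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def bit_serial_add(words, nbits):
    """Behavioral model of a bit-serial adder: one full adder plus a carry flip-flop.

    `words` is a list of (x, y) operand pairs processed back to back, each
    applied LSB first at one bit per clock cycle. Results are truncated to
    `nbits` bits. The clear timing is an assumption: `clr` is asserted during
    the first bit of every word and ANDed into the carry loop, so a carry left
    over from the previous word cannot leak into the next one.
    """
    carry_ff = 0  # state of the carry flip-flop between clock cycles
    results = []
    for x, y in words:
        acc = 0
        for i in range(nbits):
            clr = 1 if i == 0 else 0           # hypothetical clear pulse at the LSB
            a = (x >> i) & 1
            b = (y >> i) & 1
            c = carry_ff & (1 - clr)           # AND gating clears the stale carry
            s, carry_ff = full_adder(a, b, c)  # flip-flop stores the new carry
            acc |= s << i
        results.append(acc)
    return results


# 5 + 6 = 11; 15 + 1 overflows 4 bits (leaving a carry of 1 in the flip-flop);
# 2 + 3 = 5 only because clr blocks that leftover carry.
print(bit_serial_add([(5, 6), (15, 1), (2, 3)], nbits=4))  # [11, 0, 5]
```

The hardware needs only one full adder regardless of word length, at the cost of one clock cycle per bit. The carry feedback through a single flip-flop is the loop that the clear signal must break between words.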
The order of input switching is important. First, both complementary inputs must go low;
otherwise both nodes b and b̄ are pulled low, which results in a short circuit. After that, one of
the complementary inputs is allowed to go high. The outputs of the N latch in Fig. 2b switch in
the right order. The logic trees may be further minimized from the full differential form, e.g., a
two-input XOR gate can be reduced from 8 transistors in the full differential form of the
complementary NMOS nets to only 6 NMOS transistors.
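The switching-order rule and the 6-transistor XOR can be checked with a small switch-level model. In this model, each NMOS transistor is an undirected edge that conducts when its gate signal is 1. The netlist below is a hypothetical reconstruction of a minimized XOR net, with two transistors driven by a and its complement at the bottom and four driven by b and its complement on top. The node names `x`, `xb`, `Q`, and `Qb` are illustrative, and `Q` is assumed to be pulled low when the XOR is 0. The model only tracks conduction to ground and ignores the PMOS pull-ups.

```python
from itertools import product

# Hypothetical 6-transistor differential XOR net: (node, node, gate signal).
XOR2_NET = [
    ("gnd", "x", "a"),     # shared transistor: a = 1 grounds node x
    ("gnd", "xb", "a_n"),  # shared transistor: a = 0 grounds node xb
    ("x", "Q", "b"),       # a = 1, b = 1 -> XOR = 0 -> pull Q low
    ("x", "Qb", "b_n"),    # a = 1, b = 0 -> XOR = 1 -> pull Qb low
    ("xb", "Qb", "b"),     # a = 0, b = 1 -> XOR = 1 -> pull Qb low
    ("xb", "Q", "b_n"),    # a = 0, b = 0 -> XOR = 0 -> pull Q low
]


def grounded_nodes(net, signals):
    """Return every node with a path of conducting transistors to ground."""
    reached = {"gnd"}
    changed = True
    while changed:
        changed = False
        for n1, n2, gate in net:
            # A conducting transistor joins its two nodes (current can flow either way).
            if signals[gate] and (n1 in reached) != (n2 in reached):
                reached.update((n1, n2))
                changed = True
    return reached


def outputs_pulled_low(a, a_n, b, b_n):
    """Which of the two output nodes the NMOS net connects to ground."""
    low = grounded_nodes(XOR2_NET, {"a": a, "a_n": a_n, "b": b, "b_n": b_n})
    return {node for node in ("Q", "Qb") if node in low}


# Valid differential inputs: exactly one output is pulled low.
for a, b in product((0, 1), repeat=2):
    print(a, b, outputs_pulled_low(a, 1 - a, b, 1 - b))

# Both rails of a low (the required intermediate state): no output is pulled low.
print(outputs_pulled_low(0, 0, 1, 0))
# a rises before its complement has fallen: both outputs are pulled low (short circuit).
print(outputs_pulled_low(1, 1, 1, 0))
```

The last two calls show why the inputs must pass through the all-low state. With both rails low, the net simply stops conducting. If one rail rises before the other falls, however, both branches conduct at once.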
To form a P latch, which latches when the clock is high and evaluates when the clock is low,
two clocked PMOS transistors are added to the CVSL gate as shown in Fig. 3b. The two
clocked PMOS transistors prevent the outputs Q and Q̄ from switching from low to high before
the clock φ switches from high to low. This also solves the charge sharing problem, since a
low-to-high transition cannot occur at the outputs before the clock goes low. Charged internal
nodes in the NMOS nets even make the transition faster when they discharge to the
cross-coupled PMOS transistors. The restrictions on the clock slope for the P latch become
mild, since the latching works in the same fashion as for the N latch.
A problem with the gate in Fig. 3b is the threshold voltage Vt loss during pull-down. This
problem is solved by adding two NMOS transistors as shown in Fig. 4a.
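[Fig. 3: transistor-level schematics; Fig. 3b shows the P latch formed by adding two clocked PMOS transistors to a CVSL gate.]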
In the novel P latch the ratio problem is less severe, since the logic is in NMOS and the pull-up
PMOS transistors can be kept at minimum size. The two clocked transistors can also be kept
at minimum size, yielding a small clock load. In fact, all transistors in the latch can be kept
at minimum size; only the logic net needs to be sized, and only where stacked NMOS transistors are required.
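The sizing rule for stacked logic can be sketched with the usual first-order approximation that $k$ identical transistors in series conduct like a single transistor with $k$ times the channel length, ignoring body effect. To give such a stack the same pull-down strength as one transistor of width $W_{ref}$, each device in the stack is widened to roughly

$$W_{stack} \approx k \, W_{ref}.$$

Both $W_{ref}$ and this rule are illustrative assumptions rather than sizes from the paper. With $k = 1$ no upsizing is needed, which is why only the stacked parts of the logic net require attention.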
Simulations show that it is possible to combine the new logic style with the STC latches, with
only one restriction: the novel P latch does not work well with an STC N latch as load, due to
the coupling capacitance (gate-source capacitance) between the gate and the common node A of
the STC N latch. When the clock switches from low to high, the common node A is discharged
to ground, and the gate-source capacitance causes the output of the novel P latch to follow node
A and drop. This effect has also been observed in simulations of a flip-flop constructed with the
STC latches [5].
[Fig. 4: transistor-level schematics; Fig. 4a shows the novel P latch with the two added NMOS transistors.]
To examine the clock slope sensitivity, a divide-by-two counter has been implemented. The
counter consists of one P latch and one N latch in series, where the outputs of the N latch
are fed back to the P latch inputs. The circuit was simulated in HSPICE with a triangular
clock. Fig. 6 shows the simulation result, including the outputs of both the P and the N latch.
With a clock slope of 25 ns, the output of the N latch deviates by 10 percent from the high
level during a falling clock edge. This is because the output of the P latch rises before the N
latch has finished its evaluation phase. This problem may be solved by making the P latch
slower.
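The intended logical behavior of the counter can be sketched with an ideal, zero-delay latch model in Python. The model assumes that the P latch is transparent while the clock is low and the N latch while the clock is high. It also assumes that the feedback is inverted by crossing the differential rails. This is the usual way to build a divide-by-two counter, but the text does not state the wiring. The function name `divide_by_two` is hypothetical. Because the model has no transition times, it cannot reproduce the 10 percent deviation seen with a 25 ns clock slope.

```python
def divide_by_two(num_cycles):
    """Ideal zero-delay model of a P latch followed by an N latch with inverted feedback.

    Assumptions: the P latch is transparent while the clock is low, the N latch
    while it is high, and the differential feedback is crossed so that the P
    latch receives the complement of the N latch output. Returns the N latch
    output at the end of each clock-high phase.
    """
    p_out, n_out = 0, 0  # assumed reset state of both latches
    samples = []
    for _ in range(num_cycles):
        p_out = 1 - n_out   # clock low: P latch evaluates, N latch holds
        n_out = p_out       # clock high: P latch holds, N latch evaluates
        samples.append(n_out)
    return samples


print(divide_by_two(6))  # [1, 0, 1, 0, 1, 0]: one output period per two clock periods
```

In this model only one latch is transparent at a time, so the loop can never race. The deviation described above arises when a slow clock edge lets the P latch output change before the N latch has finished evaluating, which partially violates that assumption.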
[Fig. 7: bit-serial carry-save adder with separate latches; blocks labeled STC n-latch, p-latch, mux, and p-and, with clear input clr.]
The bit-serial carry-save adder based on the novel flip-flop is shown in Fig. 8. This solution
uses 8 clocked transistors. The number of non-clocked transistors is 30, of which only 4 are
PMOS. In this adder the logic nets contain up to 4 stacked NMOS transistors. This adder also
uses an efficient realization of the XOR gate, in which a 3-input XOR gate is realized with
shared complementary nets. This realization requires only 10 transistors in the logic net. The
low number of stacked transistors in the carry net allowed an AND function to be integrated
with the carry function, so that the carry loop can be cleared before the start of a new
computation. With this solution, the number of switching output nodes is reduced to only 4,
compared to 10 in the adder with separate latches.
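A count of 10 transistors for a 3-input XOR is what results when the complementary nets share their lower levels in the manner of a decision tree. The net then needs 2 transistors for c and its complement, 4 for b and its complement, and 4 for a and its complement. The Python sketch below checks one such netlist at switch level. It is a plausible reconstruction, not the netlist of Fig. 8. The node names (`n1`, `n0`, `x`, `y`, `Q`, `Qb`) are hypothetical, `Q` is assumed to be pulled low when the sum is 0, and only NMOS conduction to ground is modeled.

```python
from itertools import product

# Hypothetical shared differential 3-input XOR net: (node, node, gate signal).
XOR3_NET = [
    ("gnd", "n1", "c"),    # level 1: c = 1 grounds n1
    ("gnd", "n0", "c_n"),  # level 1: c = 0 grounds n0
    ("n1", "y", "b"),      # level 2: y is grounded when b XOR c = 0
    ("n1", "x", "b_n"),    #          x is grounded when b XOR c = 1
    ("n0", "x", "b"),
    ("n0", "y", "b_n"),
    ("x", "Q", "a"),       # level 3: a XOR (b XOR c) = 0 -> pull Q low
    ("x", "Qb", "a_n"),    #          a XOR (b XOR c) = 1 -> pull Qb low
    ("y", "Qb", "a"),
    ("y", "Q", "a_n"),
]


def grounded_nodes(net, signals):
    """Return every node with a path of conducting transistors to ground."""
    reached = {"gnd"}
    changed = True
    while changed:
        changed = False
        for n1, n2, gate in net:
            if signals[gate] and (n1 in reached) != (n2 in reached):
                reached.update((n1, n2))
                changed = True
    return reached


def sum_outputs_low(a, b, c):
    """Drive the net with valid differential inputs and report the grounded outputs."""
    signals = {"a": a, "a_n": 1 - a, "b": b, "b_n": 1 - b, "c": c, "c_n": 1 - c}
    low = grounded_nodes(XOR3_NET, signals)
    return {node for node in ("Q", "Qb") if node in low}


print(len(XOR3_NET), "transistors")  # 10 transistors
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "sum =", a ^ b ^ c, "grounded:", sum_outputs_low(a, b, c))
```

The sharing works because nodes `x` and `y` already encode b XOR c for both polarities of the output. The top level therefore only has to steer that result by a, instead of duplicating the lower levels for each output.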
[Fig. 8: bit-serial carry-save adder based on the novel flip-flop, with carry and sum nets, inputs a and b, and clear input clr.]
5. COMPARISON
A power-delay product (PDP) comparison of the adders in Fig. 7 and Fig. 8 has been
performed using simulations on layouts in a 0.8 µm AMS process. HSPICE [7] was used for
the simulations. Table 1 shows the simulation results together with the area, the power
consumption, and the maximum clock frequency.
The adder with the new flip-flop structure required 30 percent less area than the adder with
separate latches. This reduction in area is caused by the smaller number of PMOS transistors in
the flip-flop structure and the smaller number of gates, which reduces the internal routing. The
maximum clock frequency is 300 MHz for both implementations, although for different
reasons. In the adder using the flip-flop structure, the clock frequency is limited by the larger
number of stacked transistors in the logic nets, while in the adder with separate latches the
larger number of cascaded gates limits the clock frequency. The power consumption of the
clock driver in the adder with the flip-flop structure is reduced by 10 percent due to the
reduced number of clocked transistors. The power consumption of the logic is reduced by 37
percent, owing to the reduced number of switching output nodes and the smaller internal
routing. The power-delay product is thereby reduced by about 30 percent for the adder with
the new flip-flop structure.
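A short calculation shows how the two power reductions can combine. This calculation rests on several assumptions. First, the delay term of the PDP is taken as the clock period at the maximum clock frequency. Since $f_{max}$ is equal for both adders, the PDP ratio then equals the power ratio. Second, total power is assumed to consist only of clock-driver and logic power. Writing $\alpha$ for the fraction of the separate-latch adder's power spent in the clock driver, which the text does not give, the 10 percent clock-driver and 37 percent logic reductions yield

$$\frac{\mathrm{PDP}_{new}}{\mathrm{PDP}_{old}} = \frac{P_{new}}{P_{old}} = 0.90\,\alpha + 0.63\,(1-\alpha) = 0.63 + 0.27\,\alpha.$$

A reduction of about 30 percent therefore corresponds to $\alpha \approx 0.07/0.27 \approx 0.26$. Under these assumptions, the logic would account for roughly three quarters of the power, so the larger logic saving dominates the overall result.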
REFERENCES
[1] Huang H.-Y.: True-single-phase All-N-Logic Differential Logic (TADL) for Very High
Speed Complex VLSI, Proc. IEEE ISCAS-96, Vol. 4, pp. 296-299, Atlanta, USA, 1996.
[2] Huang C. G.: Implementation of true single-phase clock D flipflops, IEE Electronics
Letters, Vol. 30, pp. 1373-1374, Aug. 1994.
[3] Renshaw D. and Lau C. H.: Race-free clocking of CMOS pipelines using a single
global clock, IEEE J. Solid-State Circuits, Vol. SC-25, pp. 766-769, June 1990.
[4] Yuan J. and Svensson C.: New Single-Clock CMOS Latches and Flipflops with Improved
Speed and Power Savings, IEEE J. Solid-State Circuits, Vol. 32, No. 1, pp. 62-69, Jan.
1997.
[5] Blair G. M.: Comment on new differential flipflops from Yuan and Svensson, IEE
Electronics Letters, Vol. 32, No. 23, pp. 2125-2126, Nov. 1996.
[6] Afghahi M.: A Robust Single Phase Clocking for Low Power, High Speed VLSI
Applications, IEEE J. Solid-State Circuits, Vol. SC-31, No. 2, pp. 247-254, Feb. 1996.
[7] Karlsson M., Vesterbacka M., and Wanhammar L.: A Robust Differential Logic Style with
NMOS Logic Nets, Proc. of IEE IWSSIP, pp. 61-64, Poland, May 1997.