A 1V 2mW 17GHz Multi-Modulus Frequency Divider Based On TSPC Logic Using 65nm CMOS
clocked (TSPC) prescalers. High-speed and low-power operation was achieved by merging the combinatorial counter logic with the flip-flop stages and removing circuit nodes at the expense of allowing a small short-circuit current during a short fraction of the operation cycle, thus minimizing the amount of nodes in the circuit. The divider is designed for operation in wireline or fibre-optic serial link transceivers with programmable divider ratios of 64, 80, 96, 100, 112, 120 and 140. The divider is implemented as part of a phase-locked loop around a quadrature voltage controlled oscillator in a 65nm CMOS technology. The maximum operating frequency is measured to be 17GHz with 2mW power consumption from a 1.0V supply voltage, and occupies 25x50µm².

Keywords: Multi-modulus divider, frequency divider, prescaler, true single-phase clocked (TSPC) logic

Fig. 1. Serial link transceiver (PFD: phase/frequency detector, CP: charge pump, VCO: voltage controlled oscillator, CDR: clock and data recovery, VGA: variable gain amplifier, CMU: clock multiplier unit).
This work was funded by MCCI – Microelectronic Circuits Centre Ireland, through the Enterprise Ireland and IDA Ireland Technology Centres programme, and by Science Foundation Ireland under grants 11/SIRG/12112 and 12/RC/2276.

I. INTRODUCTION
Rapid growth of cloud computing, social media, etc. is putting the data centres which support these services under pressure. Data centres consist of tens of thousands of servers connected via short-reach links, usually fibre-optic cables. Transceivers for such links achieve up to 100Gb/s by multiplexing non-return to zero modulated (NRZ) 10x 10Gb/s or 4x 25Gb/s lanes. In the next few years there will be a need for links with capacities of at least 400Gb/s. Advanced modulation formats such as 4-level or 8-level pulse amplitude modulation (4-PAM, 8-PAM), 16-ary quadrature amplitude modulation (16-QAM) and discrete multi-tone modulation are now considered to achieve these speeds [1].

Transceivers for these applications face stringent cost, physical size and power consumption targets [2]. In particular, power consumption is a challenging goal: the Optoelectronics Industry Development Association (OIDA) has targets of 10-50pJ/bit by 2017 and 10-20pJ/bit by 2022 in their roadmap for rack-to-rack links; chip/board level links are even more challenging and will need to operate at 10x better energy efficiency [2, Fig. 4]. Additionally, to maximize sale volumes and reduce cost, transceiver chipsets need to be multirate to support the wide variety of standards that exist today (Ethernet, Infiniband, Fibre Channel, Serial Attached SCSI etc.).

To meet the mentioned power consumption targets, all the high-speed parts of future transceivers need to be carefully considered. In this paper, we address one critical part, namely the divider in the clock multiplier units (CMUs) which derive line rate clocks from a low-speed reference clock. In the transmitter the high-speed clock is used for re-timing and serialisation; in the receiver it is used for clock and data recovery and deserialisation. Due to stringent jitter specifications, this CMU is usually a phase locked loop (PLL), see Fig. 1. The frequency divider divides the output from the voltage controlled oscillator (VCO) prior to comparing it to the reference clock using the phase/frequency detector (PFD). As this divider operates at the full clock frequency, it can use a significant part of the overall transceiver power consumption.

Gigahertz capable frequency dividers can be implemented in several ways. Injection locked dividers (ILDs) offer high-frequency operation at low powers at the expense of a limited locking range [3]. ILDs also offer only static division ratios (usually divide-by-2 per inductor, with some recent circuits also supporting divide-by-3 operation without additional inductors, see [4]). Digital dividers offer robust operation with a wide range of divider ratios, which can be programmable. Such dividers usually consist of high-speed latches or flip-flops, together with combinatorial logic to swallow pulses or detect end-of-count states. Current-mode logic (CML) is typically used as this offers very high-speed operation at the expense of high power consumption [5,6]. Another alternative is true single-phase clocked (TSPC) logic [7]. Recent results have demonstrated that multi-gigahertz operation is feasible at a fraction of the power consumption of CML based dividers.
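The role of the divider in the CMU of Fig. 1 can be made concrete with the usual locked-loop relation: once the PLL is locked, the divided VCO output matches the reference at the PFD, so f_VCO = N × f_ref. The minimal Python sketch below applies this relation to the divider ratios and reference frequencies listed in Table I. The function and variable names are illustrative only and are not taken from the design.

```python
# Rows copied from Table I: (divider ratio N, reference frequency in MHz,
# baud rate in GBaud). The names below are illustrative, not from the paper.
TABLE_I = [
    (64, 125, 16),
    (80, 100, 16),
    (96, 125, 24),
    (100, 125, 25),
    (112, 125, 28),
    (120, 125, 30),
    (140, 100, 28),
]


def vco_frequency_ghz(ratio, ref_mhz):
    """Locked-PLL relation f_VCO = N * f_ref, returned in GHz."""
    return ratio * ref_mhz / 1000.0


for ratio, ref_mhz, gbaud in TABLE_I:
    f_vco = vco_frequency_ghz(ratio, ref_mhz)
    print(f"N = {ratio:3d}, f_ref = {ref_mhz} MHz -> f_VCO = {f_vco:.1f} GHz "
          f"(listed baud rate: {gbaud} GBaud)")
```

In every row the computed VCO frequency equals half the listed baud rate. That would be consistent with a half-rate clocking scheme, but this interpretation is an assumption; the text does not state it.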
Authorized licensed use limited to: SHANGHAI UNIVERSITY. Downloaded on June 12,2024 at 03:15:00 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. Conventional multi-modulus divider based on ÷2/÷3 cells.
Fig. 3. Proposed multi-modulus divider.
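The difference between the two divider chains of Figs. 2 and 3 can be checked with a few lines of arithmetic. The sketch below is a minimal Python illustration under one explicit assumption: the overall ratio of the proposed chain is taken to be the product of its stage ratios (first ÷2, ÷4/÷5 prescaler, the ÷4 to ÷7 cascade of two ÷2/÷3 cells, final ÷2). The flip-flop counts per stage are those quoted in the text; all identifiers are hypothetical.

```python
from itertools import product

# Stage ratios of the proposed chain (Fig. 3). Treating the overall ratio as
# the plain product of the stage ratios is an assumption of this sketch.
FIRST_DIV2 = 2
PRESCALER_4_5 = (4, 5)
CASCADE_2_3 = (4, 5, 6, 7)  # ratios realized by the two cascaded /2-/3 cells
FINAL_DIV2 = 2

# Flip-flops per stage, as quoted in the text.
FLIP_FLOPS = {
    "first /2": 1,
    "/4-/5 prescaler": 3,
    "two /2-/3 cells": 2 * 2,
    "final /2": 1,
}


def proposed_ratios():
    """All distinct overall ratios of the proposed chain, sorted."""
    return sorted({FIRST_DIV2 * p * c * FINAL_DIV2
                   for p, c in product(PRESCALER_4_5, CASCADE_2_3)})


def conventional_ratios(n):
    """Ratios of a chain of n /2-/3 cells: every integer 2^n .. 2^(n+1)-1."""
    return range(2 ** n, 2 ** (n + 1))


print(proposed_ratios())
print(sum(FLIP_FLOPS.values()), "flip-flops in the proposed chain")
print(len(conventional_ratios(6)), "ratios from six conventional cells")
```

Under this assumption the eight mode combinations collapse to exactly the seven ratios targeted by the paper (÷80 is reachable in two ways), and the stage counts add up to the nine flip-flops stated in the text.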
In [8], a divide-by-16/17 multi-modulus prescaler is presented which can operate at 5.8GHz using TSPC cells integrated in a 0.18µm CMOS technology. The power consumption was 2.6mW from a 1.6V supply voltage. In [9], extended TSPC (E-TSPC) logic [10] was used to design a 10GHz programmable divide-by-16…31 multi-modulus divider using 0.13µm CMOS, which consumed 16mW. Although E-TSPC logic can achieve higher operating frequencies than TSPC logic, it has been shown to be sensitive to the input bias voltage and can be difficult to interface with a VCO [11]. Furthermore, E-TSPC suffers from large short-circuit currents which can create voltage spikes on the supply lines and disturb e.g. the transceiver's VCO. Therefore, we selected TSPC logic for the divider.

II. ARCHITECTURE OF THE PROGRAMMABLE DIVIDER
Programmable multi-modulus dividers are typically implemented using a sequence of n dual-modulus ÷2/÷3 prescalers, see Fig. 2 [12]. This divider can realize any integer divider ratio between $2^n$ and $2^{n+1}-1$. The maximum achievable frequency is limited by the fact that the gating signals 'MO' must have settled within one pulse width (half the period) of the input signal (assuming zero setup time of the flip-flops), making this architecture difficult to use above 10GHz with TSPC logic.

However, unlike wireless applications which require frequency synthesis with high resolution (small frequency steps), dividers for wireline applications need to realize only a number of specific divider ratios. An example of seven useful divider ratios is given in Table I. These divider ratios (÷64, ÷80, ÷96, ÷100, ÷112, ÷120 and ÷140) can be realized using the multi-modulus divider shown in Fig. 3, in which higher input operation frequency has been traded off against the number of realizable divider ratios. A first ÷2 prescaler consisting of a TSPC D-flip-flop (with its output fed back to its data input) halves the input frequency such that it can be handled by the subsequent dual-modulus ÷4/÷5 prescaler. Next a cascade of two ÷2/÷3 TSPC dividers is used as in the conventional divider architecture (see Fig. 2), and realizes the divider ratios ÷4, ÷5, ÷6 and ÷7. There is now no long feedback path limiting the maximum input frequency. An additional advantage is the significant reduction in the number of flip-flops, while keeping the same minimum (÷64) and a somewhat larger maximum (÷140) divider ratio. Indeed the conventional implementation needs seven ÷2/÷3 dual-modulus prescaler cells to realize a divider ratio between 64 and 127 and requires 14 flip-flops. Our proposed implementation requires only nine flip-flops (the ÷4/÷5 prescaler uses 3 flip-flops and the ÷2/÷3 prescalers require 2 flip-flops each, see hereafter). A final ÷2 divider consisting of a single TSPC D-flip-flop restores the duty cycle of the output clock to 50% irrespective of the divider ratio.

TABLE I. EXAMPLE DIVIDER RATIOS FOR WIRELINE STANDARDS
Div. ratio   Ref. freq. (MHz)   Bitrate, baudrate and modulation format
64           125                64Gb/s, 16GBaud/s 16-QAM
80           100                48Gb/s, 16GBaud/s 8-PAM
96           125                48Gb/s, 24GBaud/s 4-PAM
100          125                50Gb/s, 25GBaud/s 4-PAM
112          125                56Gb/s, 28GBaud/s 4-PAM
120          125                120Gb/s, 30GBaud/s 16-QAM
140          100                56Gb/s, 28GBaud/s 4-PAM

Fig. 4. Conventional ÷4/÷5 dual-modulus prescaler.

III. DUAL-MODULUS ÷4/÷5 PRESCALER
A conventional ÷4/÷5 prescaler is shown in Fig. 4: when mode signal M is high, the input signal fIN traverses DFF1 and DFF2 and the circuit divides by four. When mode signal M is low, DFF3 and the NOR-gate N1 swallow one pulse every four input pulses and the circuit divides by five. In ÷5 mode, the long feedback path (shown with dotted line) limits the maximum operating frequency. Our topology (Fig. 5) avoids this long feedback path: TSPC flip-flops with merged pulse swallowing logic (transistors M1 and M2) were used to improve speed. When the mode signal M is high, transistor M2 is off and hence DFF3 does not affect the circuit operation. DFF2 is connected as a ÷2 divider; its output fD is used as the clock for DFF3 which is also connected as a ÷2 divider, hence when the mode signal M is high the circuit is a ÷4 divider. When the mode signal M is low, transistor M2 is on: DFF3 together with transistor M1 now form a pulse swallowing network which swallows one pulse every four cycles of the input signal: indeed when transistor M1 is turned on, node Q is pulled down since transistor M2 is also on and the bit stored by the master latch of DFF2 is erased (see Fig. 6). Note that when both M1 and M2 are on and fIN is low, the voltage on node Q is defined by the ratio of the on-resistance of MP to that of M1 plus M2: the voltage on node Q is therefore highly dependent on process, supply voltage and temperature (PVT) corners. However, since fIN is low, the slave latch of DFF2 is in hold mode and the actual voltage of node Q does not affect the circuit operation. During this time, a small short-circuit current is drawn from the supply, which however is less than 20% of the total power consumption of the prescaler. To avoid this short-circuit current one can in principle include a parallel combination of PMOS transistors whose gates are controlled by the corresponding gate signals from M1 and M2 (thus forming a NAND gate merged with DFF2) in series with MP; however, the additional capacitive load and on-resistance slow down the prescaler. When fIN becomes high, transistor MP is turned off and node Q is discharged to ground via M1 and M2,
Fig. 5. Proposed TSPC based ÷4/÷5 dual-modulus prescaler.
Fig. 6. Divide-by-5 operation of the ÷4/÷5 dual-modulus prescaler (traces: fIN with input pulses 1 to 5, gate drive M1, gate drive M2, fD, fO; annotation: pulse from input signal swallowed).
Fig. 7. The ÷2/÷3 dual-modulus prescaler.
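The pulse-swallowing behaviour illustrated in Fig. 6 can be mimicked at the cycle level. The Python sketch below is a purely behavioural model, not a transistor-level description of Fig. 5: two cascaded ÷2 stages are represented by a 2-bit count, and in ÷5 mode one input pulse per output cycle is withheld from that count. The `swallow_pending` flag and both function names are hypothetical bookkeeping introduced for this illustration; in the circuit this role is played by DFF3 together with M1 and M2.

```python
def simulate_prescaler(num_input_pulses, mode_high):
    """Return the output level after each input pulse (behavioural only).

    mode_high=True  -> divide-by-4 (no pulse is swallowed)
    mode_high=False -> divide-by-5 (one pulse swallowed per output cycle)
    """
    count = 0                # 2-bit state of the two cascaded /2 stages
    swallow_pending = False  # hypothetical flag: next pulse is swallowed
    trace = []
    for _ in range(num_input_pulses):
        if swallow_pending:
            # The pulse is swallowed: the divider state holds for one cycle.
            swallow_pending = False
        else:
            count = (count + 1) % 4
            if count == 0 and not mode_high:
                swallow_pending = True
        trace.append(1 if count >= 2 else 0)  # output taken from the MSB
    return trace


def output_period(trace):
    """Output period in input cycles, measured between rising edges."""
    rising = [i for i in range(1, len(trace))
              if trace[i - 1] == 0 and trace[i] == 1]
    periods = {b - a for a, b in zip(rising, rising[1:])}
    if len(periods) != 1:
        raise ValueError("output is not periodic over the simulated span")
    return periods.pop()


print(output_period(simulate_prescaler(40, mode_high=True)))
print(output_period(simulate_prescaler(40, mode_high=False)))
```

With the mode signal high the modelled output repeats every four input pulses, and with the mode signal low the swallowed pulse stretches the period to five. The ÷5 output is high for two of its five input cycles, which is in line with the text's use of a final ÷2 stage to restore a 50% duty cycle.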
now a ÷5 divider. Note that unlike the conventional architecture, the critical path (indicated with the dotted line in Fig. 5) involves one flip-flop and a latch rather than three flip-flops […]

[…] gate merged with flip-flop DFF1. The pulse swallowing network is formed by flip-flop DFF2 and transistors M1 and M2. Similar to the operation of the ÷4/÷5 dual-modulus prescaler explained above, a pulse from the input signal fIN is swallowed when the gates of both M1 and M2 are high. Rather than implementing a full NOR gate as in [11], the use of only transistors M1 and M2 to swallow a pulse reduces the number of nodes in the circuit, thus improving speed. The trade-off is the appearance of a short-circuit current as explained previously, which however represents a small fraction (~24%) of the total power consumption of the prescaler.

V. SIMULATION AND MEASUREMENT RESULTS
The divider was integrated into a PLL built around a quadrature VCO (16.5GHz…17.0GHz, KVCO ≈ 440MHz/V), see Fig. 8a. The circuit was designed and fabricated using a 65nm CMOS technology and verified across process, supply voltage (1.0V +/-5%) and temperature (0°C to 110°C) (PVT) corners; it was packaged into a 24-pin QFN package. The chip micrograph is shown in Fig. 8b. The maximum specified operating frequency of 15GHz was achieved across all PVT corners, simulated including RC parasitics extracted from the layout. Fig. 9a shows the power consumption for the different divider factors: it is relatively independent of the divider ratio as the power consumption is dominated by the first ÷2 prescaler and ÷4/÷5 dual-modulus prescalers, which operate at the highest frequencies. The breakdown of the power consumption is given in Fig. 9b for a division factor of 112 (highest power consumption).

Fig. 9a. Power vs. div. ratio (horizontal axis: divider ratio, 64 to 140).
Fig. 9b. Power breakdown.

To test the chip, the PLL was locked to a crystal reference; the divider ratios can be selected using a digital serial interface. The divider worked for all divider ratios up to 17GHz (the maximum VCO output frequency measured on the test chip): the divider waveforms are shown in Fig. 10 for six different divider ratios (ratio ÷120 left out to maintain page limits).

An important consideration when using TSPC logic is that this logic is single-ended and hence may induce voltage spikes on the supply lines, which may be larger than with e.g. differential CML. In turn such spikes can couple to the VCO control voltage resulting in unwanted spurs. To minimize such coupling we used different supply domains for the VCO and digital logic. The worst-case spurs occurred at a divider ratio of 140; the spectrum for a Q-VCO oscillation frequency of 16.66GHz (where the VCO gain was highest) is shown in Fig. 11. The spur is 58.9dB below the carrier, resulting in a peak-to-peak jitter of ~47fs, acceptable for fibre-optic transceivers. The
Fig. 10. Divider outputs for different divider ratios (2ns/div).
Div. ratio   Output period   Output frequency fO
64           3.74ns          267MHz
80           4.70ns          213MHz
96           5.60ns          179MHz
100          5.88ns          170MHz
112          6.59ns          152MHz
140          8.24ns          121MHz

Fig. 12. Phase noise spectrum at 16.66GHz, divider ratio = 140.

TABLE II. COMPARISON TO STATE-OF-THE-ART
Ref.        Max. freq. (GHz)   Power (mW)   Supply (V)   Divider ratios                    Logic
[13]        19                 39.8         1.5          16…31                             CML
[14]        5.8                2.2          1.8          32, 33, 47, 48                    TSPC
[15]        12                 28.1         1.8          256, 260, 264, 268                CML+TSPC
This work   17                 2.0          1.0          64, 80, 96, 100, 112, 120, 140    TSPC
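The efficiency figure of merit used in the comparison (maximum frequency per unit power, in GHz/mW) can be recomputed directly from the Table II entries. The following is a minimal Python sketch; the dictionary layout and the function name are illustrative choices, and only the numbers are taken from the table.

```python
# (maximum frequency in GHz, power in mW), copied from Table II.
TABLE_II = {
    "[13]": (19.0, 39.8),
    "[14]": (5.8, 2.2),
    "[15]": (12.0, 28.1),
    "This work": (17.0, 2.0),
}


def efficiency_ghz_per_mw(max_freq_ghz, power_mw):
    """Figure of merit used in the comparison: GHz per mW."""
    return max_freq_ghz / power_mw


for ref, (freq, power) in TABLE_II.items():
    print(f"{ref:>9}: {efficiency_ghz_per_mw(freq, power):.2f} GHz/mW")

best = max(TABLE_II, key=lambda ref: efficiency_ghz_per_mw(*TABLE_II[ref]))
print("Highest efficiency among the listed dividers:", best)
```

Among the four listed entries, the divider of this work has the highest GHz/mW value (17GHz / 2.0mW = 8.5GHz/mW). The sketch only covers the designs listed in Table II.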
phase noise spectrum for the same conditions is shown in Fig. 12. The phase noise spectral density at 1MHz offset from the carrier is −94dBc/Hz. This was ~10dB higher than anticipated from the simulations, an issue which was found to be due to interconnect resistance in the LC tank network of the Q-VCO, and not the divider logic.

Fig. 11. Output spectrum at 16.66GHz, divider ratio = 140.

VI. CONCLUSION
A frequency divider based upon TSPC logic which can achieve the divider ratios ÷64, ÷80, ÷96, ÷100, ÷112, ÷120 and ÷140 has been demonstrated. Measurement results on a prototype fabricated using 65nm CMOS show a maximum operating frequency of 17GHz (at least 14GHz verified on the netlist with R,C parasitics across supply, temperature and process corners). Its power consumption was 2mW from a 1V supply. The divider was tested as part of a PLL built around a quadrature LC-VCO: spurs due to coupling from the single-ended TSPC logic were at least 58.9dB below the carrier. Table II compares our circuit to state-of-the-art multi-modulus (>2) dividers: to the best of our knowledge it achieves the best efficiency (GHz/mW) reported for a multi-modulus divider and operates at the lowest supply voltage (1.0V).

REFERENCES
[1] IEEE 802.3 400Gb/s Ethernet Study Group, available online at http://www.ieee802.org/3/400GSG/
[2] C. DeCusatis, “Optical interconnect networks for data communications,” J. Lightwave Technol., vol. 32, pp. 544-552, Feb. 2014.
[3] S. Verma, H.R. Rategh, and T.H. Lee, “A unified model for injection-locked frequency dividers,” IEEE J. Solid-State Circuits, vol. 38, pp. 1015-1027, June 2003.
[4] M.P. Kennedy et al., “A high frequency 'divide-by-odd number' CMOS LC injection-locked frequency divider,” J. Analog Integrated Circuits and Signal Processing, Springer, Oct. 2013.
[5] D. Lim et al., “Performance variability of a 90GHz static CML frequency divider in 65nm SOI CMOS,” in Tech. Dig. Papers, IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 542-621, Feb. 2007.
[6] J.-O. Plouchart et al., “Performance variations of a 66GHz static CML divider in 90nm CMOS,” in Tech. Dig. Papers, IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 2142-2151, Feb. 2006.
[7] J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, pp. 62-70, Feb. 1989.
[8] W. Zhu, H. Yang, T. Gao, F. Liu, D. Zhang and H. Zhang, “A 5.8GHz wideband TSPC divide-by-16/17 dual-modulus prescaler,” IEEE Trans. VLSI Systems, Feb. 2014.
[9] M. Jung et al., “A 10GHz low-power multi-modulus frequency divider using extended true single-phase clock (E-TSPC) logic,” in Proc. Eur. Microwave IC Conference, pp. 508-511, 2012.
[10] J. Navarro and W. Van Noije, “E-TSPC: extended true single-phase clock CMOS circuit technique,” in Proc. Int. Conf. Integrated Syst. IFIP VLSI, U.K., pp. 165-176, 1997.
[11] M.V. Krishna et al., “Design and analysis of ultra low power true single phase clock CMOS 2/3 prescaler,” IEEE Trans. Circuits Systems-I, vol. 57, pp. 72-82, Jan. 2010.
[12] C. Vaucher et al., “A family of low-power truly modular programmable dividers in standard 0.35µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 35, pp. 1039-1045, July 2000.
[13] Q. Jane Gu and Z. Gao, “A CMOS high speed multi-modulus divider with retiming for jitter suppression,” IEEE Microwave and Wireless Letters, vol. 23, pp. 554-556, Feb. 2013.
[14] V.K. Krishna, M.A. Do, C.C. Boon and K.S. Yeo, “A low-power single-phase clock multiband divider,” IEEE Trans. VLSI Systems, vol. 20, pp. 376-380, Feb. 2012.
[15] J.-H. Tsai and H.-D. Shih, “A 7.5GHz-12GHz divide-by-256/260/264/268 frequency divider for frequency synthesizers,” 2012 Int. Conf. on Microwave and Millimeter Wave Technology, 2012.