This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE JOURNAL OF SOLID-STATE CIRCUITS 1

Improving Linearity in CMOS Phase Interpolators


Amit Kumar Mishra, Member, IEEE, Yifei Li, Member, IEEE, Pawan Agarwal, Member, IEEE,
and Sudip Shekhar, Senior Member, IEEE

Abstract: We compare the prior art in phase interpolators (PIs), classifying them as current-mode, voltage-mode, and integrating-mode PI. Next, we present an integrating-mode PI where the voltage slopes with high phase linearity are generated through the integration of phase-shifted weighted current sources. The constant and variable voltage slopes are generated by current sources/sinks created using stacked devices in a 0.75-V 5-nm finFET technology. This PI technique supports the high-speed and low-power operation and achieves dual-edge interpolation with improved duty-cycle distortion characteristics. The PI generates an output clock with 9 bits of resolution and a small peak-to-peak integral nonlinearity (INLpp) and peak-to-peak differential nonlinearity (DNLpp) of 2.4° and 1.4°, respectively, at 13.3 GHz with just quadrature clock inputs. The PI has a 71-fs rms random jitter (integrated from 3 MHz to 3 GHz) and occupies an active area of 0.006 mm² while consuming 6-mW power at 14 GHz. An integrated rotation spur of −42.6 dBc for 256-ppm modulation at 13.3-GHz operating frequency is achieved for 1-GHz update rate for the dynamic linearity measurements.

Index Terms: AM–PM, digital-to-phase converter (DPC), dynamic linearity, fractional frequency synthesis, integrating-mode, phase mixer, phase rotator, plesiochronous.

Manuscript received 20 May 2022; revised 11 October 2022, 24 November 2022, and 10 January 2023; accepted 2 February 2023. This article was approved by Associate Editor Hui Pan. This work was supported in part by Maxlinear, Inc., in part by the Natural Sciences and Engineering Research Council of Canada, and in part by Intel Corporation. (Corresponding author: Amit Kumar Mishra.)
Amit Kumar Mishra and Sudip Shekhar are with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T1Z4, Canada (e-mail: [email protected]).
Yifei Li and Pawan Agarwal are with Maxlinear, Inc., Carlsbad, CA 92008 USA.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2023.3243305.
Digital Object Identifier 10.1109/JSSC.2023.3243305

I. INTRODUCTION

FAST-GROWING data traffic in data centers demands the wireline transceivers to operate at higher data rates. A multilane transceiver implementation is required to support high data rates necessitating low-power and compact clocking. The CMOS technology scaling also helps advanced nodes [1], [2] provide fast-switching transistors. However, the reduced supply voltage which accompanies node scaling favors digital-friendly techniques, and consequently, clocking solutions using current-mode logic (CML) [3] are unamenable.

In a receiver (RX), a global phase-locked loop (PLL) often produces a differential clock which is distributed to multiple lanes. The differential phases are provided to a multiphase generator (MPG) in each lane for generating multiple phase clocks, which are supplied to a phase interpolator (PI) in a clock and data recovery (CDR) system. The PI output is a local clock that samples the data optimally in time for recovering the data with the lowest possible bit error rate (BER). Depending on clock recovery architecture, the PI output clock requires either phase or both phase and frequency offset correction. Only phase deskew is needed for the source-synchronous clocking where the transmitter (TX) clock is sent to the RX [4], [5]. In plesiochronous systems, the RX needs to recover its clock from the data. However, due to a mismatch between the crystal oscillators at the TX and the RX, the global PLL clock frequency at the RX differs from the TX by Δf. The CDR, therefore, rotates the PI for accumulating Δf in a PI-based CDR. Moreover, the PI can also be used to provide the frequency modulation required for spread-spectrum clocking [6].

The PI time quantization error, TLSB, combined with phase differential nonlinearity (DNL) and phase integral nonlinearity (INL) constitutes the PI deterministic jitter (DJ) [7]. Narrow symbol time periods at higher data rates reduce the sampling time margin demanding small jitter and high linearity from a PI. Moreover, the ability of a PI to work with quadrature inputs greatly relaxes the MPG design and input phase correction concerns. However, PI linearity degrades for greater input phase separation [3], [8]. Therefore, eight or more phases are often required from an MPG which consumes significant power, uses complex architectures, and faces challenges for phase accuracy. In [3], both a quadrature delay-locked loop (DLL) and an eight-phase injection-locked ring oscillator (ILRO) work in tandem to lower the phase errors for eight-phase generation at the cost of higher power dissipation. An eight-phase delay line with three cascaded eight-phase ILROs is used in [9] to improve the phase accuracy increasing the clocking power consumption.

We present a scaling friendly, low power, and compact PI [10] which addresses the above mentioned problems by: 1) employing a higher resolution of 9 bit for reducing TLSB, and 2) using a high phase linearity technique for reducing the phase INL and DNL. Furthermore, the input phase separation requirement is relaxed due to high phase linearity, enabling the PI to operate with just quadrature phase inputs.

This article is organized as follows. Section II describes the classification and comparative analysis of PIs and reasons for their phase nonlinearity. Section III discusses the conceptual operation, architecture, design considerations, and static and dynamic linearity of the high-speed PI in this work. Measurement results are presented in Section IV and this article is concluded in Section V.
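To give a sense of scale for the time quantization error TLSB mentioned in the Introduction, the short sketch below estimates the ideal step of an N-bit PI whose codes span one full clock period. It assumes TLSB = Tperiod / 2^N, and uses the 9-bit resolution and 13.3-GHz clock quoted in the abstract. The function name `pi_lsb` is illustrative only and is not taken from the article.

```python
def pi_lsb(clock_hz, bits):
    """Return (T_LSB in seconds, phase LSB in degrees).

    Assumes the PI codes cover one full 360-degree clock period uniformly.
    """
    steps = 2 ** bits
    t_lsb = 1.0 / (clock_hz * steps)   # one clock period divided into equal steps
    deg_lsb = 360.0 / steps            # the same step expressed as a phase
    return t_lsb, deg_lsb


t_lsb, deg_lsb = pi_lsb(13.3e9, 9)
print(f"T_LSB = {t_lsb * 1e15:.1f} fs, phase LSB = {deg_lsb:.3f} deg")
```

Under these assumptions the ideal step is on the order of 147 fs, or about 0.7° of phase, which is the scale against which the reported INL and DNL figures can be read.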
0018-9200 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: The University of British Columbia Library. Downloaded on April 27,2023 at 18:38:40 UTC from IEEE Xplore. Restrictions apply.

II. CLASSIFICATION OF PHASE INTERPOLATORS

PIs for high-speed links can be broadly classified into three categories: 1) current-mode PI (CMPI); 2) voltage-mode PI (VMPI); and 3) integrating-mode PI (IMPI).

A. Current-Mode Phase Interpolator

As shown in Fig. 1, a CMPI is implemented as an I-Q phase mixer architecture where two orthogonal sinusoids are weighted and summed to produce an interpolated output. The prominent sources of phase nonlinearity for this PI architecture are: 1) high voltage swings [11]; 2) higher harmonics in the input clock signals [11]; 3) implementation of a linear weighting scheme for interpolation [12]; and 4) inadequate harmonic filtering at the output node [13].

Fig. 1. Current-mode phase interpolator.

The methods employed to mitigate each of the mentioned sources of nonlinearity are as follows.

1) Input amplitude is prudently chosen: large enough to steer the current but small enough to avoid hard switching and generation of higher harmonics. CMOS-CML conversion in [3] with tunability sets an appropriate voltage swing.

2) Higher harmonics introduce distortion into the interpolated output, degrading phase linearity. Techniques such as poly-phase filtering [14] and LC-based filtering [11] remove higher harmonics at the inputs.

3) Ideally, a sinusoidal weight implementation scheme is required, and the I and Q weights, α and β, respectively, lie on a circle such that the generated interpolated phases at the load, VX, have a constant amplitude (see Fig. 1). A constant amplitude is desired because the swing-dependent delay characteristic of the CML-to-CMOS (C2C) circuit results in AM–PM distortion which eventually manifests as phase nonlinearity.
Some techniques, such as octagonal weight constellation [9], [15], [16], and 45° offset I-Q combination [17] coarsely approach the circular weight constellation for reducing AM at the expense of additional circuitry and implementation complexity. An octagonal constellation CMPI can be implemented more compactly (close to a diamond constellation) than the CMPI with 45° offset combination technique. However, the latter technique has shown better phase linearity. A linear weighting scheme often employed [11], [12] allows ease of implementation; however, it produces a diamond constellation that causes phase nonlinearity. The INL for the first quadrant (see Fig. 1) is given as

$$\mathrm{INL} = \frac{\pi}{2}(1-\alpha) - \arctan\left(\frac{1-\alpha}{\alpha}\right) \qquad (1)$$

where 0 ≤ α ≤ 1. Therefore, the INLpp contribution just from the diamond constellation weight implementation is 8.14°.

4) An RC network with tuning is widely used as an output load [3], [7], [11]; however, the harmonic filtering may still be inadequate for various applications. LC networks can be used for better filtering [13], [18] at the cost of area and susceptibility to electromagnetic coupling. An active inductor load used in [19] provides bandwidth extension while occupying a small area; however, it requires load tunability to adjust the peaking transfer function with PVT change.

B. Voltage-Mode Phase Interpolator

Interpolation in the voltage domain with the input signals, VΦin1 and VΦin2, as square waves does not yield an interpolated phase output. For the square wave inputs, if the output rise (fall) time of the PI inverter is considerably shorter than the time duration of the [VΦin1, VΦin2] = [0, 1] ([1, 0]) region, the PI output settles at an intermediate voltage level between the VDD and the GND in that region [20]. These code-dependent voltage levels are formed due to the voltage division from the PMOS-NMOS (PN) short-circuit path between the VDD and the GND arising in those regions and substantially degrade the PI phase linearity. Therefore, input signals used for interpolation need to slew, about three times their input time separation, requiring slew control circuits at the inputs [1], [8], [20]. A larger input slew requires a smaller input phase separation, which implies using more input phases. Therefore, it is challenging to use a few input phases, e.g., quadrature inputs, and simultaneously achieve high phase linearity in a VMPI. Some VMPI architectures also use filtering at the output to improve phase linearity [21]. In a slice-based implementation for VMPI, (M−K) and K slices are selected for the VΦin1 and VΦin2, respectively, to achieve an interpolation factor of K/M (see Fig. 2). A slice is usually implemented as a tri-state inverter. The interpolated outputs, however, are variable-slope signals with slope-shape dependent on K. These interpolated signals suffer code-dependent delay when evaluated by a comparator, leading to phase nonlinearity [22].

Fig. 2. Voltage-mode phase interpolator.
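The INLpp figure quoted with Eq. (1) for the linear (diamond-constellation) weighting of a CMPI can be checked numerically. The sketch below evaluates the first-quadrant INL, taken as (π/2)(1 − α) − arctan((1 − α)/α), on a dense grid of α and reports the peak-to-peak value. The function names and the grid size are illustrative choices, not part of the article.

```python
import math


def diamond_inl_deg(alpha):
    """First-quadrant INL in degrees for linear I/Q weights alpha and (1 - alpha)."""
    ideal = (math.pi / 2.0) * (1.0 - alpha)   # ideal phase is linear in the code
    actual = math.atan2(1.0 - alpha, alpha)   # phase of the weighted I/Q sum
    return math.degrees(ideal - actual)


def diamond_inl_pp_deg(points=100001):
    """Peak-to-peak INL over 0 <= alpha <= 1, sampled on a uniform grid."""
    values = [diamond_inl_deg(i / (points - 1)) for i in range(points)]
    return max(values) - min(values)


print(f"INLpp (diamond constellation) ~ {diamond_inl_pp_deg():.2f} deg")
```

Using `atan2` avoids a division by zero at α = 0. The result lands between 8.1° and 8.2°, consistent with the roughly ±4.07° error that gives the 8.14° quoted in the text.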


MISHRA et al.: IMPROVING LINEARITY IN CMOS PIs 3

Fig. 3. (a) Generic IMPI architecture. Variable and constant slope generation. (b) Possible implementations for I1 and I2: as current source, sink, or source-sink duo. (c) IMPI prior art-I waveforms. Single-edge interpolation and output DCD as architectural limitations are illustrated.

C. Integrating-Mode Phase Interpolator

A generic IMPI architecture, as shown in Fig. 3(a), underpins the IMPI operation in prior works [5], [23], [24], [25] and this work. In this architecture, two input phases, either directly or through a control logic, control the current sources I1 and I2 to charge/discharge the capacitor (CO), producing voltage waveforms, VX. The PI code sets the magnitude of I1 and I2. A discharge signal, Vdischrg (optional, used in some architectures), performs the reset operation and is generated by the control logic [23] and/or PI feedback [5], [24]. The current sources can be implemented either as current sources, sinks, or a current source-sink duo, as shown in Fig. 3(b).

Constant and variable voltage slope regions are integral to IMPI operation. Fig. 3(a) shows the variable and constant slope generation in prior art [5], [23], [24], [25]. In the variable slope region where only I1 is active, the PI code sets the current magnitude giving rise to code-controlled (variable) slopes. Variable slopes essentially create linear-spaced vertical voltage levels at the end of the variable slope region from where the constant slope region starts. For the constant slope, the PI code chooses currents I1 and I2 such that total current I1 + I2 is constant, and therefore CO is charged with a constant current resulting in constant slopes. The constant slopes emanating from these linear-spaced voltage steps will be linear-spaced in time in the constant slope region. A comparator evaluates these waveforms in the constant slope region, thus producing linear interpolated outputs.

1) IMPI Prior Art-I: The IMPIs described in [23], [24], and [5] are operationally similar and are labeled here as prior art-I. In [5], variable slopes are generated in region-I from (M−K)·Iu current and constant slopes are generated in region-II from the M·Iu current followed by a reset in region-III producing voltage VX [see Fig. 3(c)]. Here, Iu is the current that flows through a unit current source (sink) for charging (discharging) CO. When evaluated by a comparator, a pulsewidth modulation (PWM) signal VO is generated. Being operationally similar, all prior art-I architectures produce the same VX waveforms and have some limitations: 1) it only provides single-edge interpolation; 2) the outputs have duty-cycle distortion (DCD), resulting in PWM outputs; and 3) the DCD depends on the interpolation factor. Moreover, PWM signal propagation through buffers, sharing of PWM signals between complementary PIs [23], and reset using feedback [24] pose challenges for implementation and frequency scaling. IMPI in [5] improves on the architectures in [23] and [24] for linearity and power supply sensitivity, uses a differential comparator to remove the inverter (comparator) threshold variation, and employs replica integrators to remove the nonlinearity caused by quiescent currents in [24]. However, the architecture [5] still faces several challenges.

1) Need for Current Calibration: Three cases are shown in Fig. 4 (top) where the current is: i) lower than optimal; ii) optimal; and iii) higher than optimal. Case i) leads to voltage swing reduction at VX, resulting in a decreased comparison time for the comparator evaluating VX waveforms. A shrink in comparison time poses difficulty for high-frequency operation. In case iii), phase nonlinearity occurs as the comparator threshold, Vth, crosses the VX signals in the variable slope region. Therefore, current calibration methods using a digital DLL in [24] are used.

2) Need for Optimal Vth: Consider the scenario where the IMPI is at its optimal current setting, but the Vth changes [see Fig. 4 (bottom)]. If Vth is greater than optimal, as in case 1), the comparison time reduces, affecting the high-frequency operation. If Vth is lower than optimal, as in case 3), then phase nonlinearity occurs.

3) Need for Auxiliary Circuit: Since these IMPIs work on a single edge, an edge combiner is used for combining two single-edge PWM outputs to construct differential signals with 50% duty cycle. The improved IMPI in [5] still uses several auxiliary circuits, which include replica integrating cores, differential comparators, buffers, edge combiner, etc., costing power and area. Furthermore, the comparators still need optimal Vth and input common-mode voltage, VCM, which were provided externally.
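The prior art-I principle, and the reason it needs current calibration, can be illustrated with a small idealized model. The sketch below assumes ideal current sources, a fixed comparator threshold, and normalized units. A current (M − K)·Iu charges CO for a fixed variable-slope interval, after which the constant current M·Iu takes over. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def crossing_time(k, m, i_unit, c_out, t_var, v_th):
    """Time at which VX crosses v_th for code k (ideal sources, normalized units)."""
    i_var = (m - k) * i_unit                 # current in the variable-slope region
    v_pedestal = i_var * t_var / c_out       # VX reached when that region ends
    if v_pedestal >= v_th:
        # Threshold is reached inside the variable-slope region (current too high).
        return v_th * c_out / i_var
    # Otherwise the crossing happens on the constant slope set by M * Iu.
    return t_var + (v_th - v_pedestal) * c_out / (m * i_unit)


def code_steps(m, i_unit, c_out=1.0, t_var=1.0, v_th=0.5):
    """Differences between crossing times of adjacent codes 0..m."""
    times = [crossing_time(k, m, i_unit, c_out, t_var, v_th) for k in range(m + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Pedestals stay below the threshold: steps are uniform and equal to t_var / M.
print(code_steps(m=8, i_unit=0.05))
# Current too high (case iii): some crossings fall in the variable-slope region.
print(code_steps(m=8, i_unit=0.125))
```

In this model the crossing time works out to Vth·CO/(M·Iu) + t_var·K/M whenever the largest pedestal stays below Vth, so the steps are uniform. Once the current is high enough for a pedestal to reach Vth, the steps become code dependent, which corresponds to the nonlinearity described for case iii).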


Fig. 4. IMPI prior art-I: Cases illustrating the need for current and Vth calibration.

2) IMPI Prior Art-II: Another IMPI architecture, prior art-II [25], as shown in Fig. 5(a) is a slice-stacked-based implementation that provides dual-edge operation with 50% duty cycle outputs at 2-GHz operating frequency. The input clock phases, VΦin1 and VΦin2, control the top/bottom PN current sources. The S1–S4 switches control the current flow in and out of the current sources to the VX node and, therefore, the interpolation factor. Retention cells, structurally similar to the IMPI slice, are required to retain the logic level when the interpolation operation is over to prevent node VX from floating. In regions I, II, IV, and V, the S signals enable/disable the controlled transistors in IMPI, and the IMPI effectively assumes the current source-sink structure shown in Fig. 5(c). In these regions, the R signals cut the signal path from the retention cell to node VX, thus isolating the IMPI from the retention cell. In regions III and VI, the S signals disable all the controlled transistors in IMPI, which presents the issue of the floating node at VX. To prevent this condition, the R signals connect the node VX to the retention cell, providing the desired logic voltage level. Thus, S and R signals keep changing to enable and disable the transistors in IMPI and retention cells, respectively, for guiding the IMPI from region-I to region-VI in a Tperiod. It is important to note that the control signals, S and R, are not fixed for a PI code but are PWM signals switching close to clock rate.

1) Regions of Operation and Waveforms: In a clock time period (Tperiod), this PI undergoes six regions, as shown in Fig. 5(b). In region-I, VΦin1 = 0 and VΦin2 = 1, and thus (M−K) PMOS current sources are ON, providing variable rising slopes. Both (M−K) and K PMOS current sources are ON in region-II, giving rise to constant rising slopes. In region-III, all the S switches turn off; therefore, the VX node is floating. A signal from the retention cell is used to maintain the logic level. A similar operation sequence follows with NMOS current sinks during the falling edge, and the PI goes through regions IV-to-VI.

2) Challenges in Scaling Operating Frequency: However, it is challenging to accommodate two interpolation and two retention regions within Tperiod as the operating frequency increases and Tperiod shrinks [see Fig. 5(d)]. Moreover, the generation of S and R signals uses feedback and complex logic; and therefore, they are difficult to generate for a small Tperiod. Thus, this architecture is not amenable for high-frequency operations. For the implementation in [25], a small phase spacing of 22.5° between input signals favorably provided a small time span of Tperiod/16 for the variable slope region, which considerably relaxed the linearity requirements from this IMPI. However, such small phase spacing requires more phases from the MPG.

D. Comparison of CMPI, VMPI, and IMPI

In contrast to IMPI, both CMPI and VMPI require some or all of these: 1) input slew control; 2) input harmonic filtering; and 3) output harmonic filtering. However, the residual phase nonlinearity remains in a: 1) CMPI due to inadequate harmonic filtering and nonsinusoidal weighting and 2) VMPI from the variable-slope signals, which suffer phase distortion from the evaluating comparator.

An IMPI works suitably with linear weights and relieves the abovementioned requirements because of the following reasons.

1) Its operation involves switching the current sources ON/OFF, suited for inputs as square waves. Therefore, slew or harmonic filtering at inputs is undesired and therefore not required.

2) In an IMPI, the output is evaluated by a comparator in the constant-slope region. Thus, the phase nonlinearity which the comparator otherwise presents for evaluating varying slope signals is eliminated [22].

A comparison is provided in Table I. In summary, a CMPI works at high frequencies and provides superior power supply noise rejection but does not scale well with the process or VDD and requires nonlinear weighting to improve phase linearity. A VMPI implements linear weighting, is compact and scaling compatible, and works at high frequency but provides low-moderate linearity. The resolutions of CMPI and VMPI are primarily limited by their phase nonlinearities. An IMPI has fundamentally better phase linearity. However, its implementation is complex and faces challenges for high-frequency operation, as described in Section III. This work presents an IMPI that works at high speed and achieves a resolution of 9 bits.

TABLE I
COMPARING HIGH-SPEED IMPI WITH STATE-OF-THE-ART PIs

III. HIGH-SPEED IMPI (THIS WORK)

This work aims for an IMPI with high linearity and high-speed operation having: 1) no control logic for the region determination; 2) inherent dual-edge operation with 50% duty


cycle outputs; 3) operation at the highest frequency attainable by digital circuits in a technology; 4) no calibration (slew-rate, current magnitude, comparator threshold, etc.); 5) no voltage bias requirements for gate control of current sources; 6) no retention/reset region of operation; and 7) no feedback control.

Fig. 5. IMPI prior art-II. (a) Architecture. (b) Regions of operation. (c) Equivalent architecture with current sources and sinks. (d) Waveforms during different regions of operation.

A. Conceptual Operation

The concept for this IMPI is described in Fig. 6(a). A square wave bidirectional current charges/discharges a capacitor periodically to produce a triangular-shaped output voltage at node VX. Accordingly, a phase-shifted square wave input current yields a phase-shifted triangular output.

The following idea emerges: When these phase-shifted VX signals are interpolated, during the time frame where the interpolating VX signals have identical slopes "S," the interpolated output retains the constant slope S. However, during the time frame where the VX signals have opposite slope signs, S and −S, the interpolated output slope varies. Furthermore, if the interpolation weights are linear, the slope variation will also be linear and a function of K. This interpolation, therefore, gives rise to a family of piecewise linear (PWL) signals, which are linear-spaced in time in the constant slope regions. When evaluated by a comparator, linear phase interpolated outputs result. This interpolation at the output can be conveniently achieved by combining linear weighted square wave current at the inputs, forming the basis of this IMPI realization.

B. Architecture and Operation

Due to its superior phase linearity, the proposed IMPI relaxes the input phase separation requirement, allowing its operation with just quadrature inputs. Not requiring eight input phases reduces power consumption, implementation complexity, and phase mismatch in MPG. As shown in Fig. 6(b), a DCD and I-Q correction block cleans the phase errors on the quadrature input clocks. For phase selection, a 9-bit code is applied to a decoder which generates the enable (EN) signals for selecting slices within the PI. Of the 9 bits, 2 bits choose a quadrant, and the remaining 7 bits provide interpolation within the quadrant. PI core output passes through a series capacitor, Cser, to a C2C converter consisting of a resistor-biased inverter followed by a buffer. Two such PI cores produce differential outputs through proper slice selection by EN-bits. Cross-coupled inverters are connected between the buffer chains in complementary C2C paths for keeping the outputs differential. The PI core contains two IMPI_2x blocks and two IMPIs constitute each IMPI_2x block. An IMPI contains two tunable current source-sink duos, I1 and I2, which are realized as slice-stacks and receive complementary EN-bits.

1) IMPI_2x Structure, Quadrant Switching: The IMPI_2x block thus comprises four slice-stacks connected to quadrature inputs. EN-bits configure these four stacks into two IMPIs where one is active, and another is inactive. The active and inactive IMPIs are dynamically reconfigured as per the


quadrant location of the expected output phase [see Fig. 6(d)]. In quadrant-I, the active IMPI is formed by stacks connected to 0° and 90°; in quadrant-II, the stacks connected to 90° and 180° are used, and a similar operation follows for the other quadrants. Therefore, the EN-bits are coded to select: 1) appropriate stacks for choosing a quadrant and 2) slices within the stacks for interpolation. This scheme obviates the requirement of a multiplexer (MUX) between quadrature inputs and the PI core. Therefore, the phase nonlinearity originating from the propagation delay mismatch of different phases at the MUX outputs is eliminated. Using a MUX would reduce the core IMPI area, since the slice stacks could be reused for different phases; however, since the presented PI already occupies a small area, phase linearity improvement measures were prioritized over further area reduction.

Fig. 6. High-speed IMPI (this work). (a) Conceptual operation. (b) Overall architecture. (c) Half-slice design comparison. (d) IMPI_2x structure and its dynamic configuration for quadrant switching.

An IMPI half-slice forms the unit of these slice stacks. A high output impedance is required for the half-slice to work as a good current source. Two architectures were evaluated for the half-slice design [see Fig. 6(c)]. The first architecture obtains high output impedance by transistor stacking, while the second uses both the resistor and transistor stacking. While the first architecture provides better linearity, the second produces less jitter and has lower INL change from the process variation with good linearity and was chosen for this IMPI. In a half-slice, the input clocks are connected to transistors M1 and M2 while M5 and M6 are connected to rail-to-rail EN-bits for switching the slice ON/OFF.

The finFET transistors closer to the output node [i.e., M5 and M6 in Fig. 6(c)] operate mostly in saturation. On the other hand, other transistors, such as M1 and M2, operate in the triode region, providing source degeneration for the devices in saturation and increasing the rout at the stack output [26]. The highly cascoded architecture-I realizes a larger rout ≈ [(4rdeg)·gm·ro]/2 than the rout ≈ [R + rdeg·gm·ro]/2 in architecture-II, where rdeg is the resistor realized by a transistor providing degeneration (such as M1/M2), and gm and ro represent the transconductance and output resistance of a transistor in saturation (such as M5/M6). Thus, architecture-I realizes a better current source than architecture-II, which results in higher phase linearity. It should be noted that the transistors in architecture-I need to be appropriately upsized (relative to architecture-II) for obtaining the same Iu. However, since architecture-I is an all-transistor-based implementation, it suffers from a larger mismatch in slices resulting in a higher INLpp variation (1.8× higher) than architecture-II.
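As a rough behavioral check of the concept in Section III-A together with the code partitioning in Section III-B, the sketch below interpolates two ideal triangle waves taken from adjacent quadrature phases with linear weights (M − K)/M and K/M, and locates the rising mid-swing crossing by bisection. Several points are assumptions made for illustration and are not stated in the article: the two most significant bits of the 9-bit code select the quadrant, M = 128, the waveforms are ideal triangles, and the comparator is ideal. All function names are hypothetical.

```python
def triangle(t, period):
    """Ideal unit triangle wave: -1 at t = 0, +1 at period / 2, then back to -1."""
    x = (t % period) / period
    return 4.0 * x - 1.0 if x < 0.5 else 3.0 - 4.0 * x


def decode(code, interp_bits=7):
    """Split a PI code into (quadrant, K); the bit allocation is an assumption."""
    return code >> interp_bits, code & ((1 << interp_bits) - 1)


def output_phase_deg(code, period=1.0, interp_bits=7):
    """Phase of the rising mid-swing crossing of the interpolated waveform."""
    m = 1 << interp_bits
    quadrant, k = decode(code, interp_bits)
    shift1 = quadrant * period / 4.0          # earlier of the two active phases
    shift2 = (quadrant + 1) * period / 4.0    # later phase, 90 degrees behind

    def vx(t):
        # Linear-weighted sum of the two phase-shifted triangle waves.
        return ((m - k) * triangle(t - shift1, period)
                + k * triangle(t - shift2, period)) / m

    # Window in which both triangles rise, so the sum has a constant slope.
    lo, hi = shift2, shift1 + period / 2.0
    for _ in range(60):                       # bisection on the zero crossing
        mid = 0.5 * (lo + hi)
        if vx(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t_cross = 0.5 * (lo + hi)
    # Reference: the 0-degree triangle crosses mid-swing at period / 4.
    return 360.0 * (t_cross - period / 4.0) / period


for code in (0, 1, 127, 128, 300, 511):
    print(code, decode(code), round(output_phase_deg(code), 4))
```

In this idealized model the crossing always falls inside the window where both triangles rise, so the output phase equals code × 360°/512 for every code. The model deliberately leaves out the non-idealities that the half-slice design targets, such as finite output resistance and slice mismatch.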


Fig. 7. High-speed IMPI. (a) IMPI slice. (b) Equivalent architecture with current sources and sinks. (c) Input and output waveforms. (d) Regions of operation. (e) Slope as a function of K in variable slope regions.

Two such half-slices connected at their output create an IMPI slice in Fig. 7(a). Conceptually, this IMPI can be seen as two slice-stack-based current source-sink duos whose (M−K) and K slices are enabled through EN-bits to achieve an interpolation factor of K/M [see Fig. 7(b)].

2) Regions of Operation: Similar to other IMPIs, this IMPI also undergoes variable and constant slope regions as described next. In region-I, [VΦin1, VΦin2] = [1, 1], NMOS in both the half-slices are ON, and thus constant output slope is generated. In region-II, [VΦin1, VΦin2] = [0, 1], both the top-left PMOS and the bottom-right NMOS are ON, generating a variable slope output. Similarly, constant slope occurs in region-III followed by variable slope in region-IV. The IMPI configuration in the four regions is shown in Fig. 7(c) and (d). In summary, two constant slope and two variable slope regions are generated in a Tperiod without using any control signals for region discrimination. Also, no retention or reset regions occur for this IMPI.

3) Slope as a Function of K in Variable Slope Regions: In the variable-slope regions, slopes are a function of K. As illustrated in Fig. 7(e), a current (M−2K)·Iu flows into the capacitor in region-II. It can be observed that as K is linearly varied, the Scode also changes linearly by the (2K/M) factor. These linear increments in slope lead to linear voltage increments at the end of region-II forming linear-spaced voltage pedestals. Constant slopes starting from voltage pedestals in region-III will be linear-spaced in time, providing high phase linearity.

C. Design Considerations

This PI is designed such that the triangular output voltage swing is ∼60% VDD for linearity considerations [see Fig. 8(a)]. Our IMPI half slice can be represented as a current Iu that periodically charges and discharges an Rp∥C load [unlike Fig. 6(a)], where Rp and C are the slice output resistance and output capacitance, respectively, and C = CO/M. For a specific RpC product, assume that the output voltage V rises from 0.2·Iu·Rp to 0.8·Iu·Rp in Tperiod/2. Applying Kirchhoff's Current Law, and assuming [Iu − (V/Rp)] = k, we can write

$$\int_{0}^{T_{\mathrm{period}}/2} -\frac{1}{R_p C}\,dt = \int_{0.8 I_u}^{0.2 I_u} \frac{1}{k}\,dk. \qquad (2)$$

On solving (2), we get the following expression:

$$R_p C = \frac{T_{\mathrm{period}}}{2 \cdot \ln 4} \approx 0.36 \cdot T_{\mathrm{period}}. \qquad (3)$$

Thus, the RpC product should be roughly 36% of the clock time period for attaining 60% VDD swing, e.g., 25.7 ps for a 14-GHz clock.

The IMPI design methodology is as follows. We aim to construct the triangular output waveform at frequency f = 1/Tperiod with a swing Vpp = ΔV = 60% of VDD. To obtain


Fig. 8. High-speed IMPI. (a) Output swing considerations. (b) Reducing DCD generation from the PN strength-mismatch.

the lowest power, the smallest current, $I$, and smallest capacitor, $C_O$, should be used for achieving $\Delta V$ swing where $I$ is the current in the constant slope region; $I = M \cdot I_u$. The IMPI power dissipation, $P$, is given by

$$ P = C_O \cdot VDD \cdot \Delta V \cdot f. \tag{4} $$

To minimize the power at a particular frequency, the smallest possible $C_O$ should be used for achieving $\Delta V$, which is realized using only the output parasitics of the IMPI slices and the C2C input load. Then the appropriate $I_u$ value is chosen to achieve the $\Delta V$ swing using a cascode transistor and then increasing the values of $R$ in Fig. 6(c). Increasing the $R$ decreases $I_u$, thus reducing the IMPI output swing. For the $R$ value for which the output swing reaches ∼$\Delta V$, the overall PI (IMPI and C2C combination) provides optimally good linearity and is chosen for the IMPI design.

1) PI Output Swing Considerations: A voltage swing control is not implemented. Consequently, the swing changes by ±15% VDD with the process variation (SS/FF corners), but it does not lead to significant phase linearity degradation, as explained next.

The overall phase linearity is achieved by optimizing phase linearity both from the PI core and the C2C. Large swings degrade PI linearity, and small swings aggravate C2C AM–PM distortion, as elaborated further. As PI output swing increases, the transistors go deeper into the triode regions, and the voltage slope variation increases, becoming worse at the swing extremes. This slope variation affects the linearity of slopes in the variable-slope regions, which are situated at the swing extremes, producing phase nonlinearity.

The C2C serves as a comparator for PI outputs. For the maximum linearity, the interpolated PI waveforms should have the same shapes within the critical comparison window [22]. Consequently, for the PI swings smaller than the comparison window, amplitude variations (occurring in the variable slope regions) cause AM–PM distortion from C2C, leading to phase nonlinearity. The PI phase linearity improves when the output voltage swing decreases, but the C2C AM–PM nonlinearity worsens. Alternatively, when the PI output swing increases, the PI phase linearity deteriorates, but C2C AM–PM nonlinearity reduces. The C2C AM–PM nonlinearity reduces because, for the same percentage variation in IMPI amplitude swings, a higher IMPI swing results in considerably less C2C AM–PM distortion [3]. Thus, this trade-off between PI and C2C serves as a compensation method. Notably, implementing a swing calibration technique will improve the PI phase linearity with process variations.

The maximum peak-to-peak amplitude variation for ideal PWL signals is half the maximum signal swing value. However, the amplitude variation for the actual PWL signals is comparably lower, and the resultant C2C AM–PM nonlinearity is alleviated because: 1) the pointed waveform tips at extremes are tapered off due to limited bandwidth at IMPI output, reducing the amplitude variation, and 2) the amplitude variation among the fundamental harmonic of PWL signals is even smaller.

2) Reducing DCD Generation From PN Strength-Mismatch: In a VMPI, a strength mismatch between PN leads to asymmetrical rise-fall times, resulting in DCD. In contrast, for this IMPI, a stronger (weaker) PMOS than NMOS results in an increase (decrease) of the PI average output level, and the waveform moves up (down), as shown in Fig. 8(b). The AC coupling capacitor, $C_{ser}$, discards the average output (and its DC deviation), preventing DCD generation. Likewise, the average output voltage change due to SF (FS) corner is also discarded. SF (FS) corner degrades the linearity slightly for the rising (falling) edge while improving the linearity for the falling (rising) edge. Cross-coupling inverters between the complementary PI chains even out this differential linearity change between differential edges. Moreover, since the resistor-biased inverter evaluates the edges by its own $V_{th}$, the process variation of $V_{th}$ is also not an issue since the average PI output always aligns with inverter $V_{th}$ at its input node, $V_{C\_ser}$.

D. Dynamic Operation of PI

In plesiochronous clocking, the PI rotates to accumulate the frequency difference $\Delta f$ between the TX and the RX [see Fig. 9(a)]. To achieve this $\Delta f$, the codes are updated to increment/decrement the output phase at a frequency of $\mathrm{Freq}_{PI\_update}$. For an $N$-bit PI, for a single LSB phase jump/update, $\Delta f = \mathrm{Freq}_{PI\_update}/2^N$. More LSBs need to jump


Fig. 9. High-speed IMPI. (a) Dynamic PI operation for plesiochronous clocking. (b) Alleviating dynamic nonlinearity from the PN mismatch.

per PI update to accumulate higher ppm, and $\Delta f = N_{LSB} \cdot \mathrm{Freq}_{PI\_update}/2^N$, where $N_{LSB}$ is the number of LSBs jumped per update.

1) Linearity Degradation Mechanisms for Static and Dynamic PI Operation and Its Mitigation: Linearity degradation occurs through three prominent mechanisms in a PI. The mechanisms and mitigation techniques employed are as follows.

a) Phase errors in input quadrature signals: These errors are removed by: 1) I-Q correction [27]; 2) DCD correction [28]; and 3) path matching of input phases in layout. Since the quadrature inputs are available, only a small delay tuning range in the Q path is required to mitigate the I-Q phase error in the input clocks (a slight difference from [27]). The I-Q correction loop runs in the background and reduces the quadrature phase error to <150 fs. The PI inputs are rail-to-rail signals having rise/fall times of about 18% of $T_{period}$. Increasing the rise/fall time by more than 25% of $T_{period}$ degrades the phase linearity because the constant and variable region boundaries become less distinct, and the overlap between the regions increases.

b) Phase nonlinearity in the PI: To address it, a high linearity IMPI technique is implemented in this work. In an ideal scenario where the IMPI works as excellent current sources and all the slices are identical (with no mismatch), the interpolation error will be zero for linear weights. However, some phase nonlinearity does exist for this IMPI. The variation in the slope from the desired linear slope shapes in the variable and constant slope regions is the primary source of its phase nonlinearity. A perfect IMPI operation requires ideal current sources having infinite output resistance, $r_{out}$, and capable of sustaining arbitrarily large voltage swings. However, imperfect current sources realized with stacked transistors-based implementation suffer from a finite $r_{out}$, variation in $r_{out}$ with signal swing, and transition into triode regions for considerably large signal swings. Furthermore, in a stacked implementation, the charging and discharging of all internal source/drain nodes also affect the linear slope shapes to some extent.

The clock inputs at the gates of $M_1$ and $M_2$ are isolated from the $V_X$ through a transistor and a series resistor. Therefore, $M_1/M_2$ $C_{gd}$ coupling does not affect the PI output linearity. Furthermore, the inactive slice-stacks in the IMPI_2x are in the opposite phase to the active slice-stacks and cancel some of the coupling from the active stacks’ inputs to their common output node $V_X$. Also, since the IMPI clock inputs are strongly driven (rail-to-rail) by the inverter buffers, the inputs are largely unaffected by the weak coupling feedback from the PI output.

c) Phase errors in EN-bits: These are mitigated by: 1) minimizing EN-bits path length, achieved by placing the decoder close to the PI core; 2) resampling EN-bits; and 3) path matching of the EN-bits in layout.

A flip-flop clocked by the PI update clock is interfaced with the EN-bits before reaching the IMPI core. The flip-flop outputs are therefore cleaned of the transition time mismatch among the EN-bits due to reasons such as circuit-delay mismatch or path-length mismatch. Next, the path lengths of EN-bits from the flip-flop output to the IMPI slices are matched in the layout to the best extent possible.

The EN-signals are relatively low-speed signals switching at 1 GHz and therefore have fast rise/fall times ∼10% of $T_{period}$. A higher rise/fall time is beneficial for phase linearity if EN-bits are slightly mismatched in their transition time. However, resampling significantly relaxes the EN-bits rise/fall times considerations for the PI dynamic linearity.

While the techniques mentioned in Sections III-D1a and III-D1b alleviate static nonlinearity, the technique in Section III-D1c is also required with Sections III-D1a and III-D1b for mitigating dynamic nonlinearity.

2) Alleviating Nonlinearity From PN Mismatch: As discussed, the PN mismatch affects the average PI output. The PI slices are turned ON and OFF when the PI code is updated.


Fig. 10. INL/DNL measurement of the 9-bit IMPI at 13.3 GHz, and the spectrum of the PI output with 256-ppm clock modulation at 1-GHz update showing
rotation spurs.
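As a rough arithmetic check on the rotation settings quoted in the Fig. 10 caption, the Section III-D relation $\Delta f = N_{LSB} \cdot \mathrm{Freq}_{PI\_update}/2^N$ can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 9-bit code spans one full clock period, as the $2^N$ factor in that relation suggests. The resulting average step per update is a derived number, not a value stated in the paper, and how a fractional average would be realized by the CDR is not described in the text.

```python
def rotation_offset_hz(f_clk_hz: float, ppm: float) -> float:
    """Frequency offset corresponding to a ppm offset of the clock."""
    return f_clk_hz * ppm * 1e-6


def lsb_per_update(delta_f_hz: float, f_update_hz: float, n_bits: int) -> float:
    """Invert delta_f = N_LSB * f_update / 2**N to get the average LSB step."""
    return delta_f_hz * (2 ** n_bits) / f_update_hz


# Values quoted in the text: 13.3-GHz clock, 256-ppm offset,
# 1-GHz code update rate, 9-bit PI.
f_clk_hz = 13.3e9
f_update_hz = 1e9
n_bits = 9

delta_f_hz = rotation_offset_hz(f_clk_hz, 256)            # about 3.4 MHz
single_lsb_hz = f_update_hz / 2 ** n_bits                  # offset for 1 LSB per update
avg_lsb_per_update = lsb_per_update(delta_f_hz, f_update_hz, n_bits)
fourth_harmonic_hz = 4 * delta_f_hz                        # about 13.6 MHz

print(f"offset for 256 ppm      : {delta_f_hz / 1e6:.3f} MHz")
print(f"offset for 1 LSB/update : {single_lsb_hz / 1e6:.3f} MHz")
print(f"average LSBs per update : {avg_lsb_per_update:.2f}")
print(f"fourth-harmonic offset  : {fourth_harmonic_hz / 1e6:.2f} MHz")
```

The first and last numbers agree with the 3.4-MHz offset and 13.6-MHz fourth-harmonic spur location reported in the measurements.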

The cumulative PN mismatch of the ON slices determines the average PI output. Three cases (1), (2), and (3) having different PN mismatch profiles of the slice-stacks connected to $V_{\Phi in1}$ and $V_{\Phi in2}$ are shown in Fig. 9(b) to explain this effect clearly. As the PI rotates, the PN mismatch manifests as code-dependent average voltage variation, creating an AC signal of small magnitude composed of the harmonics of $\Delta f$. This low-frequency signal adds to the high-frequency PI output and periodically modulates its threshold-crossing, leading to DCD at the node $V_{pi\_se}$. The waveform shape of this low-frequency signal depends on the PN mismatch pattern and is different for each case. Notably, the mismatch pattern consisting of interspersed PMOS-strong and NMOS-strong slices [as in case (3)] predominantly generates higher harmonics of $\Delta f$ with rotation, albeit of lower amplitude due to the frequent PN mismatch cancellation. The waveform in case (4) represents a random mismatch scenario. All four cases see the same high pass CR filter created by $C_{ser}$ and $R_{fb}$ (providing a 3-dB bandwidth of ∼150 MHz at the node $V_{C\_ser}$), suppressing these low-frequency signals while allowing the high-frequency PI output clock to pass through with minimal attenuation, as illustrated in Fig. 9(b). The $R$ in the CR filter is equal to $R_{fb}/(1 + A)$, where $-A$ is the open loop voltage gain of the inverter. Thus, dynamic DCD and resultant phase errors in outputs are prevented.

IV. MEASUREMENT RESULTS

For the DNL measurements, the control code is swept, and the output is measured against a constant reference clock. A DNLpp of 1.4° and an INLpp of 2.4° are obtained, as shown in Fig. 10. The static linearity measurements were performed using a Keysight N1000A DCA-X wide-band oscilloscope. The TX output from one lane is provided at the DCA inputs, and another lane’s TX output is attached to the trigger input of the DCA. The PI code for the DCA inputs is swept, and its zero-crossing time is measured against the trigger reference kept at a constant phase. An averaging of 256 times per PI code was performed to find the mean zero-crossing time difference. For dynamic nonlinearity measurements, the rotation spurs are measured when the PI is operating at 13.3 GHz. The PI codes are updated at 1 GHz to produce a 256-ppm offset, which corresponds to 3.4 MHz. The fourth harmonic rotation spur at 13.6-MHz offset is −56.21 dBc, which shows excellent IMPI linearity for interpolating within a quadrant. The integrated rotation spur (IRS) in dBc can be found using the following equation:

$$ \mathrm{IRS} = 10 \cdot \log_{10} \left( \sum_{n=1}^{N} 10^{A_n/10} \right) + 3 \tag{5} $$

where $A_n$’s are the spurs in dBc in one sideband, and $N$ is the number of spurs (see the Appendix). The IRS is −42.6 dBc in the measurements. A maximum frequency offset of 256 ppm was fixed at the CDR architecture level, which limited the maximum ppm to 256 in measurements and is not a fundamental limitation for this IMPI.

Output spurs were measured for eight lanes showing lane-to-lane variation in Fig. 11(a) and (b). The worst spur spans from −45.5 to −52.3 dBc. The IRS is at −39.5 dBc in the worst case, and the best is −43 dBc. The standard deviation of the INLpp and DNLpp is 50 fs and 15 fs, respectively, for Monte-Carlo simulations for 100 runs. The maximum measured integration spur variation of 120 fs is obtained from the measurements of more than 100 chips. The measured PI phase noise is shown in Fig. 12, and the RMS-integrated jitter from 3-MHz to 3-GHz integration bandwidth is 71 fs. The PI is designed in a 5-nm finFET process, and it occupies an area of 0.006 mm$^2$. The die micrograph is shown in Fig. 13.

This PI is designed to work at 14 GHz. It also works at 9 GHz but with some INL degradation (INLpp ∼ 5°) because the IMPI swing increases at a lower frequency, pushing the IMPI transistors deeper into triode regions and deteriorating its functionality as a good current source. To operate at a lower frequency while maintaining a high phase linearity, a capacitor bank can be implemented at node $V_X$ to maintain a near-constant value of $\Delta V$. However, it was not implemented in this design, as supporting a lower frequency was not required.
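The integrated rotation spur expression in (5) can be evaluated directly from a list of single-sideband spur levels. The following is a minimal sketch, not the authors' measurement script: the function name is hypothetical and the spur levels in `example_spurs_dbc` are made-up illustrative numbers, not measured data from this work. The fixed +3 dB term follows (5) and accounts for an equal set of spurs in the other sideband.

```python
import math


def integrated_rotation_spur_dbc(spurs_dbc):
    """Evaluate IRS = 10*log10(sum(10**(A_n/10))) + 3, per Eq. (5).

    spurs_dbc: iterable of spur levels A_n in dBc, one sideband only.
    """
    spurs = list(spurs_dbc)
    if not spurs:
        raise ValueError("at least one spur level is required")
    # Convert each spur from dBc to a linear power ratio and sum them.
    total_linear = sum(10 ** (a_n / 10) for a_n in spurs)
    # Back to dB, then add 3 dB for the mirrored sideband.
    return 10 * math.log10(total_linear) + 3


# Hypothetical one-sideband spur levels (dBc), for illustration only.
example_spurs_dbc = [-50.0, -53.0, -58.0, -60.0]
print(f"IRS = {integrated_rotation_spur_dbc(example_spurs_dbc):.2f} dBc")
```

Because the sum is taken over linear power ratios, the result is dominated by the largest few spurs; a single spur at $A$ dBc gives an IRS of $A + 3$ dBc.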


TABLE II
COMPARING HIGH-SPEED IMPI WITH STATE-OF-THE-ART PIS

Fig. 11. Measurements for different lanes showing (a) worst spurs and
(b) integrated spurs.

Fig. 13. Die micrograph in 5-nm finFET process.

Fig. 12. Measured phase noise of the PI. RMS-integrated jitter (3-MHz to
3-GHz) is 71 fs @ 13.3 GHz.

The role of resolution and linearity for the DJ of an $N$-bit PI is elucidated. The normalized PI DJ with respect to $T_{period}$ is defined as $F$ (in degree), which is a constituent of the PI jitter budget. The PI worst case DJ, $PI_{DJ\_wc}$, is a sum of LSB time period, $T_{LSB}$, and the absolute value of peak-to-peak INL, $INL_{pp}$ [29]

$$ F = \frac{PI_{DJ}\,(\mathrm{fs})}{T_{period}\,(\mathrm{fs})} \times 360^\circ \tag{6} $$

$$ PI_{DJ\_wc} = T_{LSB} + |INL_{pp}| \tag{7} $$

$$ F_{wc} = \frac{PI_{DJ\_wc}\,(\mathrm{fs})}{T_{period}\,(\mathrm{fs})} \times 360^\circ = \left( \frac{T_{LSB}\,(\mathrm{fs})}{T_{period}\,(\mathrm{fs})} + \frac{|INL_{pp}\,(\mathrm{fs})|}{T_{period}\,(\mathrm{fs})} \right) \times 360^\circ = \left( \frac{1}{2^N} + \frac{|INL_{pp}\,(\mathrm{fs})|}{T_{period}\,(\mathrm{fs})} \right) \times 360^\circ. \tag{8} $$

The worst case normalized jitter, $F_{wc}$ (in degree), in (8) is formed by two constituent terms. The first term, $T_{LSB}\,(\mathrm{fs})/T_{period}\,(\mathrm{fs})$, indicates the quantization error ($1/2^N$), and the second term, $|INL_{pp}\,(\mathrm{fs})|/T_{period}\,(\mathrm{fs})$, signifies the nonlinearity contribution toward $T_{period}$. In this work, the overall worst case normalized jitter is reduced through: 1) the first term, by increasing $N$ and implementing a 9-bit resolution, and 2) the second term, by implementing a high linearity IMPI technique.

Fig. 14 shows the comparison of the worst case jitter versus time period for the state-of-the-art PIs. This IMPI achieves the smallest $PI_{DJ\_wc}$ of 0.66 ps and the lowest $F_{wc}$ of 3.1°.

Fig. 14. Worst case DJ and $F_{wc}$ versus clock time period.

Table II compares this IMPI with prior works. Briefly, this PI works with only four input phases, provides a higher


resolution of 9 bits, and achieves an excellent fractional INLpp/DNLpp, with low noise and low IRS while consuming a small area.

V. CONCLUSION

Although an IMPI is fundamentally better in phase linearity than a CMPI/VMPI, its prior arts face challenges for implementation complexity and high-frequency operation, limiting its widespread adoption. This IMPI overcomes the limitations of IMPI prior arts by supporting the high-frequency and dual-edge operation, and eliminating control, calibration, and biasing circuits. Further, it avoids slew, harmonic filtering, and tuning circuits usually required for CMPI/VMPI. The presented architecture lends itself to a simple, low power, high resolution, and compact implementation while providing improved DCD performance and low jitter essential for small BER as the sampling time margin shrinks with higher data rates. The superior static and dynamic linearity performance of this IMPI in a 5-nm finFET places it at the forefront for implementation in advanced technology nodes.

APPENDIX
DERIVATION OF IRS

Let $S$ be the sinusoidal carrier signal of amplitude $B_S$, and $S1, S2, \ldots, SN$ are the spurs in one sideband representing sinusoids of amplitude $B_{S1}, B_{S2}, \ldots, B_{SN}$, respectively. The power of a sinusoidal carrier signal is given by $B_S^2/2$, which in dB can be written as

$$ P_{dBS} = 10 \log_{10} \left( B_S^2/2 \right). \tag{9} $$

Similarly, the power in the spur signal, $Sn$, is given by

$$ P_{dBSn} = 10 \log_{10} \left( B_{Sn}^2/2 \right). \tag{10} $$

Power in dBc of the spur $Sn$ is therefore

$$ A_n = P_{dBSn} - P_{dBS} = 10 \log_{10} \left( B_{Sn}^2/B_S^2 \right) \tag{11} $$

which can be rearranged to

$$ \frac{B_{Sn}^2}{B_S^2} = 10^{A_n/10}. \tag{12} $$

The total power of all spurs, $P_t$, can be written as

$$ P_t = \frac{B_{S1}^2}{2} + \frac{B_{S2}^2}{2} + \cdots + \frac{B_{SN}^2}{2} = \sum_{n=1}^{N} \frac{B_{Sn}^2}{2} \tag{13} $$

which, in dB, is

$$ P_{dBt} = 10 \log_{10} \sum_{n=1}^{N} \frac{B_{Sn}^2}{2} \tag{14} $$

and, in dBc, is

$$ P_{dBct} = P_{dBt} - P_{dBS} = 10 \log_{10} \sum_{n=1}^{N} \frac{B_{Sn}^2}{B_S^2}. \tag{15} $$

Substituting (12) in this equation gives

$$ P_{dBct} = P_{dBt} - P_{dBS} = 10 \log_{10} \sum_{n=1}^{N} 10^{A_n/10}. \tag{16} $$

Assuming the same spur levels in the other sideband, we express IRS as $P_{dBct}$ + 3 dB, therefore, leading to (5).

ACKNOWLEDGMENT

The authors would like to thank G. Deliyannides, W. Lye, A. Masnadi, P. Masoumi, Y. Meng, and S. Lightbody at Maxlinear, Burnaby, BC, Canada, and Z. Ning at Maxlinear, Carlsbad, CA, USA, for their support for this work. They would also like to thank M. Madiseh at Cisco for guidance, NSERC and Intel for financial support, CMC Microsystems for access to tools, Rohde & Schwarz for Phase Noise Analyzer, and Mentor Graphics for Analog FastSPICE.

REFERENCES

[1] S. Chen et al., “A 4-to-16 GHz inverter-based injection-locked quadrature clock generator with phase interpolators for multi-standard I/Os in 7 nm FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 390–392.
[2] Y.-C. Huang and B.-J. Chen, “An 8b injection-locked phase rotator with dynamic multiphase injection for 28/56/112 Gb/s SerDes application,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 486–488.
[3] Z. Wang, Y. Zhang, Y. Onizuka, and P. R. Kinget, “Multi-phase clock generation for phase interpolation with a multi-phase, injection-locked ring oscillator and a quadrature DLL,” IEEE J. Solid-State Circuits, vol. 57, no. 6, pp. 1776–1787, Jun. 2022.
[4] S. Shekhar et al., “Strong injection locking in low-Q LC oscillators: Modeling and application in a forwarded-clock I/O receiver,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 8, pp. 1818–1829, Aug. 2009.
[5] T. O. Dickson et al., “A 1.8 pJ/bit 16×16 Gb/s source-synchronous parallel interface in 32 nm SOI CMOS with receiver redundancy for link recalibration,” IEEE J. Solid-State Circuits, vol. 51, no. 8, pp. 1744–1755, Aug. 2016.
[6] M. Pozzoni et al., “A multi-standard 1.5 to 10 Gb/s latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial backplane communication,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1306–1315, Apr. 2009.
[7] E. Monaco, G. Anzalone, G. Albasini, S. Erba, M. Bassi, and A. Mazzanti, “A 2–11 GHz 7-bit high-linearity phase rotator based on wideband injection-locking multi-phase generation for high-speed serial links in 28-nm CMOS FDSOI,” IEEE J. Solid-State Circuits, vol. 52, no. 7, pp. 1739–1752, Jul. 2017.
[8] P. K. Hanumolu, V. Kratyuk, G.-Y. Wei, and U.-K. Moon, “A sub-picosecond resolution 0.5–1.5 GHz digital-to-phase converter,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 414–424, Feb. 2008.
[9] A. Cevrero et al., “A 60 Gb/s 1.9 pJ/bit NRZ optical-receiver with low latency digital CDR in 14 nm CMOS FinFET,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2017, pp. C320–C321.
[10] A. K. Mishra, Y. Li, P. Agarwal, and S. Shekhar, “A 9b-linear 14 GHz integrating-mode phase interpolator in 5 nm FinFET process,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2022, pp. 1–3.
[11] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, “Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation,” IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
[12] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, “A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 736–743, Mar. 2005.
[13] L.-M. Lee and C.-K. Yang, “Phase correction of a resonant clocking system using resonant interpolators,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2008, pp. 170–171.
[14] H. Won et al., “A 0.87 W transceiver IC for 100 gigabit Ethernet in 40 nm CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 2, pp. 399–413, Feb. 2015.
[15] G. R. Gangasani et al., “A 16-Gb/s backplane transceiver with 12-tap current integrating DFE and dynamic adaptation of voltage offset and timing drifts in 45-nm SOI CMOS technology,” IEEE J. Solid-State Circuits, vol. 47, no. 8, pp. 1828–1841, Aug. 2012.


[16] P. A. Francese et al., “A 16 Gb/s 3.7 mW/Gb/s 8-tap DFE receiver and baud-rate CDR with 31 kppm tracking bandwidth,” IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2490–2502, Nov. 2014.
[17] Z. Wang and P. R. Kinget, “A 65 nm CMOS, 3.5-to-11 GHz, less-than-1.45 LSB-INLpp, 7b twin phase interpolator with a wideband, low-noise delta quadrature delay-locked loop for high-speed data links,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2022, pp. 292–294.
[18] C.-F. Liang, S.-C. Hwu, and S.-I. Liu, “A 10 Gbps burst-mode CDR circuit in 0.18 µm CMOS,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Sep. 2006, pp. 599–602.
[19] M. Erett et al., “A 0.5–16.3 Gbps multi-standard serial transceiver with 219 mW/channel in 16-nm FinFET,” IEEE J. Solid-State Circuits, vol. 52, no. 7, pp. 1783–1797, Jul. 2017.
[20] S. Kumaki, A. H. Johari, T. Matsubara, I. Hayashi, and H. Ishikuro, “A 0.5 V 6-bit scalable phase interpolator,” in Proc. IEEE Asia Pacific Conf. Circuits Syst., Dec. 2010, pp. 1019–1022.
[21] M.-S. Chen, A. A. Hafez, and C.-K. K. Yang, “A 0.1–1.5 GHz 8-bit inverter-based digital-to-phase converter using harmonic rejection,” IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2681–2692, Nov. 2013.
[22] J. Z. Ru, C. Palattella, P. Geraedts, E. Klumperink, and B. Nauta, “A high-linearity digital-to-time converter technique: Constant-slope charging,” IEEE J. Solid-State Circuits, vol. 50, no. 6, pp. 1412–1423, Jun. 2015.
[23] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno, and D. J. Friedman, “A 19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE functions in 45-nm SOI CMOS,” IEEE J. Solid-State Circuits, vol. 47, no. 12, pp. 3220–3231, Dec. 2012.
[24] T. O. Dickson et al., “A 1.4 pJ/bit, power-scalable 16 × 12 Gb/s source-synchronous I/O with DFE receiver in 32 nm SOI CMOS technology,” IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1917–1931, Aug. 2015.
[25] S. Sievert et al., “A 2 GHz 244 fs-resolution 1.2 ps-peak-INL edge interpolator-based digital-to-time converter in 28 nm CMOS,” IEEE J. Solid-State Circuits, vol. 51, no. 12, pp. 2992–3004, Dec. 2016.
[26] A. L. S. Loke et al., “Analog/mixed-signal design challenges in 7-nm CMOS and beyond,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Apr. 2019, pp. 1–8.
[27] A. Cevrero et al., “A 100 Gb/s 1.1 pJ/b PAM-4 RX with dual-mode 1-tap PAM-4/3-tap NRZ speculative DFE in 14 nm CMOS FinFET,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 112–114.
[28] T. Ali et al., “A 180 mW 56 Gb/s DSP-based transceiver for high density IOs in data center switches in 7 nm FinFET technology,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 118–120.
[29] Z. Wang and P. R. Kinget, “A very high linearity twin phase interpolator with a low-noise and wideband delta quadrature DLL for high-speed data link clocking,” IEEE J. Solid-State Circuits, early access, Aug. 18, 2022, doi: 10.1109/JSSC.2022.3197061.

Amit Kumar Mishra (Member, IEEE) received the B.Tech. degree in electronics and communication engineering from the Indian Institute of Information Technology, Jabalpur, India, in 2009, the M.Tech. degree from the Academy of Scientific and Innovative Research, New Delhi, India, in 2011, and the Ph.D. degree in electrical and computer engineering from The University of British Columbia, Vancouver, BC, Canada, in 2022.
From 2011 to 2015, he was a Scientist with CSIR-CEERI, Pilani, India, and worked on the design of CMOS sensor signal conditioning and RF circuits. In 2019, he was an Intern with MaxLinear, Inc., Burnaby, BC, Canada, where he worked on high-speed clocking circuits. His research interests include analog and mixed-signal circuits for wireless transceivers and high-speed electrical and optical links.

Yifei Li (Member, IEEE) received the B.S. degree in microelectronics from the Harbin Institute of Technology, Harbin, China, in 2010, the M.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2012, with a focus on high-speed I/O, and the Ph.D. degree in electrical engineering from Iowa State University, Ames, IA, USA, in 2017, where he worked on concurrent multi-band RF power amplifiers.
Since 2017, he has been with MaxLinear, Inc., Carlsbad, CA, USA, where he has worked on clock generation/distribution, PLL, TIA, and data converters for optical communication SoCs. His research interests include RF/mixed-signal circuits and systems for high-speed communications.

Pawan Agarwal (Member, IEEE) received the B.Tech. and M.Tech. degrees in electrical engineering from IIT Madras, Chennai, India, in 2009, with a thesis on data-converters, and the Ph.D. degree from Washington State University, Pullman, WA, USA, in 2017, with a focus on mm-wave phased arrays for small-cell applications and biomedical implantable systems.
He was with Applied Micro, Pune, India, from 2009 to 2011, and Applied Micro, Sunnyvale, CA, USA, in 2012, where he designed frequency synthesizers and serializers for 100G Ethernet. He designed extreme-performance VCO for mm-wave Backhaul links in 2014 at Maxlinear, Inc., Carlsbad, CA, USA. He is currently developing transmitters for the next-generation cable modems at Maxlinear, Inc., and previously designed clocking, transmitters, drivers, and TIAs for 1.6T/800G Datacenters connectivity and power amplifiers for 5G communication. He has authored assorted IEEE articles.
Dr. Agarwal was a recipient of the IEEE Microwave Theory and Techniques Society (MTT-S) Graduate Fellowship Award, the MTT-S IMS Student Paper Competition Award, the Best Paper and Poster Award(s) from SRC Techcon and CDADIC, and the Washington State University Voiland College of Engineering and Architecture Outstanding Teaching Assistant Award. He humbly serves as a technical reviewer and a committee member for several IEEE journals and conferences.

Sudip Shekhar (Senior Member, IEEE) received the B.Tech. degree from IIT Kharagpur, Kharagpur, India, in 2003, and the Ph.D. degree from the University of Washington, Seattle, WA, USA, in 2008.
From 2008 to 2013, he was with the Circuits Research Laboratory, Intel Corporation, Hillsboro, OR, USA, where he worked on high-speed I/O architectures. He is currently an Associate Professor of electrical and computer engineering with The University of British Columbia, Vancouver, BC, Canada. His current research interests include circuits for electrical and optical interfaces, frequency synthesizers, and wireless transceivers.
Dr. Shekhar was a recipient of the 2022 Schmidt Science Polymath Award, the 2022 UBC Killam Teaching Prize, the 2019 Young Alumni Achiever Award by IIT Kharagpur, and the 2010 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS Darlington Best Paper Award and a co-recipient of the 2015 IEEE Radio frequency IC Symposium Student Paper Award. He serves on the Technical Program Committee of the IEEE International Solid-State Circuits Conference (ISSCC) and served as a Distinguished Lecturer for the IEEE Solid-State Circuits Society from 2021 to 2022.
