C12-3

A 0.5-1V Input Event-Driven Multiple Digital Low-Dropout-Regulator System for Supporting a Large Digital Load
Sung Justin Kim¹, Dongkwun Kim¹, Yu Pu², Chunlei Shi², Mingoo Seok¹; ¹Columbia University, ²Qualcomm
Abstract: Recent digital low-dropout regulators have demonstrated competitive load regulation performance for a digital load even with a low input voltage. However, few existing regulator designs have investigated supporting a spatially large load with realistic grid parasitics. This paper presents a system consisting of nine digital low-dropout regulators based on event-driven control to better support such a load. At 0.5V (1V) input, our prototype improves the load regulation FoM by 3.9X (9.1X) and current density by 8.7X (2.8X) over the prior state of the art [1,3,4].

I. Introduction
In recent integrated digital low-dropout regulator (LDO) design, it is paramount to support a digital load that has been ever increasing in terms of current draw and silicon footprint [1, 2]. To support such a load, unfortunately, it is not adequate to simply increase the size of the power transistors in the existing single-LDO architecture [3-6]. This is due to the non-negligible parasitic resistance (RG) of the practical power grid of a load. For instance, as shown in Fig. 1(a), suppose a part of the load suddenly draws a large amount of current and thereby induces a dynamic voltage droop (VDROOP). The single LDO, placed potentially in a different location from the droop event, can sense the droop only after the parasitic RC delay of the power grid, and during this delay VDROOP develops further (Fig. 1(b)). Even in the steady state, the LDO senses only the nearby grid and cannot sense the IR drop of the power grid (VIR), resulting in inaccurate regulation. These VDROOP and VIR problems worsen in proportion to RG. Current practice is to pessimistically add a guardband to the set point voltage (VSP) to ensure that the regulator output voltage (VOUT) is always larger than the load specification. This, however, degrades the power efficiency of both the regulator and the load.
To support a large load, we need to employ multiple LDOs distributed spatially so as to reduce VDROOP and VIR (Fig. 2). Toward this goal, a recent work [1] designed a 9-LDO system. However, it is based on time-driven (TD) synchronous control and thereby exhibits a trade-off between clock period (feedback latency) and power dissipation. To improve this trade-off, [1] proposed an interleaved clock. However, the worst-case load regulation performance improves only marginally, because interleaving cannot guarantee that the LDO that receives the rising clock edge after a droop event will be the one nearest to the event. This is simply because the order of the interleaved clocks is fixed whereas the load event occurrences are random. Thus, the worst-case latency remains the same as the clock period regardless of the degree of clock interleaving (Fig. 3(a)), only marginally improving VDROOP and VIR in a power grid having non-negligible RG.
Recently, event-driven (ED) control has been introduced to digital LDO design, enabling a better trade-off between feedback latency and power dissipation than TD-based designs [3]. However, none was designed for a spatially large load having non-zero RG. Intuitively, ED LDOs are promising for such a load since, unlike TD LDOs, the nearest ED LDO will always and immediately respond to a load current change (Fig. 3(b)). Still, it is critical to further improve the latency-power trade-off and to facilitate multi-LDO integration while ensuring stability for the case where several LDOs simultaneously regulate the common power grid.
In this work, we present a 9-LDO system for a spatially large load in a 65 nm process. We propose a novel domino sampling and regulation technique to push the envelope of latency and power in the single-LDO architecture. To ease integration into a multi-LDO system, we also design each LDO to be modular so that it can be used without modification. The stability of the system, however, needs to be analyzed as a whole. Thus, we propose a framework to ensure the stability of the multi-LDO system. At 0.5V (1V), the test chip supports a maximum load current of 92.7 mA (416.7 mA), achieving a current density of 248.8 mA/mm² (1.118 A/mm²). For ~0.1ns transition time (TEDGE), it also meets the ~10% VDROOP constraint for a load current change (ΔILOAD) of 36.4mA (124.2mA) at ~90% power efficiency and ~99.9% current efficiency, achieving a load regulation FoM of 4.4 ps (3.75 ps) at 0.5V (1V).

II. Proposed Multi-DLDO Architecture
Fig. 4 depicts the single-LDO architecture in the proposed 9-LDO system. It consists of an ED ADC with an embedded DAC, a P-controller, an I-controller, a local clock generator, and power FETs. To improve the load regulation performance, it is critical to shorten the control latency. Recently, ED control has substantially shortened the latency [3], but here we further reduce it via a domino sampling and regulation technique.
The domino sampling operates as follows. As shown in Fig. 5(a), the ADC consists of a combination of i) two primary continuous-time (CT) comparators (CMP[2], [3]) and ii) three auxiliary discrete-time (DT) comparators (CMP[0], [1], [4]). CMP[2] and [3] detect VOUT crossing the two innermost levels VREF[2] and [3], which define VSP. On the other hand, CMP[0], [1], and [4] deal with the three outer levels (VREF[0], [1], [4]). Unlike in the previous ED ADCs [3], the five comparators do not all sample VOUT at the same time; instead, they sample in sequence, like dominoes (Fig. 5(b)). For instance, once CMP[2] (CMP[3]) detects a VOUT undershoot (overshoot), it triggers CMP[1] (CMP[4]) to sample, after which the differential outputs of CMP[1] trigger the next DT comparator, CMP[0]. This makes the ADC sample more rapidly in the fast transient region.
The domino regulation then operates as follows. The P-controller is broken down into five slices (Fig. 6(a)), each receiving the output (LV[4:0]) of the corresponding comparator and producing a 6-bit digital output POUTX[5:0] that controls the binary-sized power FETs. The POUTs of the five slices are combined in the power FET stage in the current domain. Each slice holds a pre-computed output, e.g., for the 0-th slice, POUT0[5:0] = KP[1:0]·PERROR0[2:0], where PERROR0[2:0] is the preloaded error value of the 0-th slice. Therefore, each slice produces an output after only a flop's C-Q delay. Note that the slice has a shifter for the KP multiplication, but it is off the critical path. In contrast, the prior P-controller (Fig. 6(b)) needs to perform i) thermometer-to-binary encoding and ii) shifting (multiplying), taking 6.4X longer latency.
The proposed LDO also employs an I-controller to ensure zero (< 1 LSB) steady-state error (Fig. 4 bottom). Since its latency is not as critical as that of the P-controller, the I-controller operates synchronously with ICLK, which a local clock generator produces in an event-driven fashion, i.e., only when the CT comparator CMP[2] ([3]) detects an undershoot (overshoot). The frequency of ICLK is ~250MHz at 0.5V (900MHz at 1V). After the LDO enters the zero-error steady state, the I-controller disables the clock generators, which stops all the DT comparators and the I-controller itself, minimizing quiescent power.
Using the proposed LDO, we designed a 9-LDO system in a 65 nm process (Fig. 7). Each LDO employs only a 100-pF output capacitor (COUT). An off-chip variable resistor is added between every two LDOs to experiment with RG values from 0.2 to 20 Ω. To ease integration, we designed our LDO to be highly modular: integration requires neither modifying nor adding/removing circuits. We can simply calculate the needed number of LDOs based on the total load current and place them based on the current density statistics. In the test chip, they are placed uniformly, as are the loads. In contrast, the prior work [1] requires i) designing an LDO differently depending on its placement, ii) distributing a global clock, and iii) creating communication circuits among LDOs.
The stability of the multi-LDO system, however, must be considered as a whole. Each LDO independently detects and responds to a local VDROOP event while the compensating output currents from multiple LDOs couple through the common power grid. An LDO could source current rapidly before knowing the actions of the other LDOs, posing a stability concern. We developed an analysis framework based on the discretized state-space model of two consecutive samples of our system (Fig. 8). We then computed the eigenvalues of the state-space matrix as a function of i) the number of LDOs (NLDO), ii) the net power-grid resistance between two neighboring LDOs (RG), iii) the PI controller gains (KP, KI), and iv) COUT. Based on this, we set the KP and KI values so as to eliminate any instability caused by large coupling currents resulting from large NLDO, small RG, and small COUT (Fig. 9). We also verified these settings experimentally (Fig. 10).

III. Measurement Results
We measured the load-regulation transients (Fig. 11) and then evaluated the benefits of the multi-LDO over the single LDO. We emulate the single LDO by enabling only LDO[4] and LOAD[0] (Fig. 2). The multi-LDO can reduce VDROOP by 3.4X at RG = 10 Ω and eliminate VIR (Fig. 1(c)). This good VIR (accuracy) performance is achieved at VINs = 0.5-1V (Fig. 12). We measured the commonly used load-regulation FoM (Table I bottom). It is strongly affected by TEDGE, VIN, and process technology. Thus, we measured it over different TEDGEs from 0.1ns to 30ns and over VINs from 0.5V to 1V (Fig. 13). We also measured current efficiency (Fig. 14) and DC load regulation (Fig. 15), both across ILOADs and across VINs. Finally, Table I summarizes the comparisons to the state-of-the-art LDOs. Compared to the most relevant multi-LDO work [1], our design achieves 8.7X (2.8X) higher current density even at 0.1V (0.2V) lower VIN. [1] reports the FoM only at a long TEDGE of 20ns at 1.2V VIN. However, the FoM quickly improves at long TEDGE, regardless of the circuit latency (Fig. 13). Thus, we compare ours to the state-of-the-art single LDOs reporting measurements at 0.1~0.2ns TEDGEs [3,4], where our design still improves the FoM by 9.1X (3.9X) at the VIN of 1V (0.5V).

References:
[1] Y. Lu, et al., ISSCC 2018
[2] J. Bulzacchelli, et al., JSSC 2012
[3] D. Kim, et al., VLSI 2018
[4] L. G. Salem, et al., ISSCC 2018
[5] S. Choi, et al., VLSI 2018
[6] Y. Okuma, et al., CICC 2010

C128 978-4-86348-718-5 ©2019 JSAP 2019 Symposium on VLSI Circuits Digest of Technical Papers
Fig. 1: (a, b) Single vs. (c) distributed LDOs. Measured at VIN=0.5V, VOUT=0.45V, RG=10 Ω for a local load event at LOAD[0] (ΔILOAD[0]/TEDGE = 9.2mA/<0.2ns; all LOADs[8:0] = 90μA before the event, LOAD[0] = 9.29mA and LOADs[8:1] = 90μA after it). Single LDO: large VDROOP of 163mV and large VIR. Distributed LDOs: VDROOP = 48.6mV, no VIR, steady-state error <1 LSB.
Fig. 2: Proposed 3X3 distributed LDOs (LDO[0] to LDO[8], each with power FETs PFET[8:0] and a local load LOAD[0] to LOAD[8]; total load = LOAD[8:0]; neighboring LDOs are connected through RG).
Fig. 4: Proposed LDO architecture (on-chip DAC generating VREF[0] to VREF[4], ED ADC producing LV[4:0], P-Control[4:0] producing POUT0[5:0] to POUT4[5:0], I-Control producing IOUT[8:0], local clock generator producing ICLK with CLK_Disable, binary-sized FETs, and COUT).
Fig. 6: (a) Proposed ED domino vs. (b) conventional ED P-control. In (a), the slices hold PERROR0 to PERROR4, shifted by KP[1:0], and output POUT0[5:0] to POUT4[5:0] from flip-flops clocked by LV[0] to LV[4]. In (b), the critical path runs from LV[0] to LV[4] through the thermometer-to-binary encoder (ERROR[3:0]) and the KP[1:0] shifter, giving long latency.
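As a rough behavioral sketch of the domino P-control contrasted in Fig. 6 (not the authors' circuit), the Python below gates precomputed slice outputs POUTx = PERRORx << KP with the comparator flags and checks that their sum equals the conventional encode-then-shift result. The threshold voltages, the all-ones PERROR preload, and the sign convention (slices 0 to 2 add current on undershoot, slices 3 and 4 remove it on overshoot) are assumptions made only for illustration.

```python
# Behavioral sketch only. VREF values, PERROR preloads and the sign
# convention are hypothetical; they are not taken from the paper.
VREF = [0.42, 0.43, 0.44, 0.46, 0.47]  # assumed VREF[0..4] in volts
PERROR = [1, 1, 1, 1, 1]               # assumed preloaded error per slice (fits 3 bits)
UNDERSHOOT_SLICES = (0, 1, 2)          # assumed to add current when VOUT < VREF[i]
OVERSHOOT_SLICES = (3, 4)              # assumed to remove current when VOUT > VREF[i]


def comparator_levels(vout):
    """Return LV[0..4] as a list of 0/1 flags, one per comparator."""
    lv = [0] * 5
    for i in UNDERSHOOT_SLICES:
        lv[i] = int(vout < VREF[i])
    for i in OVERSHOOT_SLICES:
        lv[i] = int(vout > VREF[i])
    return lv


def precompute_slices(kp_shift):
    """POUTx = PERRORx << KP, computed ahead of time, off the critical path."""
    pout = [p << kp_shift for p in PERROR]
    assert all(0 <= v < 64 for v in pout), "each POUTx must fit in 6 bits"
    return pout


def domino_p_control(vout, pout):
    """Each asserted LV[i] only gates its precomputed slice; slices sum up."""
    lv = comparator_levels(vout)
    up = sum(pout[i] for i in UNDERSHOOT_SLICES if lv[i])
    down = sum(pout[i] for i in OVERSHOOT_SLICES if lv[i])
    return up - down


def conventional_p_control(vout, kp_shift):
    """Encode the thermometer code into a signed error, then multiply by 2**KP."""
    lv = comparator_levels(vout)
    error = sum(lv[i] for i in UNDERSHOOT_SLICES) - sum(lv[i] for i in OVERSHOOT_SLICES)
    return error * (1 << kp_shift)


if __name__ == "__main__":
    slices = precompute_slices(kp_shift=2)
    for v in (0.40, 0.425, 0.435, 0.45, 0.465, 0.48):
        print(v, domino_p_control(v, slices), conventional_p_control(v, 2))
```

Under these assumptions both paths give the same code for every VOUT; the difference in the sketch is only where the work happens: the domino version does the KP shift before the event, so the event-time path reduces to gating a stored value.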
Fig. 3: (a) TD vs. (b) ED controls. (a) Distributed time-driven interleaved control (CLK[0] to CLK[8], 1 sample/period per LDO): long latency after a LOAD[8] event. (b) Distributed event-driven control: the local LDO[8] is locally event-driven and responds with short latency.
Fig. 8: Stability analysis. Discretized state-space model of the form x(k+1) = A·x(k) + B·ILOAD, with state x(k) = [e1(k), I1(k), e2(k), I2(k), ..., eN×M(k), IN×M(k)]. The full matrix layout is not recoverable from the extraction; entries that remain legible in A include 1 − KP·IU·TS/(COUT·VRES) − n1·TS/(COUT·RG), −IU·TS/(COUT·VRES), TS/(COUT·RG), KI, and 1, and B contains entries TS/(COUT·VRES) and 0.
Legend: e(k): error state at the kth sample; I(k): integration state at the kth sample; n: the number of adjacent LDO coupling paths; IU: unit power transistor current; TS: time interval between two samples; VRES: voltage resolution of the ADC.
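To make the eigenvalue test behind Fig. 8 concrete, here is a minimal sketch for a single, uncoupled LDO, assuming the 2x2 model e(k+1) = (1 − g·KP)·e(k) − g·I(k), I(k+1) = KI·e(k) + I(k) with g = IU·TS/(COUT·VRES). This reduced form is an assumed simplification that drops the grid-coupling terms, so it is not the paper's full matrix. TS, COUT, VRES and KI follow the annotations of Fig. 9; the unit current IU = 100 μA is a hypothetical value.

```python
import cmath


def loop_gain(i_unit_a, t_s, c_out_f, v_res_v):
    """g = IU*TS/(COUT*VRES): error change (in LSBs) per unit FET per sample."""
    return i_unit_a * t_s / (c_out_f * v_res_v)


def pi_eigenvalues(kp, ki, g):
    """Eigenvalues of the assumed matrix [[1 - g*kp, -g], [ki, 1]]."""
    a, b, c, d = 1.0 - g * kp, -g, float(ki), 1.0
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def is_stable(kp, ki, g):
    """A discrete-time system is stable when all eigenvalues lie inside the unit circle."""
    return all(abs(lam) < 1.0 for lam in pi_eigenvalues(kp, ki, g))


# TS, COUT, VRES as annotated in Fig. 9; IU is a hypothetical value.
G = loop_gain(i_unit_a=100e-6, t_s=1e-9, c_out_f=0.1e-9, v_res_v=10e-3)

if __name__ == "__main__":
    for kp in (64, 16, 4):
        lams = pi_eigenvalues(kp, 1, G)
        print(kp, [round(abs(lam), 3) for lam in lams], is_stable(kp, 1, G))
```

With these assumed numbers, KP = 64 places an eigenvalue outside the unit circle while KP = 16 and KP = 4 do not, which agrees qualitatively with the "smaller KP is stable" trend annotated in Fig. 9. The multi-LDO analysis in the paper additionally depends on NLDO and RG, which this reduction ignores.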
Fig. 5: (a) Proposed ADC, (b) domino sampling.
Fig. 7: Die photo.
Fig. 9: Stability vs. KP. Eigenvalue plot (imaginary part vs. real part) for KP = 64, 16, 4, 1 with KI=1, TS=1ns, COUT=0.1nF, VRES=10mV, R=35 Ω; smaller KP moves the system from unstable to stable.
Fig. 10: Stability shmoo plot (measured). KP/KI settings from 1/1 to 8/8 vs. RG [Ω] from 2 to 20, with loads[8:0]=4mA; each point is marked stable or unstable.
Fig. 12: VIR vs. VIN (measured). VIR (=|VSP − VOUT|) [mV] of LDO[0] to LDO[8] vs. VIN [V] from 0.5 to 1.0, with dropout = 10% × VIN and RG = 20 Ω, shown against a 2% steady-state error line.
Fig. 13: Impact of TEDGE (measured). VDROOP [mV] and FoM (in Table I) [ps] vs. TEDGE [ns] for VIN=1V, ΔILOAD=407.7mA and VIN=0.5V, ΔILOAD=91.8mA; dropout = 10% × VIN.
Fig. 14: Current efficiency (measured). Current efficiency [%] vs. ILOAD [mA] for VIN = 0.5V to 1.0V; peak = 99.9%; dropout = 10% × VIN.
Fig. 15: DC load regulation (measured). VOUT [V] vs. ILOAD [mA] for VIN = 0.5V to 1.0V; annotated slopes range from 0.08mV/mA to 0.41mV/mA; VOUT error is <1 LSB for all cases.
Fig. 11: Local load single LDO transients. VIN=0.5V, VOUT=0.45V: ΔILOAD/TEDGE = 4.04mA/<0.1ns from ILOAD = 90μA, VDROOP = 49.8mV, time annotation 122ns, steady-state error <1 LSB. VIN=1V, VOUT=0.9V: ΔILOAD/TEDGE = 13.8mA/<0.2ns from ILOAD = 960μA, VDROOP = 94.1mV, time annotation 135ns, steady-state error <1 LSB.

Table I: Comparison table

| | [3] VLSI 2018 | [4] ISSCC 2018 | [1] ISSCC 2018 | [2] JSSC 2012 | This work |
|---|---|---|---|---|---|
| Type | Single LDO | Single LDO | Distributed LDOs | Distributed LDOs | Distributed LDOs |
| Technology | 65nm | 65nm | 65nm | 45nm SOI | 65nm |
| Control (# of LDOs) | Event-Driven (1) | Time-Driven (1) | Time-Driven (9) | Analog (8) | Event-Driven (9) |
| Total active area [mm²] | 0.0057 | 0.00137 | 0.7758 | 0.075 | 0.067 |
| Total COUT [nF] | 0.1 | 0.365 | 0.9 | 1.46 | 0.9 |

Per operating point (two columns per design; for this work, TEDGE-dependent rows list the short-TEDGE and long-TEDGE values separated by "/"):

| | [3] | [3] | [4] | [4] | [1] | [1] | [2] | [2] | This work | This work |
|---|---|---|---|---|---|---|---|---|---|---|
| VIN [V] | 0.5 | 1 | 0.5 | 0.9 | 0.6 | 1.2** | 1.179 | 1.47*** | 0.5 | 1 |
| VOUT [V] | 0.45 | 0.95 | 0.3 | 0.7 | 0.55 | 1.1** | 0.9 | 0.92*** | 0.45 | 0.9 |
| IQ [μA] | 18.1 | - | - | 48.4 | - | 500 | - | 11600 | 131.4 | 683.1 |
| ILOAD range [mA] | 0.065-5.6* | 0.3-18.5* | 0.0001-2* | 0.01-3 | 9-36* | 9-500* | - | 42 | 0.747-92.7 | 9-416.7 |
| Peak current efficiency [%] | 99.7 | - | - | 99.3 | 99.9 | 99.9 | - | 77.5 | 99.9 | 99.8 |
| Current density [mA/mm²] | 163.7 | 540.9 | 17 | 27.6 | 28.6 | 396.8 | - | 470.9 | 248.8 | 1118.6 |
| TEDGE [ns] | 0.1 | - | - | <0.2 | - | 20 | - | 0.3 | <0.1 / 30 | <0.2 / 20 |
| ΔVOUT @ ΔILOAD | 49.8mV @ 2.3mA | - | - | 20.5mV @ 3.25mA | - | 125mV @ 450mA | - | 7.6mV @ 36mA | 49.8mV @ 36.4mA / 46mV @ 91.8mA | 94.1mV @ 124.2mA / 51mV @ 407.7mA |
| FoM¹ [ps] | 17 | - | - | 34.3 | - | 0.28 | - | 99.6 | 4.4 / 0.65 | 3.75 / 0.19 |

¹FoM = COUT·(ΔVOUT/ΔILOAD)·(IQ/ΔILOAD); smaller FoM is better.
*Observed from figures. **Estimated VOUT/VIN used for transients, based on reported figures. ***VOUT/VIN used for transients.
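As a quick arithmetic check of the FoM definition in the Table I footnote, FoM = COUT·(ΔVOUT/ΔILOAD)·(IQ/ΔILOAD), the short Python sketch below plugs in the Table I values for this work at short TEDGE and for the single-LDO design [3]. The function and variable names are illustrative only.

```python
def load_regulation_fom(c_out_f, dv_out_v, di_load_a, i_q_a):
    """FoM = COUT * (dVOUT / dILOAD) * (IQ / dILOAD), in seconds (SI inputs)."""
    return c_out_f * (dv_out_v / di_load_a) * (i_q_a / di_load_a)


# This work, VIN = 0.5V: COUT = 0.9nF, 49.8mV @ 36.4mA, IQ = 131.4uA
fom_this_0p5v = load_regulation_fom(0.9e-9, 49.8e-3, 36.4e-3, 131.4e-6)
# This work, VIN = 1V: COUT = 0.9nF, 94.1mV @ 124.2mA, IQ = 683.1uA
fom_this_1v = load_regulation_fom(0.9e-9, 94.1e-3, 124.2e-3, 683.1e-6)
# Reference [3], VIN = 0.5V: COUT = 0.1nF, 49.8mV @ 2.3mA, IQ = 18.1uA
fom_ref3_0p5v = load_regulation_fom(0.1e-9, 49.8e-3, 2.3e-3, 18.1e-6)

if __name__ == "__main__":
    for name, fom in (("this work, 0.5V", fom_this_0p5v),
                      ("this work, 1V", fom_this_1v),
                      ("[3], 0.5V", fom_ref3_0p5v)):
        print(f"{name}: {fom * 1e12:.2f} ps")
    print(f"improvement over [3] at 0.5V: {fom_ref3_0p5v / fom_this_0p5v:.1f}X")
```

The results come out at roughly 4.4 ps and 3.75 ps for this work and 17 ps for [3], in line with the table. The ratio at 0.5V computed from these unrounded inputs is slightly below the 3.9X quoted in the text, which corresponds to the rounded table entries 17 ps and 4.4 ps.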
