All content following this page was uploaded by Tom Burd on 19 December 2014.
[11]. This work extends these efforts to dynamic voltage supply scaling of a general-purpose microprocessor, under direct operating system control, and over a complete system chip-set.

II. DVS OVERVIEW

There are three key components for implementing DVS in a general-purpose microprocessor system: an operating system that can intelligently vary the processor speed, a regulation loop that can generate the minimum voltage required for the desired speed, and a microprocessor that can operate over a wide voltage range.

A critical characteristic of CMOS circuits is shown in Fig. 2, which plots simulated maximum clock frequency versus supply voltage for various circuits in our 0.6-μm CMOS process. Whether the circuits are simple (NAND gate, ring oscillator) or complex (register file, SRAM), their circuit delays track extremely well over a broad range of supply voltage. Thus, as the processor’s supply voltage varies, all of the circuit delays scale proportionally, making CMOS processor implementations amenable to DVS. However, subtle variations of circuit delay with voltage do exist and primarily affect circuit timing, as discussed in Section VI.

Control of the processor speed must be under software control, as the hardware alone cannot distinguish whether the currently executing instruction is part of a compute-intensive task or a non-speed-critical task. The application programs cannot set the processor speed because they are unaware of other programs running in a multitasking system. Thus, the operating system must control processor speed, as it is aware of the computational requirements of all the active tasks. Applications may provide useful information regarding their load requirements, but should not be given direct control of the processor speed.

As processor speed varies, so too must the supply voltage in order to optimize the energy consumption. However, the software is not aware of the minimum required supply voltage for a desired clock frequency since it is a function of the underlying hardware implementation, process variation, and operating temperature. A ring oscillator, which attempts to scale over voltage with the critical paths of the processor, provides the translation from supply voltage to operating frequency.

A conventional voltage regulation system samples the output voltage and compares it to an input reference voltage within a negative feedback loop. In this DVS system, the regulation loop was modified so that the output voltage drives the ring oscillator, whose output clock signal can be readily converted to a digitally measured clock frequency. The operating system controls the loop by providing a desired clock frequency from which the measured clock frequency is subtracted to calculate the feedback error. This approach allows the software to directly set the operating frequency, and lets the hardware loop determine the minimum required supply voltage to meet this desired frequency.

DVS introduces two new performance parameters, transition time and transition energy. Transition time is defined as the duration required to alter the clock frequency and supply voltage. This time impacts both interrupt latency and wake-up latency when the system is in its lowest-energy sleep state. Transition energy is the extra energy consumption due to switching losses that is required to change the system supply voltage.

III. SYSTEM ARCHITECTURE

The complete microprocessor system comprises four custom chips, as shown in Fig. 3, all of which were designed for DVS to maximize system energy efficiency. The regulator chip, discussed in detail in Section V, converts the battery voltage to the variable supply voltage, which powers the microprocessor, the processor bus, the external memory bank, the I/O interface chip, and the front-end of the regulator. The four chips have been fabricated in a 0.6-μm 3-metal -V CMOS process.

The CPU chip, shown in Fig. 4, contains a custom implementation of an ARM8 processor core [12]. The core, which implements the ARM IV instruction set architecture, contains a five-stage scalar integer pipeline and an eight-word prefetch unit that performs static branch prediction. A 16-kB 32-way set-associative unified cache operates at the core clock rate. The cache contains 16 physical blocks in which a CAM tag array provides the line decoding for an SRAM data array which is logically organized into 32 lines with 8 words per line. A 12-element write buffer multiplexes address and data into a single register file and supports a variable number of data words per address to accommodate the store multiple registers (STM) instruction. A bus interface unit connects the CPU to the processor bus and contains a memory controller that provides all the signal gen-
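As a quick consistency check on the cache organization described above, the following sketch multiplies out the stated geometry. The 4-byte word size is an assumption (a 32-bit ARM word; the text does not state it), and the constant names are illustrative only.

```python
# Cache geometry as stated in the text.
BLOCKS = 16            # physical blocks
LINES_PER_BLOCK = 32   # logical lines per block
WORDS_PER_LINE = 8     # words per line
BYTES_PER_WORD = 4     # ASSUMPTION: 32-bit word, not stated in the text

line_bytes = WORDS_PER_LINE * BYTES_PER_WORD
total_bytes = BLOCKS * LINES_PER_BLOCK * line_bytes
total_kib = total_bytes // 1024

print(f"line size:  {line_bytes} bytes")
print(f"cache size: {total_bytes} bytes = {total_kib} kB")
```

Under that word-size assumption the product is 16 384 bytes, which agrees with the stated 16-kB capacity.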
BURD et al.: DYNAMIC VOLTAGE SCALED MICROPROCESSOR SYSTEM 1573
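The frequency-locked regulation loop outlined in Section II can be illustrated with a small behavioral sketch. Everything numeric here is hypothetical: the linear `ring_osc_mhz` model, its constants, the loop gain, and the step count are placeholders chosen so that the example converges, not values from the prototype. Only the structure comes from the text: the error is the desired frequency minus the measured frequency, and the supply is adjusted until that error vanishes.

```python
def ring_osc_mhz(vdd):
    """HYPOTHETICAL ring-oscillator model: frequency grows linearly
    with supply voltage above a 1.0 V offset (made-up constants)."""
    return max(0.0, 30.0 * (vdd - 1.0))


def settle(desired_mhz, vdd=1.2, gain=0.01, steps=200):
    """Iterate a simple proportional update of the supply voltage.

    The software supplies only desired_mhz; the loop finds the voltage.
    gain and steps are illustrative assumptions.
    """
    for _ in range(steps):
        measured_mhz = ring_osc_mhz(vdd)
        error_mhz = desired_mhz - measured_mhz   # feedback error
        vdd += gain * error_mhz                  # raise or lower the supply
    return vdd, ring_osc_mhz(vdd)


if __name__ == "__main__":
    vdd, freq = settle(60.0)
    print(f"settled at {vdd:.3f} V, {freq:.1f} MHz")
```

The point of the sketch is the division of labor: the caller never mentions a voltage, which mirrors how the operating system sets only the frequency and leaves the voltage to the hardware loop.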
A. Tracking Performance

The voltage converter required for DVS is fundamentally different from a standard voltage regulator because, in addition to regulating voltage for a given speed, it must also change the voltage when a new speed is requested. A large regulator output capacitance reduces the dominant pole frequency, thereby reducing supply ripple, and increases low-voltage conversion efficiency, making the loop a better voltage regulator. A small capacitance reduces transition time and energy, making the loop a better voltage tracking system. Hence, the fundamental trade-off in DVS system design is to make the processor more tolerant of supply ripple so that the capacitance can be reduced to minimize transition time and energy [17]. The peak-to-peak ripple constraint for this system was relaxed to 5%, with a maximum measured value of 3.8%.

To further improve transition speed, the loop filter utilizes feed-forward control. The frequency error is first multiplied by a gain term, then a feed-forward value is added to it which is solely a function of the desired clock frequency. A 16 × 16-bit SRAM contains the look-up table for the feed-forward value as well as the frequency-dependent gain term, and is indexed by the upper four frequency bits. The feed-forward provides quick but approximate loop adaptation, while the feedback loop locks onto the desired clock frequency.

B. Optimizing Conversion Efficiency

The key design challenge of this loop was to maintain good conversion efficiency with over 100× variation in load power dissipation, while keeping the output capacitance sufficiently small to maintain the loop’s tracking performance. A hybrid PWM/PFM algorithm is utilized which combines the high efficiency that PWM can provide at high loads with the high efficiency that PFM can provide at low loads [15].

The converter operates in one of two modes, tracking and regulation. Tracking mode is initiated by a new frequency request. Charge is either delivered to or removed from the capacitor depending upon the sign of the frequency error, and is delivered via a variable-width pulse which has 4 bits of control. When the error magnitude is less than 4 MHz, the converter switches to regulation mode, in which the converter will still deliver energy to the output when the error is greater than zero, but only the microprocessor system can remove charge. When the error is less than zero, the loop filter is disabled to suppress the charge pulse that would otherwise remove charge and drive the error to zero in a strictly PWM system. Thus, the only part of the loop that is continuously running is the front-end which calculates the error. To improve low-voltage conversion efficiency, these circuits are powered by the variable $V_{DD}$, while the rest of the chip is powered by the battery voltage.

To minimize the sum of on-state and conduction losses, there is an optimum power FET gate width for fixed load current [16]. Since the load current varies by 50×, the power FETs are dynamically sized to minimize losses over a broad range of load current and maximize conversion efficiency. The filter’s SRAM look-up table also contains two bits for each power FET for independent, binary-weighted sizing control. The gate widths of the nMOS and pMOS least-significant bits (LSB) are 10 and 20 mm, respectively.

C. Transient Loop Response

Fig. 10 shows a scope trace for the system’s maximum low-to-high and high-to-low speed transitions. The $V_{DD}$ signal transitions from 1.2 to 3.8 V, then back down to 1.2 V. The Track Status signal indicates whether the loop is operating in the tracking or regulation mode. This signal demonstrates that the maximum transition time is 70 μs for the 5–80 MHz transition under full system load, while smaller voltage transitions are executed in less time. During this entire transition period, the processor system can continue to execute instructions.

Fig. 10. Transient response of the regulation loop.

The battery-current trace is measured going into the regulator, but after the battery’s bypass capacitor. There is a current spike on a low-to-high transition which is required to charge up the loop’s output capacitor to the required voltage. The negative current spike on the high-to-low transition occurs because the power pMOS is removing charge from the capacitor and placing it back onto the battery’s bypass capacitor. The conversion loss of the regulator while charging and discharging the output capacitor becomes the transition energy, and is proportional to the size of the capacitor. This transition energy is a maximum of 4 μJ for a 5–80 MHz transition, which equals the energy consumed by the processor running at 80 MHz for 712 full-load cycles.

VI. DIGITAL CIRCUIT DESIGN FOR DVS

One approach to designing a processor system that switches voltage dynamically is to halt processor operation during the
1576 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000
switching transient. The drawback to this approach is that interrupt latency increases and potentially useful processor cycles are discarded. Since static CMOS gates are quite tolerant of a varying voltage supply, there is no fundamental need to halt operation during the transient. When the gate’s output is low, it will remain low independent of $V_{DD}$. However, when the output is high, it will track $V_{DD}$ via the pMOS device(s). Simulation demonstrated that for a minimum-sized pMOS device in our 0.6-μm process, the RC time constant of the pMOS drain-source resistance and the load capacitance is a maximum of 5 ns, at low voltage. Thus, static CMOS gates track quite well for a $dV_{DD}/dt$ in excess of 100 V/μs. Because all logic high nodes will track $V_{DD}$ very closely, the circuit delay will instantaneously adapt to the varying supply voltage. Since the processor clock is derived from a ring oscillator also powered by $V_{DD}$, its output frequency will dynamically adapt as well, as demonstrated in Fig. 11.

Fig. 11. Ring oscillator adapting to varying $V_{DD}$ (simulated).

Yet, there are constraints when using a design style other than static CMOS, as well as limits on allowable $dV_{DD}/dt$. The processor system design contains a variety of different styles, including not only static CMOS logic, but dynamic logic, CMOS pass-gate logic, memory cells, sense-amps, bus drivers, and I/O drivers. As will be shown, the maximum $dV_{DD}/dt$ that the circuits in this design can tolerate is approximately 5 V/μs. The converter loop has a maximum $dV_{DD}/dt$ of only 0.2 V/μs, providing sufficient design margin. These design constraints sacrifice a small amount of energy efficiency in the circuit design, but return much larger gains at the system level via DVS.

A. Pass Gate Logic

NMOS pass gates are often used in low-power design due to their small area and input capacitance. However, they are limited by not being able to pass a voltage greater than $V_{DD} - V_{Tn}$, such that a minimum $V_{DD}$ of $2V_T$ is required for proper operation. Since throughput and energy consumption vary by approximately 4× over the voltage range $V_T$ to $2V_T$, using nMOS pass gates restricts the range of operation by a significant amount, and is not worth the moderate improvement in energy efficiency. Instead, CMOS pass gates, or an alternate logic style, should be utilized to maximize the voltage range for DVS.

B. Dynamic Logic

Dynamic logic styles are often preferable over static CMOS as they are more efficient for implementing complex logic functions. They can be used with a varying supply voltage, but require some additional design considerations. One failure mode can occur while the circuit is in the evaluation state and the gate inputs are low such that the output node is undriven at a value $V_{DD}$. If $V_{DD}$ ramps down by more than a diode drop by the end of the evaluation state, the drain-well diode will become forward biased. Current may be injected into the parasitic p-n-p transistor of the pMOS device and induce latchup [18]. This condition occurs when

$$\frac{dV_{DD}}{dt} \le -\frac{V_{BE}}{\tau_{CLK}/2} \qquad (1)$$

where $\tau_{CLK}$ is the average clock period as $V_{DD}$ varies by a diode voltage drop $V_{BE}$. Since the clock period is longest at lowest voltage, this is evaluated at the low end of the $V_{DD}$ range. For our 0.6-μm process, the limit is 20 V/μs. Another failure mode occurs if $V_{DD}$ ramps up by more than $|V_{Tp}|$ by the end of the evaluation state, and the output drives a pMOS device resulting in a false logic low, giving a functional error. This condition occurs when

$$\frac{dV_{DD}}{dt} \ge \frac{|V_{Tp}|}{\tau_{CLK}/2} \qquad (2)$$

and is also evaluated at the low end of the $V_{DD}$ range, since this condition is also most severe at low voltage. For our 0.6-μm process, the limit is 24 V/μs.

These limits assume that the circuit is in the evaluation state for no longer than half the clock period. If the clock is gated, leaving the circuit in the evaluation state for consecutive cycles, these limits drop proportionally. Hence, the clock should only be gated when the circuit is in the precharge state. These limits may be increased to that of static CMOS logic using a small bleeder pMOS device to hold the output at $V_{DD}$ while it remains undriven. The bleeder device also removes the constraint on gating the clock, and since the bleeder device can be made quite small, there can be insignificant degradation of circuit delay due to the pMOS bleeder fighting the nMOS pull-down devices. The charge-redistribution problem of dynamic logic will be magnified by a varying supply voltage such that the internal nodes of nMOS stacks should be properly precharged [18].

C. Tri-State Busses

Tri-state busses that are not constantly driven for any given cycle suffer from the same two failure modes as seen in dynamic logic circuits due to their floating capacitance. The resulting $dV_{DD}/dt$ limit can be much lower if the number of consecutive undriven cycles is unbounded. Tri-state busses can only be used if one of two design methods is followed.

The first method is to ensure by design that the bus will always be driven. While this is done easily on a tri-state bus with only two drivers, this may become expensive to ensure by design for a large number of drivers, because of the enable signals that must be routed.

The second method is to use weak, cross-coupled inverters which continually drive the bus. This is preferable to just a bleeder pMOS as it will also maintain a low voltage on the floating bus. Otherwise, leakage current may drive the bus high while it is floating for an indefinite number of cycles. The size of these inverters can be quite small, even for a large bus. For our 0.6-μm process, the inverters could be designed to tolerate a
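The slew-rate figures quoted in this section can be put side by side with a few lines of arithmetic. The sketch below assumes a first-order RC model of a logic-high node following a linear supply ramp, for which the steady-state lag is the slew rate times the time constant. That model and the helper names are illustrative assumptions; the 5-ns, 5-V/μs, 0.2-V/μs, and 1.2-to-3.8-V-in-70-μs figures come from the text.

```python
RC_NS = 5.0                     # worst-case pMOS RC time constant (text)
CIRCUIT_LIMIT_V_PER_US = 5.0    # approximate slew rate the circuits tolerate (text)
CONVERTER_MAX_V_PER_US = 0.2    # maximum slew rate of the converter loop (text)


def tracking_lag_mv(slew_v_per_us, rc_ns=RC_NS):
    """Steady-state lag of a first-order RC node behind a linear ramp.

    ASSUMPTION: first-order model. (V/us) * ns = mV, so the product
    needs no extra unit conversion.
    """
    return slew_v_per_us * rc_ns


# Ratio of what the circuits tolerate to what the converter can produce.
margin = CIRCUIT_LIMIT_V_PER_US / CONVERTER_MAX_V_PER_US

# Average slew over the full 1.2 V -> 3.8 V transition in 70 us.
average_transition_slew = (3.8 - 1.2) / 70.0

print(f"design margin:            {margin:.0f}x")
print(f"average transition slew:  {average_transition_slew:.3f} V/us")
print(f"lag at converter maximum: {tracking_lag_mv(CONVERTER_MAX_V_PER_US):.1f} mV")
print(f"lag at 100 V/us:          {tracking_lag_mv(100.0):.0f} mV")
```

In this model the converter’s maximum slew rate sits about 25 times below the circuit limit, and a static CMOS node lags the supply by roughly 1 mV at that rate.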
D. Sense Amps

The basic sense-amplifier topology, shown in Fig. 12, responds to the varying $V_{DD}$ in a desirable manner. When $V_{DD}$ increases, the cell current drive pulling down the bit line increases because the cell’s internal voltage increases, and the trip point of the sense amplifier shifts up. Likewise, when $V_{DD}$ decreases, the cell current drive decreases, and the trip point shifts down. The net effect is that the decrease/increase in response time of the sense amplifier with $V_{DD}$ is relatively similar to the decrease/increase in clock period. Thus, the basic sense amplifier is very suitable for DVS, though second-order delay variation limits $dV_{DD}/dt$ on the order of 5 V/μs, which ultimately determines the maximum slew rate allowed on the supply voltage.

E. Circuit Delay Variation

While circuit delays track well over voltage, subtle delay variations do exist which impact circuit timing. To demonstrate this, three chains of inverters were simulated whose loads were dominated by gate, interconnect, and diffusion capacitance, respectively. To model paths dominated by stacked devices, a fourth chain was simulated consisting of four pMOS and four nMOS transistors in series. The relative delay variation of these circuits is shown in Fig. 13, for which the baseline reference is an inverter chain with a balanced load capacitance similar to the ring oscillator.

The relative delay of all four circuits is a maximum at only the lowest or highest operating voltages. This is true even including the effect of the interconnect’s RC delay. Since the gate-dominant curve is convex, combining it with one or more of the other effects’ curves may lead to a relative delay maximum somewhere between the two voltage extremes. However, all the other curves are concave and roughly mirror the gate-dominant curve such that this maximum will be at most a few percent higher than at either the lowest or highest voltage, and therefore insignificant. Thus, timing analysis is only required at the two voltage extremes, and not at all the intermediate voltage values.

As demonstrated by the series-dominant curve, the relative delay of four stacked devices rapidly increases at low voltage, and larger stacks will further increase the relative delay [17]. Thus, to improve the tracking of circuit delay over voltage, a general design guideline is to limit the number of stacked devices, except for circuits whose alternative design would be significantly more expensive in area and/or power (e.g., memory address decoder).

VII. ARCHITECTURAL ENHANCEMENTS FOR DVS

A. Desired Frequency Register

The primary architectural support for DVS is the addition of the desired frequency register, which has been added to the system coprocessor. Writes to this register send a new frequency request to the regulator, and reads report the current measured clock frequency. This allows the operating system to actively monitor the operating frequency. To reduce the pin count on the CPU-regulator interface, the 7-bit frequency value is serialized by the CPU and transmitted to the regulator upon writing to the register. The regulator then converts the serial data back to a 7-bit word. The interface requires just three pins to transmit the new frequency value, and one pin to transmit the clock signal from the ring oscillator.

B. Dynamic Performance Monitors

The system coprocessor also contains several read-only registers that monitor system run-time performance. Four registers track processor performance by counting the number of cycles the processor spends in each of its states: active, idle, sleep, and stalled. A separate register counts the number of instructions executed. Another four registers track cache system performance by counting the cache hits, misses, cache-line write-backs, and uncached accesses. These nine registers provide dynamic feedback to the operating system on processor utilization, which can be used to vary the processor speed accordingly.

C. Ring Oscillator

To accommodate process variation over the die as well as simulation error, the oscillator was designed to be programmable from 50% to 150% of nominal frequency with 5 bits of control. The frequency control is designed to be glitch-free so that it can be programmed via software through another register in the system coprocessor.

The basic oscillator architecture, shown in Fig. 14, consists of five binary-weighted delay blocks, plus a return path to close the loop. Each of the delay blocks has both a slow and a fast path, which is selected by the ctrl signal. A new value for this
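The desired-frequency transfer described in Section VII-A amounts to a parallel-to-serial conversion on the CPU and the reverse on the regulator. The sketch below shows one way this round trip could look. The MSB-first bit order and the function names are assumptions, and the actual three-pin signaling is not specified in the text.

```python
FREQ_BITS = 7  # width of the frequency value, from the text


def serialize(value):
    """CPU side: turn a 7-bit value into a list of bits.
    ASSUMPTION: most-significant bit is sent first."""
    if not 0 <= value < (1 << FREQ_BITS):
        raise ValueError("frequency value must fit in 7 bits")
    return [(value >> shift) & 1 for shift in range(FREQ_BITS - 1, -1, -1)]


def deserialize(bits):
    """Regulator side: shift the received bits back into a 7-bit word."""
    if len(bits) != FREQ_BITS:
        raise ValueError("expected exactly 7 bits")
    word = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
    return word


if __name__ == "__main__":
    request = 80
    stream = serialize(request)
    print(stream, "->", deserialize(stream))
```

Trading a 7-pin parallel port for a short serial burst costs a few extra cycles per request, which is a small price given that frequency changes are rare compared with instruction execution.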
TABLE I. MEASURED BENCHMARK ENERGY CONSUMPTION (NORMALIZED)

TABLE II. MEASURED POWER DISSIPATION WITH THE VOLTAGE SCHEDULER

1) MPEG: MPEG-2 decoding of an 80-frame 192 × 144 video at 5 frames/s, requiring an average 50-MHz clock rate in a single-task environment.
2) AUDIO: IDEA decryption of a 10-s 11-kHz mono audio stream, divided into 1-kB frames with a 93-ms deadline, requiring an average 17-MHz clock rate.
3) UI: A simple address-book user interface allowing simple searching, selection, and database selection. 432 frames are processed, each defined as a user-triggered event, such as pen-down, which ends when the corresponding action has been completed. Most frames require less than a 10-MHz clock rate, while some frames are very compute intensive.

The key parameter to measure the energy-efficiency improvement of DVS is the system energy consumption. Energy consumption was measured by charging up a large (3.5 F) capacitor to the battery voltage, and measuring the voltage drop on it over the duration of the benchmark. The benchmarks were first run at constant maximum throughput to measure the baseline energy consumption. They were then run with the voltage scheduler and the energy consumption was measured again.

Table I shows the measured system energy consumption for the three benchmarks, normalized to the case in which the system is running at maximum throughput, which is the typical operating mode of a processor system that operates from a fixed supply voltage. The row labeled Optimal is the energy reduction when all the computational requirements are known a priori, and is an estimated value derived from simulation. The optimal values represent the maximum achievable energy reduction for these benchmarks. The last row is the measured energy consumption with the voltage scheduler enabled. As expected, the compute-intensive MPEG benchmark has only an 11% energy reduction from DVS. However, DVS demonstrates significant improvement for the less compute-intensive AUDIO and UI benchmarks, which have a 4.5× and 3.5× energy reduction, respectively. Comparing the DVS results against the optimal results demonstrates that while the voltage scheduler’s heuristic algorithm has a difficult time optimizing for compute-intensive code, it performs extremely well on non-speed-critical applications.

Table II shows the average power dissipation of the three benchmarks with the voltage scheduler operating. The effective MIPS/W is calculated as the ratio of peak throughput (85 MIPS) to average power dissipation, and demonstrates the achievable increase in energy efficiency when the system is running real programs. Both the UI and AUDIO benchmarks have an average power dissipation on the order of 10 mW, yielding an energy efficiency on the order of 10 000 MIPS/W.

IX. CONCLUSION

The prototype processor system demonstrates that DVS can improve the energy efficiency of battery-powered processor systems by up to a factor of 10× without sacrificing peak throughput. DVS is amenable to standard digital CMOS processes, with a few additional circuit design constraints. Existing operating systems can be retrofitted to support DVS with little modification, as the voltage scheduler can be added to the operating system in a modular fashion. Finally, the prototype system demonstrated that when running real programs, typical of those run on notebook computers and PDAs, DVS provides a significant reduction in measured system energy consumption, thus significantly extending battery life.

ACKNOWLEDGMENT

The authors would like to thank P. Laramie, O. Rowhani, C. Chang, R. Davis, and J. C. Rudell for their contributions.

REFERENCES

[1] J. Montanaro et al., “A 160-MHz, 32-b 0.5-W CMOS RISC processor,” IEEE J. Solid-State Circuits, vol. 31, pp. 1703–1714, Nov. 1996.
[2] E. Vittoz, “Micropower IC,” in Proc. IEEE ESSCC, Sept. 1980, pp. 174–189.
[3] A. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, pp. 473–484, Apr. 1992.
[4] B. Davari, R. Dennard, and G. Shahidi, “CMOS scaling for high performance and low power: The next ten years,” Proc. IEEE, pp. 595–606, Apr. 1995.
[5] Advanced Configuration and Power Interface Specification, Revision 1.0b, Intel/Microsoft/Toshiba, Feb. 1999, pp. 67–69.
[6] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Englewood Cliffs, NJ: Prentice Hall, 1996.
[7] V. von Kaenel, P. Macken, and M. Degrauwe, “A voltage reduction technique for battery-operated systems,” IEEE J. Solid-State Circuits, vol. 25, pp. 1136–1140, Oct. 1990.
[8] T. Kuroda et al., “Variable supply-voltage scheme for low-power high-speed CMOS digital design,” IEEE J. Solid-State Circuits, vol. 33, pp. 454–462, Mar. 1998.
[9] L. Nielsen, C. Niessen, J. Sparso, and K. van Berkel, “Low-power operation using self-timed circuits and adaptive scaling of the supply voltage,” IEEE Trans. VLSI Syst., vol. 2, pp. 391–397, Dec. 1994.
[10] A. Chandrakasan, V. Gutnik, and T. Xanthopoulos, “Data driven signal processing: An approach for energy efficient computing,” in IEEE ISLPED Dig. Tech. Papers, Aug. 1996, pp. 347–352.
[11] G. Wei, J. Kim, D. Liu, S. Sidiropoulos, and M. Horowitz, “A variable-frequency parallel I/O interface with adaptive power supply regulation,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2000, pp. 298–299.
[12] “ARM 8 Data-Sheet,” ARM Ltd., Doc. No. ARM-DDI-0080C, July 1996.
[13] T. Pering, T. Burd, and R. W. Brodersen, “Voltage scheduling in the lpARM microprocessor system,” in IEEE ISLPED Dig. Tech. Papers, July 2000, pp. 96–101.
[14] T. Pering, “Energy-efficient operating system techniques,” Ph.D. dissertation, Univ. California, Berkeley, CA, 2000.
[15] A. Stratakos, “High-efficiency, low-voltage dc-dc conversion for portable applications,” Ph.D. dissertation, Univ. California, Berkeley, CA, 1999.
[16] A. Stratakos, R. W. Brodersen, and S. Sanders, “High-efficiency low-voltage dc-dc conversion for portable applications,” in Int. Workshop Low-Power Design, Apr. 1994, pp. 619–626.
[17] T. Burd and R. W. Brodersen, “Design issues for dynamic voltage scaling,” in IEEE ISLPED Dig. Tech. Papers, July 2000, pp. 9–14.
[18] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison Wesley, 1993.

Thomas D. Burd (S’94) received the B.S. and M.S. degrees in electrical engineering from the University of California, Berkeley, in 1992 and 1994, respectively, where he is currently working toward the Ph.D. degree in the area of energy-efficient processor system design.
For the InfoPad research project at Berkeley, he developed a low-power CMOS cell library for custom DSP ASICs, which was used in the design of several custom chips. He has since worked on energy-efficient system, architecture, and circuit design for general-purpose processors, energy-efficient low-swing bus transceivers, CAD methodology to automate IC verification, and DVS converter loop architecture design.
Mr. Burd is the recipient of the 1998 Analog Devices Outstanding Student Award for recognition in IC design. He is a member of Tau Beta Pi and Eta Kappa Nu.

Trevor Pering received the B.S. degree in computer science and the Ph.D. degree in electrical engineering from the University of California, Berkeley, in 1993 and 2000, respectively. In September 1999, he joined the Microprocessor Research Lab, Intel Corporation, near Portland, OR.
At Berkeley, he worked on the InfoPad project, where he was responsible for the design and implementation of hardware-based wireless transmission protocols, as well as system-level integration and debugging of the InfoPad portable terminal. His Ph.D. work focused on energy-efficient software techniques for portable computers, including the design of a real-time voltage-scaling operating system. With Intel, he is currently engaged in user interfaces and system-level design issues for ubiquitous computing environments.

Robert W. Brodersen (M’76–SM’81–F’82) received the Bachelor of Science degrees in Electrical Engineering and Mathematics from California State Polytechnic University, Pomona, CA in 1966; the Engineering and Master of Science degrees from the Massachusetts Institute of Technology (MIT), Cambridge in 1968; and the Ph.D. degree in Engineering from MIT in 1972. On May 28, 1999, Professor Brodersen was formally declared Technologiae Doctor Honoris Causa (Honorary Doctor of Technology) by the University of Lund, Sweden.
He is a Professor in the Department of Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley. He joined the EECS Department faculty in 1976. From 1972 to 1976, he was a member of the Technical Staff, Central Research Laboratory at Texas Instruments. In addition to teaching, his present research focus is the application of integrated circuits to personal communication systems, with emphasis on wireless communications and low power design. He was appointed the first holder of the John R. Whinnery Chair in the Department of Electrical Engineering and Computer Science, University of California, Berkeley, in September 1995. He was the National Chair of Information Science and Technology (ISAT) Study Group sponsored by the Institute for Defense Analysis, Washington, D.C., from 1992 to 1994. He currently serves on several committees associated with the National Academy of Sciences, Washington, D.C. He is the author or co-author of over 60 journal publications and 120 published conference papers, and author, co-author, editor, or contributor to 14 books, including Anatomy of a Silicon Compiler (Norwood, MA: Kluwer, 1992), and Low Power Digital CMOS Design (Norwood, MA: Kluwer, 1995). He is the holder of three patents. He has served on the editorial board or as reviewer for numerous scholarly journals and publications, including the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON VLSI SYSTEMS, IEEE PERSONAL COMMUNICATIONS MAGAZINE, and Wireless Personal Communications (Kluwer Press).
He won conference best paper awards at Eascon in 1973, the International Solid-State Circuits Conference in 1975, and the European Solid-State Circuits Conference in 1978. In 1979, he received the W.G. Baker Award for the outstanding paper in the IEEE Journals and Transactions. He was co-recipient of the Morris Libermann Award of the IEEE in 1983 for “outstanding contributions to an emerging technology.” He received the best paper award in the Transactions on CAD in 1985 and the best tutorial paper of the IEEE Communications Society in 1992. In 1997, he received the distinguished IEEE Solid-State Circuits Award “for contributions to the design of integrated circuits for signal processing systems.” He is a member of the National Academy of Engineering.