Power Consumption and Thermal Management 2:
Low Power Digital Design and Management
Pablo Ituero and Rubén San Segundo
Outline
• 4 Relationship between Energy and Delay
• 5 Circuit-level Strategies
• 6 Gate-level Strategies
• 7 Architecture-level Strategies
• 8 Software-level Strategies
• 9 System-level Strategies
• 10 Power Management Examples
▪ 10.1 Arduino
▪ 10.2 Raspberry Pi

Slides Credits
• Low Power Design Essentials. Jan Rabaey. Springer
• Low Power VLSI Design. Dr.-Ing. Frank Sill.
Department of Electrical Engineering, Federal
University of Minas Gerais, Brazil.
• www.arduino.cc
• www.raspberrypi.org
• Own Material

Lecture Recap 1
• In current electronic circuits, power is mainly
consumed through the charge and discharge of
capacitors through resistances (CMOS gates) that
store the information that the circuit processes.
• In each charge cycle, half of the energy provided
by the power supply is stored in the capacitance
and the other half is dissipated in the pull-up
resistance (turned into thermal energy, heat),
regardless of the value of the resistance.
• In the discharge cycle, the remaining half of the energy is
dissipated in the pull-down resistance.
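The half-and-half split can be checked with a short derivation. This is a sketch that assumes an ideal capacitor C charged from 0 to VDD through a pull-up resistance R; the symbols E_supply, E_C and E_R are introduced here for illustration only.

```latex
\begin{align}
E_{supply} &= \int_0^\infty V_{DD}\, i(t)\, dt = V_{DD} \int_0^{V_{DD}} C\, dv = C V_{DD}^2 \\
E_C &= \int_0^{V_{DD}} C\, v\, dv = \tfrac{1}{2} C V_{DD}^2 \\
E_R &= E_{supply} - E_C = \tfrac{1}{2} C V_{DD}^2
\end{align}
```

R does not appear in any of the three results, which is why the split does not depend on the value of the resistance.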

Lecture Recap 2
The total power consumption of a circuit is given by

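$$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{0 \to 1} + P_{1 \to 0}) + V_{DD} I_{leak}$$

• Dynamic power, $\alpha f C_L V_{DD}^2$: ≈ 40 - 70% today and decreasing relatively
• Short-circuit power, $V_{DD} I_{peak} (P_{0 \to 1} + P_{1 \to 0})$: ≈ 10% today and decreasing absolutely
• Leakage power, $V_{DD} I_{leak}$: ≈ 20 - 50% today and increasing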
Threshold voltage

Sub-threshold Leakage
The dominant component of the leakage currents

Off-current increases exponentially when reducing VTH


$$I_{leak} = I_0 \frac{W}{W_0} 10^{-V_{TH}/S} \qquad P_{leak} = V_{DD} \cdot I_{leak}$$
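To get a feel for the exponential dependence on VTH, the leakage formula can be evaluated numerically. This is a minimal sketch: the values of I0, S and VDD are assumed for illustration and do not come from the slides.

```python
def leakage_current(i0, w_over_w0, vth, s):
    """Sub-threshold leakage: I_leak = I0 * (W/W0) * 10^(-VTH/S)."""
    return i0 * w_over_w0 * 10 ** (-vth / s)


def leakage_power(vdd, i_leak):
    """Static power: P_leak = VDD * I_leak."""
    return vdd * i_leak


# Assumed illustrative values, not taken from the slides.
I0 = 1e-6   # reference current in A
S = 0.1     # sub-threshold swing in V per decade
VDD = 1.2   # supply voltage in V

i_high_vth = leakage_current(I0, 1.0, 0.4, S)
i_low_vth = leakage_current(I0, 1.0, 0.3, S)
ratio = i_low_vth / i_high_vth

print(f"I_leak at VTH = 0.4 V: {i_high_vth:.3e} A")
print(f"I_leak at VTH = 0.3 V: {i_low_vth:.3e} A")
print(f"Leakage grows by a factor of {ratio:.1f}")
print(f"P_leak at VTH = 0.3 V: {leakage_power(VDD, i_low_vth):.3e} W")
```

With these assumptions, lowering VTH by one swing S (here 100 mV) multiplies the off-current by ten.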
Leakage current and temperature

The Traditional Design Philosophy
• Maximum performance is primary goal
▪ Minimum delay at circuit level
• Architecture implements the required function with
target throughput, latency
• Performance achieved through optimum sizing, logic
mapping, architectural transformations.
• Supplies, thresholds set to achieve maximum
performance, subject to reliability constraints
Trend: Power

Source: Moore, ISSCC 2003

The New Design Philosophy
• Maximum performance (in terms of propagation delay)
is too power-hungry, and/or not even practically
achievable
• Many (if not most) applications either can tolerate
larger latency, or can live with lower than maximum
clock-speeds
• Excess performance (as offered by technology) to be
used for energy/power reduction

Trading off speed for power


4. Relationship between Energy and Delay
Lowering Vdd
• One of the most straightforward ways to reduce
power is lowering 𝑉𝐷𝐷
• However, lowering 𝑉𝐷𝐷 also affects an important
metric of the circuit: Speed.

Threshold voltage

Energy-Delay Interaction

Delay decreases with supply voltage but energy/power increases

$$P_{dyn} = V_{DD}^2 \cdot f \cdot C \cdot \alpha \qquad \text{Delay} = k \cdot C \cdot \frac{V_{DD}}{(V_{DD} - V_T)^2}$$
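The two expressions can be evaluated side by side to see the trade-off. This is a sketch under assumed numbers (1.2 V and 0.9 V supplies, VT = 0.3 V, 10 fF, 100 MHz, activity 0.1, k = 1); only the ratios matter.

```python
def dynamic_power(vdd, f, c, alpha):
    """P_dyn = VDD^2 * f * C * alpha."""
    return vdd ** 2 * f * c * alpha


def gate_delay(vdd, vt, c, k=1.0):
    """Delay = k * C * VDD / (VDD - VT)^2."""
    return k * c * vdd / (vdd - vt) ** 2


# Assumed illustrative operating points, not taken from the slides.
F = 100e6      # clock frequency in Hz
C = 10e-15     # switched capacitance in F
ALPHA = 0.1    # activity factor
VT = 0.3       # threshold voltage in V

p_ratio = dynamic_power(0.9, F, C, ALPHA) / dynamic_power(1.2, F, C, ALPHA)
d_ratio = gate_delay(0.9, VT, C) / gate_delay(1.2, VT, C)

print(f"Power at 0.9 V relative to 1.2 V: {p_ratio:.4f}")
print(f"Delay at 0.9 V relative to 1.2 V: {d_ratio:.4f}")
```

Under these assumptions the lower supply cuts dynamic power to roughly 56% while the delay grows by roughly 69%.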
Static-Energy Delay Interaction

Static energy increases exponentially with decrease in


threshold voltage

Delay increases with threshold voltage

Relationship Between Power and Delay
[Figure: two surface plots over VDD and VTH, one of power (W, axis scaled by 10^-4) and one of delay (s, axis scaled by 10^-10), each with operating points A and B marked]

For a given activity level, power is reduced while delay is unchanged if both VDD
and VTH are lowered such as from A to B.

[Ref: T. Sakurai and T. Kuroda, numerous references]
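The move from A to B can be reproduced with the delay model used in this section, Delay = k·C·VDD/(VDD − VT)^2. The sketch below solves for the threshold voltage that keeps the delay unchanged after a supply reduction; the voltages are assumed for illustration and are not read off the plot.

```python
import math


def delay_factor(vdd, vt):
    """Voltage-dependent part of the delay: VDD / (VDD - VT)^2."""
    return vdd / (vdd - vt) ** 2


def vt_for_same_delay(vdd_old, vt_old, vdd_new):
    """Threshold voltage that keeps the delay unchanged at the new supply."""
    target = delay_factor(vdd_old, vt_old)
    # Solve VDD_new / (VDD_new - VT_new)^2 = target for VT_new.
    return vdd_new - math.sqrt(vdd_new / target)


# Assumed point A (1.2 V, 0.3 V) and a reduced supply of 0.9 V for point B.
vt_b = vt_for_same_delay(1.2, 0.3, 0.9)
power_ratio = (0.9 / 1.2) ** 2  # dynamic power scales with VDD^2

print(f"VT at point B: {vt_b:.4f} V")
print(f"Dynamic power at B relative to A: {power_ratio:.4f}")
```

The price of the lower threshold is higher sub-threshold leakage, which this sketch does not model.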


Effect of VDD reduction on RON
$$\text{Delay} = k \cdot C \cdot \frac{V_{DD}}{(V_{DD} - V_T)^2}$$

$$\text{Delay} = 0.69 \cdot R_{ON} \cdot C$$

$$R_{ON} = k_2 \cdot \frac{V_{DD}}{(V_{DD} - V_T)^2}$$
4.5 Reducing power: global overview
Exploring the Energy-Delay Space
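[Figure: energy versus delay, with delay ranging from Dmin to Dmax and energy from Emin to Emax; it shows the curve of Pareto-optimal designs and an unoptimized design away from that curve]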

In an energy-constrained world, design is a trade-off process


♦ Minimize energy for a given performance requirement
♦ Maximize performance for given energy budget
[Ref: D. Markovic, JSSC’04]
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer:

• System/Application: choice of algorithm
• Software: amount of concurrency
• (Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
• Logic/RT: logic family, standard cell versus custom
• Circuit: sizing, supply, thresholds
• Device: bulk versus SOI
Reducing Active Energy @ Design Time

$$E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$$

$$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f$$

• Reducing VDD has a quadratic effect (with full-swing logic, Vswing = VDD)!
▪ Has a negative effect on performance, especially as VDD
approaches 2VT
• Reducing transistor sizes (CL)
▪ Slows down logic
• Reducing activity (α)
▪ Reducing switching activity through transformations
▪ Reducing glitching by balancing logic
▪ Impacted by logic and architecture design decisions
Lowering Dynamic Power

5 Circuit-Level Strategies
Transistor Sizing for Power Minimization

To keep performance:
▪ Small W’s: lower capacitance, higher voltage
▪ Large W’s: higher capacitance, lower voltage

• Larger sized devices: useful only when interconnects dominate
• Minimum sized devices: usually optimal for low-power
Source: Timmernann, 2007

Micro transductors ‘08, Low Power
6 Gate-Level Strategies
Gate-Level Strategies for Low-Power
6.1 Algebraic transformations
6.2 Restructuring
6.3 Input Ordering
6.4 Dealing with glitches
6.5 Multiple VDD
Algebraic Transformations
Idea: Modify network to reduce capacitance

[Figure: two implementations of f, with pa = 0.1; pb = 0.5; pc = 0.5
Left, f = a·b + a·c: p1 = 0.05, p2 = 0.05, output p5 = 0.075
Right, f = a·(b + c): p4 = 0.75, output p3 = 0.075]

Caveat: This may increase activity!
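The caveat can be made concrete by recomputing the node probabilities of the figure and turning them into switching activities with P0→1 = (1 - p) * p. This sketch assumes independent primary inputs and takes the left network as f = a·b + a·c and the right one as f = a·(b + c), which is what the probability values in the figure suggest.

```python
def switching_activity(p_one):
    """0 -> 1 transition probability of a node with signal probability p_one."""
    return (1.0 - p_one) * p_one


pa, pb, pc = 0.1, 0.5, 0.5
p_b_or_c = 1.0 - (1.0 - pb) * (1.0 - pc)  # probability that b + c is 1

# Left network: f = a*b + a*c (two AND gates and one OR gate).
p1 = pa * pb
p2 = pa * pc
# a*b and a*c share input a, so the OR output is not a product of
# independent terms; it equals P(a) * P(b or c).
p5 = pa * p_b_or_c
left_total = sum(switching_activity(p) for p in (p1, p2, p5))

# Right network: f = a*(b + c) (one OR gate and one AND gate).
p4 = p_b_or_c
p3 = pa * p4
right_total = sum(switching_activity(p) for p in (p4, p3))

print(f"Left:  p1={p1:.3f} p2={p2:.3f} p5={p5:.3f} total activity={left_total:.4f}")
print(f"Right: p4={p4:.3f} p3={p3:.3f} total activity={right_total:.4f}")
```

The factored form has one gate less, and so less capacitance, but its internal node b + c toggles far more often, so the summed activity goes up.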


Logic Restructuring
▪ Logic restructuring: changing the topology of a logic
network to reduce transitions

AND: P0→1 = P0 * P1 = (1 - PAPB) * PAPB


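[Figure: 4-input AND of A, B, C, D, each with signal probability 0.5
Chain: W = A·B with (1-0.25)*0.25 = 3/16 = 0.188, X = W·C with 7/64 = 0.109, F = X·D with 15/256
Tree: Y = A·B with 3/16, Z = C·D with 3/16, F = Y·Z with 15/256]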
➔Chain implementation has a lower overall switching activity than tree
implementation for random inputs
▪ BUT: Ignores glitching effects
Source: Timmernann, 2007
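The chain-versus-tree comparison can be recomputed exactly. This is a small sketch using the AND formula above, with all four inputs at probability 0.5 as in the figure; the helper name `and_node` is made up for this example.

```python
from fractions import Fraction


def and_node(p_a, p_b):
    """Return (signal probability, 0->1 activity) of a 2-input AND output."""
    p_out = p_a * p_b
    return p_out, (1 - p_out) * p_out


half = Fraction(1, 2)

# Chain: W = A*B, X = W*C, F = X*D
p_w, act_w = and_node(half, half)
p_x, act_x = and_node(p_w, half)
p_f_chain, act_f_chain = and_node(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A*B, Z = C*D, F = Y*Z
p_y, act_y = and_node(half, half)
p_z, act_z = and_node(half, half)
p_f_tree, act_f_tree = and_node(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(f"Chain: {act_w} + {act_x} + {act_f_chain} = {chain_total}")
print(f"Tree:  {act_y} + {act_z} + {act_f_tree} = {tree_total}")
```

The totals are 91/256 for the chain and 111/256 for the tree, still ignoring glitches.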
Input Ordering
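[Figure: two orderings of a 3-input AND built from two 2-input gates, with PA = 0.5, PB = 0.2, PC = 0.1
Left: X = A·B, F = X·C; activity at X = (1-0.5x0.2)*(0.5x0.2) = 0.09
Right: X = B·C, F = X·A; activity at X = (1-0.2x0.1)*(0.2x0.1) = 0.0196]
AND: P0→1 = (1 - PAPB) * PAPB

Beneficial: postponing introduction of signals with a high transition rate (signals with signal probability close to 0.5)

Source: Timmernann, 2007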
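The ordering choice can be automated by trying every input pair for the first gate. This is a sketch for the three-input AND example above; only the intermediate node X differs between orderings, because the output F has the same probability in all of them.

```python
from itertools import combinations

probs = {"A": 0.5, "B": 0.2, "C": 0.1}


def and_activity(p_a, p_b):
    """0->1 activity at the output of a 2-input AND: (1 - pa*pb) * pa*pb."""
    p_out = p_a * p_b
    return (1.0 - p_out) * p_out


# Activity at the intermediate node X for each choice of first-stage inputs.
first_stage = {
    pair: and_activity(probs[pair[0]], probs[pair[1]])
    for pair in combinations(sorted(probs), 2)
}
best_pair = min(first_stage, key=first_stage.get)

for pair, activity in first_stage.items():
    print(f"X = {pair[0]}*{pair[1]}: activity {activity:.4f}")
print(f"Lowest activity when the first gate combines {best_pair}")
```

The input closest to 0.5 (A) is the one left for the last gate.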
Glitching
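[Figure: two cascaded gates; inputs A and B produce X, and X together with C produces Z. Timing diagram for the input change ABC = 101 → 000 with a unit delay per gate]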
Example 1: Chain of NAND Gates
[Figure: chain of NAND gates with outputs out1 to out8; simulated waveforms of V (Volt, 0.0 to 6.0) versus t (nsec, 0 to 3), with the VDD / 2 level marked]
Example 2: Adder Circuit

[Figure: adder with carry input Cin and sum outputs S0 to S15; sum output voltage (V, 0 to 3) versus time (ps, 0 to 12) for Cin and several sum bits, with the VDD / 2 level marked]
How to Cope with Glitching?

[Figure: two gate networks built from blocks F1, F2 and F3, annotated with signal arrival times]

Equalize Lengths of Timing Paths Through Design
Dealing with Glitches

• Logic restructuring to minimize glitches
• Buffer insertion for path balancing
[Figures: gate networks annotated with signal arrival times]


Multiple VDD
• Main ideas:
▪ Use of different supply voltages within the same design
▪ High VDD for critical parts (high performance needed)
▪ Low VDD for non-critical parts (only low performance demands)

• At design phase:
▪ Determine critical path(s)
▪ High VDD for gates on those paths
▪ Lower VDD on the other gates (in non-critical paths)
▪ For low VDD: prefer gates that drive large capacitances (yields the
largest energy benefits)
• Usually two different VDD (but more are possible)

Data Paths
• Data propagate through different data paths between registers (flipflops -
FF)
• Paths mostly differ in propagation delay times
• Frequency of clock signal (CLK) depends on path with longest delay ➔
critical path

[Figure: flip-flops (FF) clocked by CLK and connected through combinational paths of different lengths]
Data Paths: Slack

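[Figure: gate G1 (inputs A and B, output Y) drives gate G2 (inputs Y and C). Timing diagram: once all inputs of G1 have arrived, G1 is ready with its evaluation after the delay of G1; the time from then until all inputs of G2 have arrived is the slack for G1]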
Multiple VDD in Data Paths
• Minimum energy consumption when all logic paths are critical (same delay)
• Possible Algorithm: clustered voltage-scaling
▪ Each path starts with VDDH and switches to VDDL (blue gates) when slack
is available
▪ Level conversion in flipflops at end of paths

Connected with VDDL

Connected with VDDH

11. The following section represents a segment of a pipelined architecture.

11.a Signals 𝑆1 , 𝑆2 and 𝑆3 have a ‘1’ probability of 0.5. Find the ‘1’ probability of the rest of the
signals of the circuit.

11.b Find the activity factor of each signal.

11.c The circuit operates at 100 MHz, from a 1.2 V supply voltage and the average load
capacitance is 10 fF. Find the dynamic power consumption of the circuit. Do not consider the
effect of glitches in your analysis.

12. Considering the circuit in the previous exercise,

12.a Draw the timing diagram when the input signals change from 𝑆1 = 0, 𝑆2 = 0, 𝑆3 = 1 to 𝑆1 = 1, 𝑆2 = 1, 𝑆3 = 0.

12.b Is there any glitch in the circuit? What can you say about the power consumption results
of the previous exercise?

14. You need to implement a 3-input OR function with two 2-input OR gates. Find the input ordering that
minimizes power consumption, knowing that PA = 0.7, PB = 0.5, PC = 0.2.

7 Architecture-Level Strategies
Strategies
• 7.1 Review of architectural metrics and design
techniques
• 7.2 Reducing supply voltage while maintaining
performance
• 7.3 Clock Gating
• 7.4 Bus Power Reduction

Design Layer: Architecture Level
• Also known as Register transfer level (RTL)
• Base elements:
▪ Register structures
▪ Arithmetic logic units (ALU)
▪ Memory elements
• Only behavior is described
(no inner structure)

Performance Metrics

• Two common metrics


▪ Latency (how long to do X)
• Also called response time and execution time
▪ Throughput (how often can it do X)
• Example of car assembly line
▪ Takes 6 hours to make a car
(latency is 6 hours)
▪ A car leaves every 5 minutes
(throughput is 12 cars per hour)
▪ Overlap results in Throughput > 1/Latency
Basic Concepts: Pipelining

• No pipeline: 1 operation every 1 ns (a single 1 ns block)
• Pipeline: 1 operation every 200 ps (five 200 ps stages)
Basic Concepts: Parallelism

• Parallel implementation: 5 operations every 1 ns (five 1 ns units working side by side)
Motivation for Power Reduction
• Optimizations at the architecture or system level can enable
more effective power minimization at the circuit level (while
maintaining performance), such as
▪ Enabling a reduction in supply voltage
▪ Reducing the effective switching capacitance for a given function
(physical capacitance, activity)
▪ Reducing the switching rates
▪ Reducing leakage

• Optimizations at higher abstraction levels tend to have greater potential impact
▪ While circuit techniques may yield improvements in the 10-50% range,
architecture and algorithm optimizations have reported orders of
magnitude power reduction
Expanding the Playing Field

Architecture and system transformations and optimizations reshape the E-D curves:
• Removing inefficiencies (1)
• Alternative topologies (2)
• Discrete options (3)
[Figures: energy (E) versus delay (D) curves for each of the three cases]

Reducing the Supply Voltage
(while maintaining performance)
Concurrency:
trading off clock frequency versus area to reduce power

Consider the following reference design

[Figure: reference design built from registers R and logic blocks F1 and F2, clocked at fref]
• R: register
• F1, F2: combinational logic blocks (adders, ALUs, etc)
• Cref: average switching capacitance

[A. Chandrakasan, JSSC’92]


A Parallel Implementation
[Figure: two copies of the reference datapath (registers R, logic blocks F1 and F2), each clocked at fref /2. Annotation: "Almost cancels"]
• Running slower reduces required supply voltage
• Yields quadratic reduction in power
A Pipelined Implementation
[Figure: the reference datapath with additional registers R between the logic blocks F1 and F2, clocked at fref]
• Shallower logic reduces required supply voltage (this example assumes equal Vdd for par / pipe designs)
• Assuming ovpipe = 10%
Parallel Architecture: Example

• Reference Data path (for example)

• Critical path delay Tadder + Tcomparator (= 25 ns) ➔ fref = 40 MHz
• Total capacitance being switched = Cref
• VDD = Vref = 5V
• Power for reference datapath = Pref = Cref Vref^2 fref

Source: Irwin, 2000

Parallel Architecture: Example cont’d

Area = 1476 x 1219 µm^2

• The clock rate can be reduced by half with the same throughput: fpar = fref / 2
• Vpar = Vref / 1.7, Cpar = 2.15 Cref
• Ppar = (2.15 Cref) (Vref / 1.7)^2 (fref / 2) = 0.36 Pref

Source: Irwin, 2000

Pipelined Architecture: Example

◼ fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref / 1.7
◼ Voltage can be dropped while maintaining the original throughput
◼ Ppipe = Cpipe Vpipe^2 fpipe = (1.1 Cref) (Vref/1.7)^2 fref = 0.37 Pref

Source: Irwin, 2000
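Both examples follow the same pattern, P/Pref = (C/Cref)·(V/Vref)^2·(f/fref), which is easy to check numerically. This sketch plugs in the scaling factors quoted on the slides. With the voltage factor rounded to 1.7 the ratios come out at about 0.37 and 0.38, slightly above the quoted 0.36 and 0.37; the quoted values presumably rest on an unrounded voltage ratio, which is an assumption here.

```python
def relative_power(c_ratio, v_ratio, f_ratio):
    """P / Pref = (C / Cref) * (V / Vref)^2 * (f / fref)."""
    return c_ratio * v_ratio ** 2 * f_ratio


V_SCALE = 1.0 / 1.7  # supply reduced by a factor of 1.7 in both examples

# Parallel datapath: 2.15x capacitance, half the clock rate.
p_par = relative_power(2.15, V_SCALE, 0.5)

# Pipelined datapath: 1.1x capacitance, same clock rate.
p_pipe = relative_power(1.1, V_SCALE, 1.0)

print(f"Parallel:  P/Pref = {p_par:.3f}")
print(f"Pipelined: P/Pref = {p_pipe:.3f}")
```

Either way, both transformations cut the power to a bit more than a third of the reference at the same throughput.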

Approximate Trend
                 N-parallel proc.        N-stage pipeline proc.
Capacitance      N*Cref                  Cref
Voltage          Vref/N                  Vref/N
Frequency        fref/N                  fref
Dynamic Power    Cref Vref^2 fref/N^2    Cref Vref^2 fref/N^2
Chip area        N times                 10-20% increase

Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
15. Consider the circuit of Figure 1. Modules A and B have delays of 20 ns and 65 ns at 5 V,
and switch 30 pF and 112 pF, respectively. The register has a delay of 4 ns and switches
0.2 pF. Adding a pipeline register allows for reduction of the supply voltage while maintaining
throughput. How much power can be saved this way? Delay with respect to Vdd can be
approximated from the lower figure.

16. Repeat problem 15, using parallelism instead of pipelining. Assume that a 2-to-1
multiplexer has a delay of 4 ns at 2.5 V and switches 0.3 pF. Try parallelism levels of 2 and 4.
Which one is preferred?

Increasing use of Concurrency Saturates
▪ Can combine parallelism and pipelining to drive VDD down
▪ But close to the process threshold voltage, the overhead of excessive concurrency starts to dominate

[Figure: power (0.1 to 1) versus concurrency (2 to 16), assuming constant % overhead]
Increasing use of Concurrency Saturates

[Figure: power P versus VDD at fixed throughput, from the nominal design (no concurrency) through increasing concurrency down to a minimum Pmin set by overhead + leakage]

• Only option: Reduce VTH as well!
• But: Must consider Leakage …
Mapping into the Energy-Delay Space
[Figure (© IEEE 2004): energy per operation (E Op) versus Delay = 1/Throughput for the nominal design and for increasing levels of parallelism (N = 2 to N = 5), with a fixed-throughput line and the optimum energy-delay point marked]

▪ For each level of performance, there is an optimum amount of concurrency
▪ Concurrency is only energy-optimal if the requested throughput is larger than
that of the optimal operation point of the nominal function

[Ref: D. Markovic, JSSC’04]


Some Energy-Inspired Design Guidelines

• For maximum performance


▪ Maximize use of concurrency at the cost of area
• For given performance
▪ Optimal amount of concurrency for minimum energy
• For given energy
▪ Least amount of concurrency that meets performance
goals
• For minimum energy
▪ Solution with minimum overhead (that is – direct mapping
between function and architecture)
Concepts Slowly Embraced in Late 90’s
[Figure: log-scale trends from 1960 to 2010: transistors/chip (10^0 to 10^12) for memory and microprocessor/DSP, normalized processor speed, and computational efficiency in mA/MIP (0.001 to 100) for memory and processors]

[Ref: R. Subramanyan, Tampere’99]


And Finally Accepted in the 00’s

[Figure: processor performance (for constant power envelope), log scale from 1 to 100, versus year (2000, 2004, 2008+); the Single Core curve is labeled 3x and the Dual/Many Core curve 10x]

[Ref: S. Chou, ISSCC’05]


Fully Accepted in 00’s
• UCB Pleiades (heterogeneous reconfigurable fabric, ARM)
• Xilinx Virtex-4
• Intel Montecito
• AMD DualCore
• NTT Video codec (4 Tensilica cores)

IBM/Sony Cell Processor

[© Xilinx, Intel, AMD, IBM, NTT]


The Quest for Concurrency
[Figure: performance (0 to 10) versus number of cores (0 to 30) for serial fractions of 0%, 6.7% and 20%]

Amdahl’s Law:
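In its usual form, Amdahl's Law gives the speedup on N cores as 1 / (s + (1 - s)/N), where s is the serial fraction of the work. The sketch below evaluates that standard formula for the three serial fractions named in the plot; the function name and the chosen core counts are illustrative.

```python
def amdahl_speedup(serial_fraction, cores):
    """Speedup = 1 / (s + (1 - s) / N) for serial fraction s on N cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


for serial in (0.0, 0.067, 0.20):
    row = ", ".join(
        f"N={n}: {amdahl_speedup(serial, n):.2f}" for n in (1, 10, 20, 30)
    )
    print(f"Serial = {serial:.1%} -> {row}")

# The speedup can never exceed 1 / s, however many cores are added.
limit_20_percent = 1.0 / 0.20
print(f"Upper bound for Serial = 20%: {limit_20_percent:.1f}")
```

Even a 20% serial share caps the achievable speedup at 5, which is why the curves in the plot flatten out.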
Clock Gating
• Most popular method for power reduction of clock signals and
functional units
• Gate off clock to idle functional units
• Logic for generation of disable signal necessary
▪ Higher complexity of control logic
▪ Higher power consumption
▪ Timing is critical to avoid clock glitches at the OR gate output
▪ Additional gate delay on clock signal
[Figure: the clock and disable signals feed an OR gate whose output clocks the register (Reg) in front of the functional unit]
Source: Irwin, 2000

Clock Gating: Example
MPEG4 decoder (Source: M. Ohashi, Matsushita, 2002)
• Without clock gating: 30.6 mW
• With clock gating: 8.5 mW
▪ 90% of flip-flops clock-gated
▪ 70% power reduction by clock-gating
[Figure: bar chart of power (mW, 0 to 25) and chip layout with blocks DEU, VDE, MIF, DSP/HIF and 896Kb SRAM]
Bus Power
• Buses are significant source of power dissipation
▪ 50% of dynamic power for interconnect switching (Magen, SLIP 04)
▪ MIT Raw processor’s on-chip network consumes 36% of total chip power
(Wang et al. 2003)
• Caused by:
▪ High switching activities
▪ Large capacitive loading

[Figure: bus drivers with inputs Ain, Bin, Cin, Din drive a shared bus that is read by bus receivers with outputs Wout, Xout, Yout, Zout]
Source: Irwin, 2000

Bus Power Reduction
• For an n-bit bus: Pbus = n · α · fClk · Cload · VDD^2
• Alternative bus structures
▪ Segmented buses (lower Cload)
▪ Charge recovery buses
▪ Bus multiplexing (lower fClk possible)
• Minimizing bus traffic (n)
▪ Code compression
▪ Instruction loop buffers
• Minimization of bit switching activity (α) by data encoding
• Minimize voltage swing (VDD^2) using differential signaling

Source: Irwin, 2000

Reducing Shared Resources
• Shared resources incur switching overhead
• Local bus structures reduce overhead

Global bus architecture Local bus architecture

Source: Irwin, 2000

Reducing Shared Resources cont’d
• Bus segmentation
▪ Another way to reduce shared buses
▪ Control of bus segment by controller blocks (B)

Shared Bus
B

Segmented Bus

Source: Evgeny Bolotin – Jan 2004

8 Software-Level Strategies
Design Layer: Algorithm Level
• Base elements:
▪ Functions
▪ Procedures
▪ Processes
▪ Control structures
• Description of design behavior

Coding styles
• Use processor-specific instruction style:
▪ Variable types
▪ Function calls style
▪ Conditionalized instructions (for ARM)
• Follow general guidelines for software coding
▪ Use table look-up instead of conditionals
▪ Make local copies of global variables so that they can be assigned to
registers
▪ Avoid multiple memory look-ups with pointer chains

Source-code Transformations
• Minimize power-consuming activity:
▪ Computation
before: A*B+A*C
after: A*(B+C)

▪ Communication
before: for (c = 1..N) { receive (A); B=c*A }
after: receive (A); for (c = 1..N) { B=c*A }

▪ Storage
before: for (c = 1..N) B[c] = A[c]*D[c]
        for (c = 1..N) F[c] = B[c]-1
after: for (c = 1..N) F[c] = A[c]*D[c]-1
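The communication and storage transformations can be written out as runnable code to confirm that they keep the result while removing work. This is a sketch only: `make_receiver` is a made-up stand-in for the receive (A) operation, and the communication case assumes that A does not change between receptions.

```python
def make_receiver(value):
    """Hypothetical receive(A) stand-in that counts how often it is called."""
    calls = {"count": 0}

    def receive():
        calls["count"] += 1
        return value

    return receive, calls


def communication_before(n, receive):
    b = None
    for c in range(1, n + 1):
        a = receive()       # one transfer per iteration
        b = c * a
    return b


def communication_after(n, receive):
    a = receive()           # single transfer, hoisted out of the loop
    b = None
    for c in range(1, n + 1):
        b = c * a
    return b


def storage_before(a, d):
    b = [x * y for x, y in zip(a, d)]   # intermediate array B kept in memory
    return [v - 1 for v in b]


def storage_after(a, d):
    return [x * y - 1 for x, y in zip(a, d)]  # fused loop, no array B


N = 8
rx_before, calls_before = make_receiver(3)
rx_after, calls_after = make_receiver(3)
result_before = communication_before(N, rx_before)
result_after = communication_after(N, rx_after)
print(f"receive calls: {calls_before['count']} before, {calls_after['count']} after")

A = [1, 2, 3, 4]
D = [5, 6, 7, 8]
print(storage_before(A, D), storage_after(A, D))
```

The results match in both cases, while the number of transfers drops from N to 1 and the intermediate array disappears.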
Adaptive Dynamic Voltage Scaling (DVS)
• Slow down processor to fill idle time
• More Delay ➔ lower operational voltage

[Figure: at 3.3 V the processor alternates between Active and Idle periods; at 2.4 V it stays Active throughout]
• Runtime Scheduler determines processor speed and selects
appropriate voltage
• Transitions delay for frequencies ~150s
• Potential to realize 10x energy savings

Adaptive DVS: Example
• Task with 100 ms deadline, requires 50 ms CPU time at full speed
▪ Normal system gives 50 ms computation, 50 ms idle/stopped time
▪ Half speed/voltage system gives 100 ms computation, 0 ms idle
▪ Same number of CPU cycles but: E = C (VDD/2)^2 = Eref / 4
▪ Dynamic Voltage Scaling adapts voltage to workload

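[Figure: speed versus time with time marks T1 and T2. Left: the task runs at full speed and is followed by idle time. Right: the task runs at reduced speed and fills the whole period. Same work, lower energy]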
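The numbers of this example can be reproduced in a few lines. This sketch assumes that energy per cycle is C·VDD^2, that the idle processor consumes nothing, and that frequency scales linearly with voltage; the clock frequency, supply voltage and capacitance values are made up for illustration.

```python
def task_energy(cycles, capacitance, vdd):
    """Dynamic energy of a task: cycles * C * VDD^2."""
    return cycles * capacitance * vdd ** 2


def task_time(cycles, frequency):
    """Execution time in seconds."""
    return cycles / frequency


# Assumed illustrative values, not taken from the slides.
F_FULL = 100e6        # full-speed clock in Hz
V_FULL = 3.3          # full-speed supply in V
C_EFF = 1e-9          # switched capacitance per cycle in F
CYCLES = 0.050 * F_FULL  # 50 ms of work at full speed

t_full = task_time(CYCLES, F_FULL)
t_half = task_time(CYCLES, F_FULL / 2)
e_full = task_energy(CYCLES, C_EFF, V_FULL)
e_half = task_energy(CYCLES, C_EFF, V_FULL / 2)
energy_ratio = e_half / e_full

print(f"Full speed: {t_full * 1e3:.0f} ms busy, energy {e_full:.4f} J")
print(f"Half speed: {t_half * 1e3:.0f} ms busy, energy {e_half:.4f} J")
print(f"Energy ratio: {energy_ratio:.2f}")
```

The half-speed run just meets the 100 ms deadline and uses a quarter of the energy.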
9 System-Level Strategies
Design Layer: System Level
• Basic Elements:
▪ Complex modules
▪ Processors
▪ Calculation and control units
▪ Sensors
[Figure: system block diagram with ALU, MEM and MP3 blocks]
Dynamic Power Management
• Systems are:
▪ Designed to deliver peak performance, but …
▪ Not needing peak performance most of the time
• Components are idle sometimes
• Dynamic power management (DPM):
▪ Puts idle components in low-power non-operational
states
• Power manager:
▪ Observes and controls the system
▪ Power consumption of power manager is negligible

Processor Sleep Modes
• Software power control - power management
▪ DOZE: Most units stopped except on-chip cache memory (cache coherency)
▪ NAP: Cache also turned off, PLL still on, time out or external interrupt to resume
▪ SLEEP: PLL off, external interrupt to resume

• Deeper sleep mode consumes less power
• Deeper sleep mode requires more latency to resume
Processor Sleep Modes: Example
• PowerPC sleep modes
Mode                   66 MHz    80 MHz
No power mgmt          2.18 W    2.54 W
Dynamic power mgmt     1.89 W    2.20 W
DOZE                   307 mW    366 mW
NAP                    113 mW    135 mW
SLEEP                  89 mW     105 mW
SLEEP without PLL      18 mW     19 mW
SLEEP without clock    2 mW      2 mW

• 10 cycles to wake up from SLEEP
• 100 µs to wake up from SLEEP+
Source: Irwin, 2000

Transmeta LongRun
• Applies adaptive DVS
• LongRun policies:
▪ Detection of different workload scenarios
▪ Based on runtime performance information
• After detection ➔ corresponding adaptation of:
▪ Processor supply voltage
▪ Processor frequency
▪ Clock frequency always within limits required by supply voltage to avoid clock
skew problems
• Use of hard-coded core frequency/voltage operating points

➔ Best trade-off between performance and power possible

Transmeta LongRun cont’d
[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), divided into a typical operating region and a peak performance region]

Frequency (MHz)    300     433     533     667     800     900     1000
Voltage (V)        0.80    0.87    0.95    1.05    1.15    1.25    1.30

Source: Transmeta
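The shape of the curve can be approximated from the operating points alone. This is a rough sketch that assumes dynamic power dominates and scales as f·VDD^2 with the same capacitance and activity at every point; leakage and other effects are ignored, so the figures are estimates rather than Transmeta data.

```python
# (frequency in MHz, supply voltage in V) operating points from the slide
OPERATING_POINTS = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]


def relative_dynamic_power(points):
    """Dynamic power of each point relative to the fastest one (P ~ f * V^2)."""
    f_max, v_max = points[-1]
    reference = f_max * v_max ** 2
    return [f * v ** 2 / reference for f, v in points]


relative = relative_dynamic_power(OPERATING_POINTS)
for (freq, volt), rel in zip(OPERATING_POINTS, relative):
    print(f"{freq:5d} MHz at {volt:.2f} V -> {rel * 100:5.1f}% of max dynamic power")
```

Under this model the 300 MHz point delivers 30% of the peak frequency for only about 11% of the peak dynamic power.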
Transmeta LongRun: Example

Source: Transmeta

10 Power Management Examples
10.1 Reducing power consumption with Arduino
Standard situation

Running from a 9V battery through the "power in" plug, it draws about 50 mA.

Running on 5V through the +5V pin, it draws about 49 mA.
Sleep modes

#include <avr/sleep.h>

void setup () {
  // Select the deepest sleep mode, then enable and enter sleep
  set_sleep_mode (SLEEP_MODE_PWR_DOWN);
  sleep_enable ();
  sleep_cpu ();
} // end of setup

void loop () { }
Sleep modes

SLEEP_MODE_IDLE: 50 mA
SLEEP_MODE_ADC: 42 mA
SLEEP_MODE_PWR_SAVE: 36 mA
SLEEP_MODE_EXT_STANDBY: 36 mA
SLEEP_MODE_STANDBY : 35 mA
SLEEP_MODE_PWR_DOWN : 34.5 mA

1. Power-save mode keeps Timer 2 running (provided it is clocked
from an external source).
2. Stand-by mode is similar to power-down mode, except that the
oscillator is kept running. This lets it wake up faster.
3. In IDLE mode, the clocks keep running, so the timer interrupt behind
millis() wakes the CPU up about every millisecond.
4. SLEEP_MODE_IDLE provides the least power savings but also
retains the most functionality.
5. SLEEP_MODE_PWR_DOWN uses the least power but turns
almost everything off, so your options for wake interrupts and
the like are limited.
Sleep modes: ATMEGA Datasheet

Sleep modes

Power Reduction Mode

In addition to putting the whole chip to sleep, you can turn off
parts of the chip with the chip's Power Reduction Manager.

Turn the ADC off: power_adc_disable();

If you do not need any serial communication: power_usi_disable();

For maximum power reduction, just run this: power_all_disable();
then enable only what you actually need:
http://www.nongnu.org/avr-libc/user-manual/group__avr__power.html
Configuring pins: current increment

1. All pins as outputs, and LOW: 0.0 µA (same as before).

2. All pins as outputs, and HIGH: 1.86 µA.

3. All pins as inputs, and LOW (in other words, internal pull-ups disabled): 0.0 µA (same as before).

4. All pins as inputs, and HIGH (in other words, internal pull-ups enabled): 1.25 µA.
Down-clock

typedef enum
{
    clock_div_1 = 1, clock_div_2 = 2, clock_div_4 = 4, clock_div_8 = 8,
    clock_div_16 = 16, clock_div_32 = 32, clock_div_64 = 64, clock_div_128 = 128
} clock_div_t;

clock_prescale_set (clock_div_t x)
Down-voltage

At 8MHz

1. 5.0V : 11.67 mA
2. 4.5V : 7.74 mA
3. 4.0V : 5.60 mA
4. 3.5V : 4.10 mA
5. 3.3V : 3.70 mA
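Multiplying each voltage by its measured current turns the list above into power figures. This short sketch uses only the numbers from the list; the variable names are made up.

```python
# (supply voltage in V, measured current in mA) at 8 MHz, from the list above
MEASUREMENTS = [(5.0, 11.67), (4.5, 7.74), (4.0, 5.60), (3.5, 4.10), (3.3, 3.70)]

# Power in mW is voltage (V) times current (mA).
power_mw = [(volts, volts * milliamps) for volts, milliamps in MEASUREMENTS]

for volts, milliwatts in power_mw:
    print(f"{volts:.1f} V: {milliwatts:6.2f} mW")

saving_factor = power_mw[0][1] / power_mw[-1][1]
print(f"Going from 5.0 V to 3.3 V cuts power by a factor of {saving_factor:.1f}")
```

The power drops faster than the voltage because the current falls along with it.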
Max frequency vs. Voltage

10.2 Reducing power consumption with Raspberry Pi
General comparison

Power Management in Linux

Suspend(suspend.c)/Resume

Hibernation (hibernate.c)

Restore

Disconnect Unnecessary Peripherals

Shut down the USB Hub

Shut down the USB Hub

#!/bin/bash
# Stop networking, then cut the USB bus power
/etc/init.d/networking stop
echo 0 > /sys/devices/platform/bcm2708_usb/buspower
echo "Bus power stopping"

#!/bin/bash
# Restore the USB bus power, wait for the devices, then restart networking
echo 1 > /sys/devices/platform/bcm2708_usb/buspower
echo "Bus power starting"
sleep 2
/etc/init.d/networking start
Shut down the USB Hub

To locate buspower

find /sys/devices/ -name `dmesg -t | grep dwc_otg | grep "DWC OTG Controller" | awk '{print $2}' | cut -d":" -f1`
Power Consumption Reduction
❑ With USB Hub: 560-580 mA with LCD display and USB WIFI dongle (about 2.9 Watt). Temperature: 48.

❑ Without USB Hub: 220 mA, the power is about 1.1 Watt. Temperature: 42.
Turn off video output

To turn off the HDMI port:

sudo /opt/vc/bin/tvservice -o

To turn it back on:

sudo /opt/vc/bin/tvservice -p

Turning the HDMI port off will save you around 20-30 mA.
Down-clock the Core

Add these lines to your config.txt (values in MHz):

arm_freq=700
arm_freq_min=250
core_freq=250
core_freq_min=100
sdram_freq_min=150

over_voltage_min=0, or 4 if you have overclocked your Raspberry Pi.

Halving the frequency reduces the ARM power consumption by a factor of 8 (2^3), assuming the supply voltage is scaled down along with the frequency (P ∝ f · VDD^2).
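The factor of 8 follows from P ∝ f·VDD^2 once the voltage is allowed to track the frequency. The sketch below makes that assumption explicit (VDD proportional to f, dynamic power only), so it is an idealized estimate rather than a Raspberry Pi measurement.

```python
def power_scale(frequency_ratio):
    """Relative dynamic power when VDD scales linearly with frequency.

    P ~ f * VDD^2 and VDD ~ f give P ~ f^3.
    """
    voltage_ratio = frequency_ratio  # idealized assumption
    return frequency_ratio * voltage_ratio ** 2


half_clock = power_scale(0.5)
plus_20_percent = power_scale(1.2)

print(f"Half the frequency: {half_clock:.3f} of the original power")
print(f"20% more frequency: {plus_20_percent:.3f} times the original power")
```

If the voltage is left unchanged, the saving is only linear in the frequency.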
Example

Example
Multithread
GPU Module

If the GPU frequency increases by 20%, the power consumption increases by a factor of (1.2)^3
