Lect3 - Design Metrics
Design metrics
• Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge
Design challenge – optimizing design metrics
Common metrics
Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
Size: the physical space required by the system
Performance: the execution time or throughput of the system
Power: the amount of power consumed by the system
Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
Time-to-prototype: the time needed to build a working version of the system
Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
Maintainability: the ability to modify the system after its initial release
Correctness, safety, many more
Design metric competition – improving one may worsen others
Expertise with both software and hardware is needed to optimize design metrics
Not just a hardware or software expert, as is common
A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: competing design metrics: power, performance, size, NRE cost]
Time-to-market: a demanding design metric
Time required to develop a product to the point it can be sold to customers
Market window
Period during which the product would have highest sales
Average time-to-market constraint is about 8 months
Delays can be costly
[Figure: revenues ($) versus time (months)]
Losses due to delayed market entry
Simplified revenue model
Product life = 2W, peak at W
Time of market entry defines a triangle
[Figure: revenues ($) versus time; on-time and delayed-entry triangles, with peak revenue and peak revenue from delayed entry marked]
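Losses due to delayed market entry (cont.)
Area = 1/2 * base * height
On-time = 1/2 * 2W * W
Delayed = 1/2 * (W-D+W)*(W-D)
Percentage revenue loss = ((On-time - Delayed) / On-time) * 100% = (D(3W-D) / 2W^2) * 100%
[Figure: revenues ($) for on-time and delayed entry, with peak revenue and peak revenue from delayed entry marked]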
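The triangle areas can be checked numerically. The following is a minimal Python sketch of the simplified revenue model, not a definitive implementation. The function name and the sample values (W = 26, D = 4) are illustrative assumptions and do not come from the slides.

```python
def percentage_revenue_loss(W: float, D: float) -> float:
    """Percentage of revenue lost by entering the market D time units late.

    Assumes the simplified triangular model: product life 2W, peak at W,
    and a delay D with 0 <= D <= W.
    """
    if not (W > 0 and 0 <= D <= W):
        raise ValueError("require W > 0 and 0 <= D <= W")
    on_time = 0.5 * (2 * W) * W            # area of the on-time triangle
    delayed = 0.5 * (W - D + W) * (W - D)  # area of the delayed triangle
    return (on_time - delayed) / on_time * 100.0


# Hypothetical values: product life 2W = 52 weeks, entry delayed by D = 4 weeks.
loss = percentage_revenue_loss(W=26, D=4)
print(f"{loss:.1f}% revenue loss")
```

Computing both areas directly, rather than using only the closed form, keeps the sketch close to the slide's own derivation.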
• Example: total and per-product cost from NRE cost and unit cost
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an additional $200 per unit
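The same arithmetic can be sketched in Python to see how the amortized NRE share shrinks as volume grows. The function names are illustrative assumptions, and the volumes of 100 and 1000 units are hypothetical; only the 10-unit case comes from the slide.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """Per-product cost = NRE cost / number of units + unit cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost(nre, unit_cost, units) / units


# 10 units is the slide's example; 100 and 1000 are hypothetical volumes.
for units in (10, 100, 1000):
    print(units, per_product_cost(2000, 100, units))
```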
The performance design metric
Widely used measure of a system, widely abused
Clock frequency, instructions per second – not good measures
Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
Latency (response time)
Time between task start and end
e.g., Cameras A and B each process an image in 0.25 seconds
Throughput
Tasks per second, e.g., Camera A processes 4 images per second
Throughput can be more than latency seems to imply due to concurrency, e.g., Camera B may process 8 images per second (by capturing a new image while the previous image is being stored)
Speedup of B over A = B’s performance / A’s performance
Throughput speedup = 8/4 = 2
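The camera numbers can be restated as a short Python sketch. It treats performance as throughput for one comparison and as 1/latency for the other; the function and variable names are illustrative assumptions.

```python
def speedup(performance_b: float, performance_a: float) -> float:
    """Speedup of B over A = B's performance / A's performance."""
    return performance_b / performance_a


latency_a = latency_b = 0.25   # seconds per image (both cameras)
throughput_a = 1 / latency_a   # 4 images/s: no overlap between images
throughput_b = 8.0             # images/s: capture overlaps with storing

print("throughput speedup:", speedup(throughput_b, throughput_a))
# With performance taken as 1/latency, equal latencies give no speedup.
print("latency speedup:", speedup(1 / latency_b, 1 / latency_a))
```

The two cameras have identical latency, so only the throughput comparison shows B's advantage.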
Three key embedded system technologies
Technology
A manner of accomplishing a task, especially using
technical processes, methods, or knowledge
Three key technologies for embedded systems
Processor technology
IC technology
Design technology
Processor technology
The architecture of the computation engine used to
implement a system’s desired functionality
Processor does not have to be programmable
“Processor” not equal to general-purpose processor
[Figure: three processor types, each with a controller and a datapath: general-purpose (“software”), application-specific, and single-purpose (“hardware”)]
Processors vary in their customization for the problem at hand
Desired functionality:
total = 0
for i = 1 to N loop
    total += M[i]
end loop
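For reference, the desired functionality can be written as a runnable Python sketch. The function name is an illustrative assumption, and Python lists are indexed from 0, whereas the slide's loop runs from 1 to N.

```python
def sum_memory(M):
    """Sum all N entries of M, mirroring the slide's loop."""
    total = 0
    for i in range(len(M)):  # slide: for i = 1 to N
        total += M[i]
    return total


# Hypothetical memory contents.
print(sum_memory([3, 1, 4]))
```

Each processor type on the following slides is a different way of implementing this same loop.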
General-purpose processors
Programmable device used in a variety of applications
Also known as “microprocessor”
Features
Program memory
General datapath with large register file
[Figure: general-purpose processor: controller (control logic and state register, IR, PC) and datapath (register file, general ALU)]
Single-purpose processors
Digital circuit designed to execute exactly one program
a.k.a. coprocessor, accelerator or peripheral
Features
Contains only the components needed to execute a single program
No program memory
Benefits
Fast
Low power
Small size
[Figure: single-purpose processor: controller (control logic, state register) and datapath (index, total, +), with data memory]
Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: application-specific processor: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), with program memory and data memory]
Architectures
We must be clear about which architecture we will use to design an embedded system (ES).
There is a wide variety of choices, and the right one depends on the given application.
The choices are as follows:
Application-specific architecture:
- Controller architecture
- Datapath architecture
- Finite state machine with datapath
Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers), connected to I/O and memory; the value 10 is loaded, incremented (+1) to 11, and stored]
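The three datapath operations can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the register names, memory addresses, and function names are hypothetical, and the ALU operation is fixed to +1 to match the figure.

```python
memory = {0: 10, 1: 0}          # hypothetical addresses and contents
registers = {"R0": 0, "R1": 0}  # hypothetical register names


def load(reg, addr):
    """Load: read a memory location into a register."""
    registers[reg] = memory[addr]


def alu_add_one(dst, src):
    """ALU operation: pass a register through the ALU (+1), store the result in a register."""
    registers[dst] = registers[src] + 1


def store(addr, reg):
    """Store: write a register to a memory location."""
    memory[addr] = registers[reg]


load("R0", 0)
alu_add_one("R1", "R0")
store(1, "R1")
print(registers, memory)
```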
Control Unit
Control unit: configures the datapath operations
Sequence of desired operations (“instructions”) stored in memory – “program”
Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
Fetch: Get next instruction into IR
Decode: Determine what the instruction means
Fetch operands: Move data from memory to datapath register
Execute: Move data through the ALU
Store results: Write data from register to memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers R0 and R1), connected to I/O and memory]
Example program in memory (M[500] holds 10):
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, points to next instruction
– IR: holds the fetched instruction
[Figure: PC = 100; IR holds load R0, M[500]]
• Decode
– Determine what the instruction means
[Figure: PC = 100; IR holds load R0, M[500]]
• Fetch operands
– Move data from memory to datapath register
[Figure: PC = 100; IR holds load R0, M[500]; R0 = 10]
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
[Figure: PC = 100; IR holds load R0, M[500]; R0 = 10]
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure: PC = 100; IR holds load R0, M[500]; R0 = 10]
Instruction Cycles
[Figure: timing diagram; each instruction passes through Fetch, Decode, Fetch ops, Exec., Store result, one per clk cycle]
PC=100: IR holds load R0, M[500]; R0 = 10
PC=101: IR holds inc R1, R0; R0 = 10, R1 = 11
PC=102: IR holds store M[501], R1; M[501] = 11
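The three-instruction walkthrough can be reproduced with a small Python sketch of the instruction cycle. The program and the value at M[500] come from the slides. The function names, the text parsing, and the choice to advance PC right after the fetch are assumptions made for illustration, so this is a toy model rather than a description of real hardware.

```python
# Program taken from the slides; addresses are instruction locations.
PROGRAM = {
    100: "load R0, M[500]",
    101: "inc R1, R0",
    102: "store M[501], R1",
}


def mem_address(operand: str) -> int:
    """Turn an operand such as 'M[500]' into the address 500."""
    return int(operand[2:-1])


def run(program, data, pc):
    """Run every instruction in `program`, one instruction cycle each."""
    regs = {"R0": 0, "R1": 0}
    data = dict(data)  # work on a copy of data memory
    while pc in program:
        # Fetch: get the next instruction into IR, then advance PC
        # (the exact moment PC advances is an assumption).
        ir = program[pc]
        pc += 1
        # Decode: determine what the instruction means.
        op, operands = ir.split(" ", 1)
        dst, src = [s.strip() for s in operands.split(",")]
        if op == "load":
            # Fetch operands: move data from memory to a datapath register.
            regs[dst] = data[mem_address(src)]
        elif op == "inc":
            # Execute: move data through the ALU.
            regs[dst] = regs[src] + 1
        elif op == "store":
            # Store results: write data from a register to memory.
            data[mem_address(dst)] = regs[src]
        else:
            raise ValueError(f"unknown instruction: {ir}")
    return regs, data, pc


regs, data, pc = run(PROGRAM, {500: 10}, pc=100)
print(regs, data, pc)
```

Each instruction does real work in only one of the later sub-operations, which matches the slides' note that a load does nothing during Execute and Store results.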
Architectural Considerations
• N-bit processor
– N-bit ALU,
bit, even 64
• PC size determines
[Figure: processor (control unit, datapath) connected to I/O]
• Clock frequency
– Inverse of the clock period
– Clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is often the longest delay
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU), connected to I/O and memory]
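Because clock frequency is the inverse of the clock period, the slowest path in the processor bounds the frequency. The Python sketch below illustrates this; the function name and all delay values are hypothetical and chosen only for illustration.

```python
def max_clock_frequency_hz(delays_ns):
    """Upper limit on clock frequency given path delays in nanoseconds.

    The clock period must be longer than the longest delay, so the
    returned frequency is a bound that a real clock must stay below.
    """
    longest_ns = max(delays_ns.values())
    return 1e9 / longest_ns


# Hypothetical delays; memory access is the longest, as is often the case.
delays_ns = {"ALU": 4.0, "register file": 2.0, "memory access": 10.0}
print(max_clock_frequency_hz(delays_ns) / 1e6, "MHz upper limit")
```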
Pipelining: Increasing Instruction Throughput
[Figure: dish-washing analogy, non-pipelined versus pipelined: Wash and Dry stages for dishes 1 to 8]
[Figure: pipelined instruction execution over time: Fetch-instr., Decode, Execute, Store res. stages for instructions 1 to 8]
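The throughput gain from pipelining can be estimated with a short Python sketch. It assumes an idealized pipeline in which every stage takes one clock cycle and there are no stalls; the function names are illustrative, and the stage count of 4 follows the four stage rows listed in the figure.

```python
def non_pipelined_cycles(instructions: int, stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """First instruction fills the pipeline; each later one adds one cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


n, s = 8, 4  # 8 instructions, 4 stages, as in the figure
print(non_pipelined_cycles(n, s), "cycles non-pipelined")
print(pipelined_cycles(n, s), "cycles pipelined")
```

The latency of a single instruction is unchanged at 4 cycles; only the number of instructions completed per unit time improves.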
Summary
What is an embedded system?
Characteristics of ES
Classification of ES
Design challenges and Metrics
Architecture of ES