Chapter 1
1. What is an embedded system?
An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it
controls. Unlike a general-purpose computer, such as a personal computer, an embedded system performs pre-defined
tasks, usually with very specific requirements. Since the system is dedicated to a specific task, design engineers can
optimise it, reducing the size and cost of the product. Embedded systems are often mass-produced, so the cost savings
may be multiplied by millions of items.
The first recognizably modern embedded system was the Apollo Guidance Computer, developed by Charles Stark Draper
at the MIT Instrumentation Laboratory. Each flight to the moon had two. At the project's inception, the Apollo guidance
computer was considered the riskiest item in the Apollo project. The use of the then-new monolithic integrated circuits to reduce size and weight increased this risk.
The first mass-produced embedded system was the Autonetics D-17 guidance computer for the Minuteman missile,
released in 1961. It was built from discrete transistor logic and had a hard disk for main memory. When the Minuteman II
went into production in 1966, the D-17 was replaced with a new computer that was the first high-volume use of integrated
circuits. This program alone reduced prices on quad NAND gate ICs from $1000/each to $3/each, permitting their use in
commercial products.
Since these early applications in the 1960s, where cost was no object, embedded systems have come down in price. There
has also been an enormous rise in processing power and functionality. For example, the first microprocessor was the Intel
4004, which found its way into calculators and other small systems, but required external memory and support chips. By
the mid-1980s, most of the previously external system components had been integrated into the same chip as the
processor, resulting in integrated circuits called microcontrollers, and widespread use of embedded systems became
feasible.
As the cost of a microcontroller fell below $1, it became feasible to replace expensive analog components such as
potentiometers and variable capacitors with digital electronics controlled by a small microcontroller. By the end of the
80s, embedded systems were the norm rather than the exception for almost all electronics devices, a trend which has
continued since.
1.2. Characteristics
Embedded systems are designed to do some specific task, rather than be a general-purpose computer for multiple tasks.
Some also have real-time performance constraints that must be met, for reasons such as safety and usability; others may
have low or no performance requirements, allowing the system hardware to be simplified to reduce costs.
For high volume systems such as portable music players or mobile phones, minimising production cost is usually the
primary design consideration. Engineers typically select hardware that is just "good enough" to implement the necessary
functions. For example, a digital set-top box for satellite TV has to process large amounts of data every second, but most
of the processing is done by custom integrated circuits. The embedded CPU "sets up" this process, and displays menu
graphics, etc. for the unit's look and feel.
For low-volume or prototype embedded systems, prebuilt computer hardware can be used by limiting the programs or by
replacing the operating system with a real-time operating system. In such systems, minimising the design and
development cost is usually the goal.
The software written for embedded systems is often called firmware, and is stored in ROM or Flash memory chips rather
than a disk drive. It often runs with limited hardware resources: a small or absent keyboard and screen, and little RAM.
Embedded systems reside in machines that are expected to run continuously for years without errors and in some cases
must recover by themselves if an error occurs. Therefore the software is usually developed and tested more carefully than
that for PCs, and unreliable mechanical moving parts such as disk drives, switches or buttons are avoided. Recovery from
errors may be achieved with techniques such as a watchdog timer that resets the computer unless the software periodically
notifies the watchdog.
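A minimal sketch of this pattern in C is shown below; the register name and key value are hypothetical placeholders for whatever watchdog interface the actual microcontroller provides.

#include <stdint.h>

/* Hypothetical watchdog interface: writing WDT_KEY to WDT_RESTART restarts
 * the timeout counter. A real part's datasheet defines the actual registers. */
#define WDT_RESTART (*(volatile uint32_t *)0x40001000u)
#define WDT_KEY 0xA5u

static void kick_watchdog(void)
{
    WDT_RESTART = WDT_KEY;      /* notify the watchdog: software is still alive */
}

int main(void)
{
    for (;;) {
        /* ... perform one pass of the application's control work ... */
        kick_watchdog();        /* if this stops happening, the watchdog resets the CPU */
    }
}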
1.4. Self-Test
Most embedded systems have some degree of built-in self-test. In safety-critical systems, these tests may be run periodically
or even continuously. There are several basic types:
1. Computer Tests: CPU, RAM, and program memory. These often run once at power-up.
2. Peripheral Tests: These simulate inputs and read-back or measure outputs.
3. Power Supply Tests: including batteries or other backup.
4. Consumables Tests: These measure what a system uses up, and warn when the quantities are low, for example a
fuel gauge in a car, or chemical levels in a medical system.
5. Safety Tests: These run within a 'safety interval', and assure that the system is still reliable. The safety interval is
usually less than the minimum time that can cause harm.
Some tests may require interaction with a technician:
1. Cabling Tests, where a loop is made to allow the unit to receive what it transmits
2. Rigging Tests: allow a system to be adjusted when it is installed
3. Operational Tests: These measure things that a user would care about to operate the system. Notably, these
have to run when the system is operating. This includes navigational instruments on aircraft, a car's speedometer,
and disk-drive lights.
After self-test passes, it is common to indicate this by some visible means like LEDs, providing simple diagnostics to
technicians and users.
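As a rough illustration of the first category, a power-up test of a RAM region might look like the C sketch below; led_set is an assumed board-support routine used here only to indicate the result.

#include <stdint.h>
#include <stddef.h>

extern void led_set(int on);        /* assumed board-support routine: drive a status LED */

/* Destructive walking-pattern test over a scratch RAM buffer. Returns 1 on pass. */
static int ram_test(volatile uint8_t *buf, size_t len)
{
    static const uint8_t patterns[] = { 0x55, 0xAA, 0x00, 0xFF };
    for (size_t p = 0; p < sizeof patterns; p++) {
        for (size_t i = 0; i < len; i++)
            buf[i] = patterns[p];
        for (size_t i = 0; i < len; i++)
            if (buf[i] != patterns[p])
                return 0;           /* stuck or shorted bit found */
    }
    return 1;
}

void power_on_self_test(void)
{
    static uint8_t scratch[256];    /* test only a scratch area, not memory in use */
    led_set(ram_test(scratch, sizeof scratch));
}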
CPU Platform
There are many different CPU architectures used in embedded designs such as ARM, MIPS, Coldfire/68k, PowerPC,
X86, PIC, 8051, Atmel AVR, Renesas H8, SH, V850, FR-V, M32R etc. This is in contrast to the desktop computer
market, which is currently limited to just a few competing architectures.
PC/104 is a typical base for small, low-volume embedded and ruggedized system design. These often use DOS, Linux,
NetBSD, or an embedded real-time operating system such as QNX or Inferno.
E.g. a full PC-type module using a 1.1 GHz Intel Atom processor with multiple RS232, LAN and USB ports and up to 2 GB of RAM, all in a 146 mm x 104 mm module.
A common configuration for very-high-volume embedded systems is the system on a chip (SoC)(2), an application-specific integrated circuit (ASIC)(3), for which the CPU was purchased as intellectual property to add to the IC's design. A related scheme is to use a field-programmable gate array, and program it with all the logic, including the CPU.
1.5. Tools
As for other software, embedded system designers use compilers, assemblers, and debuggers to develop embedded system
software. However, they may also use some more specific tools:
An in-circuit emulator(4) (ICE): ICE is a hardware device that replaces or plugs into the microprocessor, and
provides facilities to quickly load and debug experimental code in the system.
Utilities to add a checksum or CRC to a program, so the embedded system can check that its program is valid (a start-up check along these lines is sketched after this list)
For systems using digital signal processing, developers may use a math workbench such as MatLab or
Mathematica to simulate the mathematics.
Custom compilers and linkers may be used to improve optimisation for the particular hardware.
An embedded system may have its own special language or design tool, or add enhancements to an existing
language.
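For instance, the checksum utility mentioned above could append a CRC-32 to the image, which the firmware re-computes at start-up along the lines of this sketch (the image_* symbols are assumptions; in practice a linker script and a post-build tool would provide them).

#include <stdint.h>
#include <stddef.h>

/* Assumed symbols describing the programmed image; a linker script and a
 * post-build utility that appends the expected CRC would define them. */
extern const uint8_t image_start[];
extern const size_t image_length;        /* length of the image, excluding the stored CRC */
extern const uint32_t image_stored_crc;  /* value appended by the build utility */

/* Bitwise CRC-32 (reflected form, polynomial 0xEDB88320), no lookup table. */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

/* Returns nonzero if the program image matches the CRC stored with it. */
int program_is_valid(void)
{
    return crc32(image_start, image_length) == image_stored_crc;
}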
Software tools can come from several sources:
Software companies that specialize in the embedded market
Ported from the GNU software development tools
Sometimes, development tools for a personal computer can be used if the embedded processor is a close relative
to a common PC processor
1.6. Debugging
Embedded Debugging may be performed at different levels, depending on the facilities available, ranging from assembly
or source-level debugging with an in-circuit emulator to output from serial debug ports, to an emulated environment
running on a personal computer.
As the complexity of embedded systems grows, higher level tools and operating systems are migrating into machinery
where it makes sense. For example, cellphones, personal digital assistants and other consumer computers often need
significant software that is purchased or provided by a person other than the manufacturer of the electronics. In these
systems, an open programming environment such as Linux, NetBSD, OSGi or Embedded Java is required so that the
third-party software provider can sell to a large market.
Most such open environments have a reference design that runs on a PC. Much of the software for such systems can be
developed on a conventional PC. However, the porting of the open environment to the specialized electronics, and the
development of the device drivers for the electronics are usually still the responsibility of a classic embedded software
engineer. In some cases, the engineer works for the integrated circuit manufacturer, but there is still such a person
somewhere.
1.7. Start-up
All embedded systems have start-up code. Usually it sets up the electronics, runs a self-test, and then starts the application
code. The startup process is commonly designed to be short, such as less than a tenth of a second, though this may depend
on the application.
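On a typical bare-metal C system the start-up (reset) code looks roughly like the sketch below; the section-boundary symbols and the hardware_init/self_test routines are assumptions standing in for what the linker script and board code would really provide.

#include <stdint.h>

/* Assumed linker-provided symbols marking the .data image in flash and the
 * .data/.bss boundaries in RAM. */
extern uint32_t _sidata[], _sdata[], _edata[], _sbss[], _ebss[];

extern void hardware_init(void);   /* clocks, pins, peripherals (assumed)      */
extern int self_test(void);        /* power-on self-test, returns 1 on success */
extern int main(void);

void Reset_Handler(void)
{
    /* Copy initialised data from flash to RAM. */
    const uint32_t *src = _sidata;
    for (uint32_t *dst = _sdata; dst < _edata; )
        *dst++ = *src++;

    /* Zero the .bss section. */
    for (uint32_t *dst = _sbss; dst < _ebss; )
        *dst++ = 0;

    hardware_init();
    if (self_test())
        main();                    /* start the application code */

    for (;;) { }                   /* self-test failed (or main returned): stay here */
}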
3. The system will lose large amounts of money when shut down. (Telephone switches, factory controls, bridge and
elevator controls, funds transfer and market making, automated sales and service) These usually have a few
go/no-go tests, with on-line spares or limp-modes using alternative equipment and manual procedures.
4. The system cannot be operated when it is unsafe. Similarly, perhaps a system cannot be operated when it would
lose too much money. (Medical equipment, aircraft equipment with hot spares, such as engines, chemical factory
controls, automated stock exchanges, gaming systems) The testing can be quite exotic, but the only action is to
shut down the whole unit and indicate a failure.
2. Embedded Software Architectures
There are several different types of software architecture in common use.
2.3.2. Micro-kernels
A microkernel is an alternate realisation of a real-time OS where the micro-kernel handles only memory and task
management with user mode processes left to implement major subsystems such as file systems, network interfaces, etc.
In general, microkernels are of value only when task switching and inter-task communication are fast.
Footnotes:
System-on-a-chip
System-on-a-chip (SoC or SOC) is the idea of integrating all components of a computer or other electronic system into a single chip. It may contain
digital, analogue, mixed-signal, and often radio-frequency functions – all on one chip. A typical application is in the area of embedded systems. If it is
not feasible to construct an SoC for a particular application, an alternative is a system in package (SiP) comprising a number of chips in a single
package.
A typical SoC consists of:
one or more microcontroller, microprocessor or DSP core(s)
memory blocks including a selection of ROM, RAM, EEPROM and Flash
Timing sources including oscillators and phase-locked loops
Peripherals including counter-timers, real-time timers and power-on reset generators
External interfaces including industry standards such as USB, FireWire, Ethernet, USART, SPI
Analog interfaces including ADCs and DACs
Voltage regulators and power management circuits
These blocks are connected by either a proprietary or industry-standard bus such as the AMBA bus from ARM. DMA controllers route data directly
between external interfaces and memory, by-passing the processor core and thereby increasing the data throughput of the SoC.
In-circuit emulator
An in-circuit emulator (ICE) also called on-circuit debugger (OCD) or background debug module (BDM) is a hardware device used to debug the
software of an embedded system. Embedded systems present special problems for a programmer, because they usually lack keyboards, screens, disk-
drives and other helpful user interfaces and storage devices that are present on business computers.
The basic idea of an "in-circuit emulator" is that it provides a window into the embedded system. The programmer uses the emulator to load programs
into the embedded system, run them, step through them slowly, and see and change the data used by the system's software.
In-circuit emulation can also refer to the use of hardware emulation, when the emulator is plugged into a system (not always embedded) in place of a
yet-to-be-built chip (not always a processor). These in-circuit emulators provide a way to run the system with "live" data while still allowing relatively
good debugging capabilities.
3. Processor Design Metrics
3.1. Introduction
Design is the task of defining a system's functionality and converting that functionality into a physical implementation, while satisfying certain constrained design metrics and optimising other design metrics.
The functionality of modern microelectronics and embedded systems is becoming more and more complex. Getting such
complex functionality right is a difficult task because of the millions of possible environment scenarios that must be
responded to properly. Not only is getting the functionality correct difficult, but creating an implementation that satisfies
physical constraints may also be difficult due to competing, tightly constrained metrics.
The designer, while converting requirements into a workable implementation, passes through many stages each with its
own constraints. Some constraints are inflexible and known to the designer before design begins; these are the generic
rules for design of a particular kind of object. For example, a logic designer may begin with the constraint of using strictly
AND, OR, and NOT gates to build a circuit. More complex objects must all reduce to these at some point. Other prior knowledge is so familiar that it is sometimes taken for granted, such as the constraint of having to do logic design with only two-valued logic (true or false) rather than multivalued logic or continuous (analog) logic. All of these are
guiding factors to a designer.
Other design constraints are specific to the individual product being designed. Examples of these constraints are the
particular components selected for the design, the location of power and ground paths, and the timing specifications. As
design proceeds, the number of constraints increases. A good design leaves sufficient options available in the final stages
so that corrections and improvements can be made at that point. A poor design can leave the designer "painted into a
corner" such that the final stages become more and more difficult, even impossible to accomplish.
The hardest design constraints to satisfy are those that are continuous in their trade-off and, because they compete with
others, have no optimum solutions. An example of such constraints is the simultaneous desire to keep size low, power
consumption low, and performance high. In general, these three criteria cannot all be optimised simultaneously because they are at cross-purposes to each other. Improving one often leads to worsening of another. For example, if we reduce an
implementation's size, the performance may suffer. Performance and power dissipation compete with one another. The
designer must choose a scheme that meets specified needs in some areas and is not too wasteful in the others. The rules
for this are complex, intertwined, and imprecise. It is possible to say that design is a matter of constraint optimisation without any trade-off with the system's functionality (the system requirements and specifications).
This chapter deals with this type of constraint: constraints that compete with one another. The metrics are divided into
four groups: performance design metrics, design economics metrics, power dissipation metrics, and system effectiveness
metrics.
Performance: The execution time of the system or the processing power. It is usually taken to mean the time required to complete
a task (latency or response time), or as the number of tasks that can be processed per unit time (throughput). Factors that influence
throughput and latency include the clock speed, the word length, the number of general purpose registers, the instruction variety,
memory speed, programming language used, and the availability of suitable peripherals.
Power: The amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements
of the IC, since more power means more heat.
o Heat generation is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping
them cool can be a major concern.
o Reducing power usage will be the primary objective when designing a product whose components must be crammed into a small space. Such applications are very sensitive to heat problems.
o Energy conservation is becoming an increasingly significant global concern to be addressed.
Flexibility: The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically
considered very flexible. Many digital systems are designed so that one device can be used in a variety of applications while still achieving a reasonable solution.
Time-to-prototype: The time needed to build a working version of the system, which may be bigger or more expensive than the
final system implementation, but can be used to verify and enhance the system's functionality.
Time-to-market: The time required to develop a system to the point that it can be released and sold to customers. The main
contributors are design time, manufacturing time, and testing time.
Reliability: Reliability is the probability that a machine or product can perform continuously, without failure, for a specified
interval of time when operating under standard conditions. Increased reliability implies less failure of the machinery and
consequently less downtime and loss of production.
Availability: Availability refers to the probability that a system will be operative (up).
Serviceability: Refers to how easily the system is repaired.
Maintainability: The ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment.
Range of complementary hardware: For some applications the existence of a good range of compatible ICs to support the
microcontroller/microprocessor may be important.
Special environmental constraints: The existence of special requirements, such as military specifications or minimum physical
size and weight, may well be overriding factors for certain tasks. In such cases, the decision is often an easy one.
Ease of use: This will affect the time required to develop, implement, test and start using the system.
Correctness: Our confidence that we have implemented the system's functionality correctly. We can check the functionality
throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
Safety: The probability that the system will not cause harm.
Metrics typically compete with one another: Improving one often leads to worsening of another. For example, if we
reduce an implementation's size, the performance may suffer. Some observers have compared this to a wheel with
numerous pins, as illustrated below. If you push one pin in, such as size, then the other pins pop out. To best meet this
optimisation challenge, the designer must be comfortable with a variety of hardware and software implementation
technologies, and must be able to migrate from one technology to another, in order to find the best implementation for a
given application and constraints. Thus, a designer cannot simply be a hardware expert or a software expert, as is
commonly the case today; the designer must have expertise in both areas.
Figure 1.1: Design metric competition: improving one (power, performance, size, NRE cost) may worsen others.
The above design metrics can be divided into five major groups based on the design phenomena that they measure. The five
proposed groups are:
1. Performance Metrics: How fast is the system? How quickly can it execute the desired application?
2. Cost Metrics: The metrics of this group measure the product cost, the unit cost and the price of the product.
3. Power Consumption Metrics: Critical in battery operated systems.
4. System Effectiveness Metrics: In many applications, e.g. military applications, how adequate and effective
the system is in implementing its target is more important than cost. Reliability, maintainability, serviceability,
design adequacy, and flexibility are related to the metrics of this group.
5. Metrics that guide the designer in selecting one out of many off-the-shelf components that can do the job.
Ease of use, software support, motherboard support, safety, and availability of a second-source
supplier are some of the metrics of this group.
Throughput: The number of tasks that can be processed per unit time. For example, an
assembly line may be able to produce 6 cars per day.
However, note that throughput is not always simply the reciprocal of the latency. A system may be able to do better
than this by using parallelism, either by starting one task before finishing the previous one (pipelining) or by processing each
task concurrently. In the case of an automobile assembly line, there are many steps, each contributing something to the
construction of the car. Each step operates in parallel with the other steps, though on a different car. Thus, our assembly
line may have a latency of 4 hours but a throughput of 120 cars per day. In case of an automobile assembly line,
throughput is defined as the number of cars per day and is determined by how often a completed car exits the assembly
line.
Whether we are interested in throughput or response time, the key measurement is time: The computer that performs the
same amount of work in the least time is the fastest. The difference is whether we measure one task (response time) or many
tasks (throughput).
We can expect many metrics measuring throughput, depending on how the task is defined. It can be instruction-based, as in
the case of MIPS, based on floating-point operations, as in the case of MFLOPS, or based on any other task. Besides execution time and rate metrics,
a third group of performance metrics (discussed below) is widely used. A wide variety of performance metrics has been
proposed and used in the computer field. Unfortunately, as we are going to see later, many of
these metrics are often used and interpreted incorrectly.
it is consistent since the value of MHz is precisely defined across all systems
it is independent of any sort of manufacturers' games.
The above characteristics appear as advantages to using the clock rate as a measure of performance, but in fact it is a
nonlinear measure (doubling the clock rate seldom doubles the resulting performance) and an unreliable metric. As
many owners of personal computer systems can attest, buying a system with a faster clock in no way assures that their
programs will run correspondingly faster. This point will be clarified later when considering the execution time equation.
Thus, we conclude that the processor's clock rate is not a good metric of performance.
MFLOPS performance depends heavily on the program. Different programs require the execution of different proportions
of floating point operations. Since MFLOPS was intended to measure floating-point performance, it is not applicable
outside that range. Compilers for example have a MFLOPS rating near 0 no matter how fast the machine is, because
compilers rarely use floating-point arithmetic.
Because it is based on operations in the program rather than on the instructions, MFLOPS has a stronger claim than MIPS
to being a fair comparison between different machines. The reason is that the same program running on
different computers may execute a different number of instructions but will always execute the same number of floating-
point operations.
MFLOPS is not dependable. It depends on the type of floating-point operations present in the program (availability of
floating point instructions). For example some computers have no sine instruction while others do. In the first group of
computers, the calculation of a sine function needs to call the sine routine, which would require performing several
floating-point operations, while in the second group this would require only one operation. Another potential problem is
that the MFLOPS rating changes not only according to the mixture of integer and floating-point operations but also according to the mixture
of fast and slow floating-point operations. For example, a program with 100% floating-point adds will have a higher
rating than a program with 100% floating-point divides. The solution to both these problems is to define a method of
counting the number of floating-point operations in a high-level language program. This counting process can also weight
the operations, giving more complex operations larger weights, allowing a machine to achieve a higher MFLOPS rating
even if the program contains many floating-point divides. This type of MFLOPS is called normalized or weighted
MFLOPS.
In essence, both MIPS and MFLOPS are quite misleading metrics and, although often used, cannot say much about the real performance of the system.
CPI = Σ_i (CPI_i × F_i)    (1.6)
In the general case, executing the program means using different instruction types, each of which has its own frequency of occurrence F_i and its own CPI_i.
SOLUTION:
Code from: A B C
Compiler 1 5 1 1
Compiler 2 10 1 1
• The machine is assumed to run at a clock rate of 100 MHz
Instruction type Frequency Clock Cycle Count
ALU 43% 1
Loads 21% 2
Stores 12% 2
Branches 24% 2
SOLUTION: Given the value CPI_un-optimised = 1.57, then from equation (1.1) we get:
MIPS_un-optimised = (500 × 10^6) / (1.57 × 10^6) = 318.5
Using equation (1.5) to calculate the performance of the un-optimised code, we get:
CPU time_un-optimised = IC_un-optimised × 1.57 × (2 × 10^-9)
= 3.14 × 10^-9 × IC_un-optimised
For the optimised case:
Since the optimising compiler will discard 50% of the ALU operations, then:
IC_optimised = (1 - 0.43/2) IC_un-optimised = (1 - 0.215) IC_un-optimised = 0.785 IC_un-optimised
The frequencies given in the table above must now be expressed as fractions of IC_optimised. The table then takes the form:
Instruction type Frequency Clock Cycle Count
ALU 27.4% 1
Loads 26.8% 2
Stores 15.3% 2
Branches 30.5% 2
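The remaining arithmetic is not shown above, so the short C program below simply applies equation (1.6) to both instruction mixes and then compares CPU times (IC_optimised = 0.785 IC_un-optimised and a 2 ns cycle time, as derived above).

#include <stdio.h>

int main(void)
{
    /* Instruction-class fractions (ALU, loads, stores, branches) and their CPIs. */
    double f_un[]  = { 0.43, 0.21, 0.12, 0.24 };
    double f_opt[] = { 0.274, 0.268, 0.153, 0.305 };
    double cpi[]   = { 1.0, 2.0, 2.0, 2.0 };

    double cpi_un = 0.0, cpi_opt = 0.0;
    for (int i = 0; i < 4; i++) {
        cpi_un  += f_un[i]  * cpi[i];     /* equation (1.6) */
        cpi_opt += f_opt[i] * cpi[i];
    }

    /* CPU time = IC x CPI x cycle time; the optimised code has 0.785 of the IC. */
    double t_un  = 1.000 * cpi_un  * 2e-9;      /* per un-optimised instruction */
    double t_opt = 0.785 * cpi_opt * 2e-9;

    printf("CPI_un = %.2f  CPI_opt = %.2f  speedup = %.2f\n",
           cpi_un, cpi_opt, t_un / t_opt);       /* about 1.57, 1.73 and 1.16 */
    return 0;
}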
Another technique for comparing performance is to express the performance of a system as a percent change relative to
the performance of another system. Such a measure is called relative change. If, for example, the throughput of system A
is R_1, and that of system B is R_2, the relative change of system B with respect to A, denoted Δ_2,1 (that is, using system A as the base), is then defined to be:
Relative change of system B w.r.t. system A: Δ_2,1 = (R_2 - R_1) / R_1
Typically, the value of Δ_2,1 is multiplied by 100 to express the relative change as a percentage with respect to a given
basis system. This definition of relative change will produce a positive value if system B is faster than system A, whereas
a negative value indicates that the basis system is faster.
EXAMPLE 1.5:
As an example of how to apply these two normalization techniques, the speed up and relative change of the systems
shown in Table-1.1 are found using system 1 as the basis. From the raw execution times, we can see that system 4 is the
fastest, followed by systems 2, 1, and 3, respectively. However, the speedup values give us a more precise indication of
exactly how much faster one system is than the other. For instance, system 2 has a speedup of 1.33 compared with system
1 or, equivalently, it is 33% faster. System 4 has a speedup ratio of 2.29 compared with system 1 (or it is 129% faster).
We also see that system 3 is actually 11% slower than system 1, giving it a slowdown factor of 0.89.
Normally, we use the speedup ratio and the relative change to compare the overall performance of two systems. In many
cases it is required to measure how much the overall performance of any system can be improved due to changes in only a
single component of the system. Amdahl's law can be used, in such cases, to assess the impact of improving a certain feature
on the performance of the system.
Table-1.1: An example of calculating speedup and relative change using system 1 as the basis.
System x    Execution time T_x (s)    Speedup S_x,1    Relative change Δ_x,1 (%)
1           480                       1.00             0
2           360                       1.33             +33
3           540                       0.89             -11
4           210                       2.29             +129
s = 0;
for (i = 1; i < N; i++)
    s = s + x[ i ] * y[ i ];
Figure 1.2: A vector dot-product example program.
Since there is no need to perform the addition or multiplication operations for elements whose value is zero, it may be
possible to reduce the total execution time if many elements of the two vectors are zero. Figure 1.3 shows the example
from Figure 1.2 modified to perform the floating-point operations only for those nonzero elements. If the conditional if statement requires t_if cycles to execute, the total time required to execute this program is
t_2 = N [t_if + f (t_+ + t_*)] cycles
where f is the fraction of N for which both x[i] and y[i] are nonzero. Since the total number of additions and multiplications executed in this case is 2Nf, the execution rate for this program is
R_2 = 2Nf / (N [t_if + f (t_+ + t_*)]) = 2f / (t_if + f (t_+ + t_*))  FLOPS/cycle
s = 0;
for (i = 1; i < N; i++)
    if (x[ i ] != 0 && y[ i ] != 0)
        s = s + x[ i ] * y[ i ];
Figure 1.3: The vector dot-product example program of Figure 1.2 modified to calculate only the nonzero elements.
If t_if is four cycles, t_+ is five cycles, t_* is ten cycles, f is 10%, and the processor's clock rate is 250 MHz (i.e. one cycle is 4 ns), then:
t_1 = 60N ns and
t_2 = N [4 + 0.1(5 + 10)] × 4 ns = 22N ns.
The speedup of program 2 relative to program 1 is then found to be:
S_2,1 = 60N/22N = 2.73.
Calculating the execution rates realized by each program with these assumptions produces
R1 = 2/(60 ns) = 33 MFLOPS and
R2 = 2(0.1)/(22 ns) = 9.09 MFLOPS.
Thus, even though we have reduced the total execution time from t1 = 60N ns to t2 = 22N ns, the means-based metric
(MFLOPS) shows that program 2 is 72% slower than program 1. The ends-based metric (execution time), however,
shows that program 2 is actually 173% faster than program 1.
We reach completely different conclusions when using these two different types of metrics because the means-based
metric unfairly gives program 1 credit for all of the useless operations of multiplying and adding zero. This example
highlights the danger of using the wrong metric to reach a conclusion about computer-system performance.
“The performance improvement to be gained from using some faster mode of execution is
limited by the fraction of the time the faster mode can be used”.
In particular, consider the execution time lines shown in Fig. 1.4. The top line shows the time (T_old) required to execute some given program on the system before any changes are made. Now assume that some change that reduces the execution time for some particular feature in the processor by a factor of q is made to the system. The program now runs in time T_new, where T_new < T_old, as shown in the bottom line.
Speedup = Execution time for the entire task without using the enhancement (T_old) / Execution time for the entire task using the enhancement when possible (T_new) = T_old / T_new
Speedup tells us how much faster a task will run using the machine with the enhancement as opposed to the original
machine.
The speedup from some enhancement depends on two factors:
The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement.
For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the
fraction is 20/60. This value, which we will call Fraction_enhanced, is always less than or equal to 1. In Fig. 1.4,
Fraction_enhanced = (1 - α), where α denotes the fraction of the execution time that cannot use the enhancement.
The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced
mode were used for the entire program. This value is the time of the original mode over the time of the enhanced mode: If
the enhanced mode takes 2 seconds for some portion of the program that can completely use the mode, while the original
mode took 5 seconds for the same portion, the improvement is 5/2. We will call this value, which is always greater than 1,
Speedup_enhanced. In Fig. 1.4, Speedup_enhanced = q.
The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced
portion of the machine plus the time spent using the enhancement:
Execution time_new = Execution time_old × ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
The overall speedup is the ratio of the execution times:
Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Or, in terms of α and q:
Speedup_overall = 1 / (α + (1 - α)/q) = 1 / (1/q + α(1 - 1/q))    (1.21)
EXAMPLE 1.19:
Suppose that we are considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of
the time. What is the overall speedup gained by incorporating the enhancement?
SOLUTION:
Fraction_enhanced = (1 - α) = 0.4, or α = 0.6
Speedup_enhanced = q = 10
Speedup_overall = 1/(0.6 + (0.4/10)) = 1/0.64 = 1.56
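The same calculation is easy to wrap in a small helper function; this sketch just reproduces the result of Example 1.19.

#include <stdio.h>

/* Amdahl's Law: 'fraction' of the original execution time can use an
 * enhancement that is 'speedup' times faster. */
static double amdahl(double fraction, double speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / speedup);
}

int main(void)
{
    printf("overall speedup = %.2f\n", amdahl(0.4, 10.0));   /* prints 1.56 */
    return 0;
}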
Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources
to improve cost/performance. The goal, clearly, is to spend resources in proportion to where time is spent. We can also use
Amdahl's Law to compare two design alternatives, as the following example shows.
EXAMPLE 1.20:
Implementations of floating-point (FP) square root vary significantly in performance. Suppose FP square root (FPSQR) is
responsible for 20% of the execution time of a critical benchmark on a machine. One proposal is to add FPSQR hardware that
will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP
instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP
instructions run two times faster with the same effort as required for the fast square root. Compare these two design
alternatives.
SOLUTION:
We can compare these two alternatives by comparing the speedups:
Speedup_FPSQR = 1 / ((1 - 0.2) + 0.2/10) = 1/0.82 = 1.22
Speedup_FP = 1 / ((1 - 0.5) + 0.5/2.0) = 1/0.75 = 1.33
From the results, it is clear that improving the performance of the FP operations overall is better because of the higher frequency.
EXAMPLE 1.21:
For the RISC machine with the following instruction mix given earlier:
If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance
improvement from this enhancement?
SOLUTION:
Fraction_enhanced = (1 - α) = F = 45%, or 0.45
Unaffected fraction = 100% - 45% = 55% or .55
Factor of enhancement = 5/2 = 2.5
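Completing the calculation with equation (1.21) and the values above gives:
Speedup_overall = 1 / (0.55 + 0.45/2.5) = 1/0.73 ≈ 1.37
so the load-instruction enhancement makes the machine roughly 37% faster.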
EXAMPLE 1.24:
A Program is running on a specific machine with the following parameters:
– Total instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
Using the same program with these changes:
– A new compiler used: New instruction count 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHz
What is the speedup with the changes?
SOLUTION:
Speedup = Old Execution Time / New Execution Time
= (IC_old × CPI_old × Clock cycle_old) / (IC_new × CPI_new × Clock cycle_new)
Speedup = (10,000,000 × 2.5 × 5 × 10^-9) / (9,500,000 × 3 × 3.33 × 10^-9)
= 0.125 / 0.095 = 1.32
or 32% faster after the changes.
Equation (1.21), as mentioned before, can be used to calculate the overall speedup obtained due to some improvement in the system.
However, it can also be used to tell us what happens as the impact of the improvement becomes large, that is, as q → ∞:
lim (q → ∞) Speedup_overall = lim (q → ∞) 1 / (1/q + α(1 - 1/q)) = 1/α    (1.22)
This result says that, no matter how much one type of operation in a system is improved, the overall performance is inherently limited
by the operations that still must be performed but are unaffected by the improvement. For example the best (ideal) speedup that could
be obtained in a parallel computing system with p processors is p. However, if 10% of a program cannot be executed in parallel, the
overall speedup when using the parallel machine is at most 1/α = 1/0.1 = 10, even if an infinite number of processors were available.
The constraint that 10% of the total program must be executed sequentially limits the overall performance improvement that could be
obtained.
For multiple enhancements that affect disjoint fractions F_i of the execution time with speedups S_i, Amdahl's Law generalises to:
Speedup_overall = 1 / ((1 - Σ_i F_i) + Σ_i (F_i / S_i))    (1.23)
EXAMPLE 1.25:
Three CPU performance enhancements are proposed with the following speedups and percentage of the
code execution time affected:
Speedup1 = S1 = 10    Percentage1 = F1 = 20%
Speedup2 = S2 = 15    Percentage2 = F2 = 15%
Speedup3 = S3 = 30    Percentage3 = F3 = 10%
While all three enhancements are in place in the new design, each enhancement affects a different
portion of the code and only one enhancement can be used at a time.
What is the resulting overall speedup?
SOLUTION:
Speedup_overall = 1 / ((1 - Σ_i F_i) + Σ_i (F_i / S_i))
Speedup = 1 / [(1 - 0.2 - 0.15 - 0.1) + (0.2/10 + 0.15/15 + 0.1/30)]
= 1 / [0.55 + 0.0333]
= 1 / 0.5833 = 1.71
Pictorial Depiction of the example is shown in Fig.1.5
Although we have focused on performance and how to evaluate it in this section, designing only for performance without
considering other factors such as cost, power, etc., is unrealistic. For example, in the field of computer design the designers
must balance performance and cost; in mobile computing, power is the priority; in military applications, reliability,
design adequacy and system effectiveness are more important than cost; and so on.
Figure 1.9: Dramatic Change in Product life cycle (market window) [9]
Let's investigate the loss of revenue that can occur due to delayed entry of a product in the market. We'll use a simplified
model of revenue that is shown in Figure 1.10(b). This model assumes the peak of the market occurs at the halfway point,
denoted as W, of the product life, and that the peak is the same even for a delayed entry .The revenue for an on-time
market entry is the area of the triangle labeled On-time, and the revenue for a delayed entry product is the area of the
triangle labeled Delayed. The revenue loss for a delayed entry is just the difference of these two triangles' areas. Let's
derive an equation for percentage revenue loss, which equals ((On-time - Delayed) / On-time) * 100%. For simplicity, we'll
assume the market rise angle is 45 degrees, meaning the height of the triangle is W, and we leave as an exercise the
derivation of the same equation for any angle. The area of the On-time triangle, computed as 1/2 * base * height, is thus
1/2 * 2W * W, or W^2. The area of the Delayed triangle is 1/2 (W - D + W) * (W - D). After algebraic simplification, we
obtain the following equation for percentage revenue loss:
Percentage revenue loss = (D(3W - D) / (2W^2)) * 100%
Consider a product whose lifetime is 52 weeks, so W = 26. According to the preceding equation, a delay of just D = 4
weeks results in a revenue loss of 22%, and a delay of D = 10 weeks results in a loss of 50%. Some studies claim that
reaching market late has a larger negative effect on revenues than development cost overruns or even a product price that
is too high.
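A few lines of C reproduce these figures; the formula and the numbers are exactly the ones derived above.

#include <stdio.h>

/* Percentage revenue loss for a delayed market entry of D weeks, where W is
 * half of the product lifetime (45-degree market rise assumed, as above). */
static double revenue_loss_percent(double D, double W)
{
    return (D * (3.0 * W - D)) / (2.0 * W * W) * 100.0;
}

int main(void)
{
    printf("D = 4  weeks: %.0f%%\n", revenue_loss_percent(4.0, 26.0));   /* about 22% */
    printf("D = 10 weeks: %.0f%%\n", revenue_loss_percent(10.0, 26.0));  /* about 50% */
    return 0;
}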
Figure 1.10: Time-to-market: (a) market window, (b) simplified revenue model for computing revenue loss from
delayed entry.
3.6.1. Non-recurring (Fixed) Engineering Costs
Fixed costs represent the one-time costs that the manufacturer must spend to guarantee a successful development cycle of
the product and which are not directly related to the production strategy or the volume of production (or product sold).
Once the system is developed (designed), any number of units can be manufactured without any additional fixed costs.
Fixed costs include:
Non-recurring Engineering Costs (NREs):
engineering design cost, E_total
prototype manufacturing cost, P_total
fixed costs to support the product, S_total
These costs are amortised over the total number of products sold. F_total, the total non-recurring cost, is given by
F_total = E_total + P_total + S_total    (1.27)
The NRE costs can be amortized over the lifetime volume of the product. Alternatively, the non-recurring costs can be
viewed as an investment for which there is a required rate of return. For instance, if $1M is invested in NRE for a product,
then $10M has to be generated for a rate of return of 10.
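Since F_total is spread over the units sold, a simple (and standard) way to express its effect on each unit, for a sales volume V and a per-unit variable cost C_variable, is:
cost per unit = C_variable + F_total / V
For example, amortising a $1M NRE over 100,000 units adds $10 of fixed cost to every unit sold.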
Days 5 5
Cost/Day $400 $400
NRE $30000 $70000
Masks $10000 $50000
Simulation $10000 $10000
Test program $10000 $10000
Second source $2000 $2000 $2000
Days 5 5 5
Cost/Day $400 $400 $400
EXAMPLE1.26:
You are starting a company to commercialize your brilliant research idea. Estimate the cost to prototype a mixed-signal
chip. Assume you have seven digital designers (each with a salary of $70K and an overhead of $30K), three analog
designers (each with a salary of $100K and an overhead of $30K), and five support personnel (each with a salary of $40K and an
overhead of $20K), and that the prototype takes two fabrication runs and two years.
SOLUTION:
Total cost of one digital Engineer per year = Salary + Overhead + Computer used
+ Digital front end CAD tool
= $70K + $30K + $10K + $10K = $120K
Total cost of one analog Engineer per year = Salary + Overhead + Computer used
+ Analog front end CAD tool
= $100K + $30K + $10K + $100K = $240K
Total cost of one support staff per year = Salary + Overhead + Computer used
= $40K + $20K + $10K = $70K
Total cost per year:
Cost of 7 digital engineers = 7 * $120K = $ 840K
Cost of 3 analog engineers = 3* $240K = $ 720K
Cost of 5 support staff = 5* $70K = $ 350K
Two fabrication runs = 2 * $1M = $ 2M
Total cost per year = $ 3.91M
The total predicted cost here is nearly $8M.
Figure 1.12 shows the breakdown of the overall cost.
It is important for the project manager to find ways to reduce fabrication costs. Clearly, the manager can reduce the
number of people and the labor cost. He might reduce the CAD tool cost and the fabrication cost by doing multiproject
chips. However, the latter approach will not get him to a pre-production version, because issues such as yield and
behavior across process variations will not be proved.
Figure 1.12: Breakdown of the overall prototyping cost: fab 25%, back-end tools 25%, front-end (entry) tools 9%, salary 26%, computer 4%, overhead 11%.
Special components of variable costs can be added. A few large companies such as Intel, TI, STMicroelectronics,
Toshiba, and IBM have in-house manufacturing divisions. Many other semiconductor companies outsource their
manufacturing to a silicon foundry such as TSMC, Hitachi/UMC, IBM, LSI Logic, or ST. This is a recurring cost; it
recurs every time an IC is sold. Another component of the recurring cost is the continuing cost to support the part from a
technical viewpoint. Finally, there is what is called "the cost of sales," which is the marketing, sales force, and overhead
costs associated with selling each IC. In a captive situation such as the IBM microelectronics division selling CPUs to the
mainframe division, this might be zero.
Example 1.28: ASIC Variable Costs
Figure 1.13 shows typical variable costs of a certain ASIC.
Example 1.29
Suppose your startup seeks a return on investment of 5. The wafers cost $2000 and hold 400 gross die with a yield of 70%. If
packaging, test, and fixed costs are negligible, how much do you need to charge per chip to have a 60% profit margin? How
many chips do you need to sell to obtain a 5-fold return on your $8M investment?
Solution:
Rtotal = Rprocess = $2000/(400 × 0.7) = $7.14. For a 60% margin, the chips are sold at $7.14/(1 - 0.6) = $17.86, with a profit of
$10.72 per unit. The desired ROI implies a profit of $8M × 5 = $40M. Thus, $40M / $10.72 = 3.73M chips must be sold.
Clearly, a large market is necessary to justify the investment in custom chip design.
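The arithmetic of Example 1.29 generalises directly; the C sketch below just recomputes it from the example's own inputs (wafer cost, gross die per wafer, yield, margin, investment and target return).

#include <stdio.h>

int main(void)
{
    double wafer_cost = 2000.0, gross_die = 400.0, yield = 0.70;
    double margin = 0.60, investment = 8e6, roi_target = 5.0;

    double die_cost = wafer_cost / (gross_die * yield);   /* about $7.14          */
    double price    = die_cost / (1.0 - margin);          /* about $17.86         */
    double profit   = price - die_cost;                   /* about $10.7 per unit */
    double units    = investment * roi_target / profit;   /* about 3.7M chips     */

    printf("cost $%.2f  price $%.2f  profit $%.2f  units %.2fM\n",
           die_cost, price, profit, units / 1e6);
    return 0;
}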
Power consumption equates largely with heat generation, which is a primary enemy in achieving increased
performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
In the embedded domain, many applications cannot use a heat sink or a fan; a cellular phone with a fan would probably
not be a top seller.
With millions of PCs in use, and sometimes thousands located in the same company, the desire to conserve
energy has grown from a non-issue to a real issue in the last five years.
Reducing power usage is a primary objective for the designers of notebook computers and embedded systems,
since they run on batteries with a limited life and may even rely on "supercapacitors" for staying operational during
power outages.
Newer processors strive to add additional features, integrate more and more peripherals, and run at faster
speeds, all of which tend to increase power consumption. This trend makes the ICs more sensitive to heat problems,
since their components are crammed into such a small space.
FREQUENCY CURRENT
1.0 MHz 550 uA
2.0 MHz 750 uA
3.0 MHz 1 mA
4.0 MHz 1.25 mA
By lowering the clock speed used in an application, the power required (which is simply the product of input voltage and
current) will be reduced. This may mean that the application software may have to be written "tighter," but the gains in
product life for a given set of batteries may be an important advantage for the application. Therefore, a CMOS device
should be driven at the slowest possible speed, to minimize power consumption. ("Sleep" mode can dramatically reduce a
microcontroller's power consumption during inactive periods because if no gates are switching, there is no current flow in
the device.)
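Taking the current-draw table above and assuming a 5 V supply (the supply voltage is not stated in the table, so this is only illustrative):
P(4 MHz) = 5 V × 1.25 mA = 6.25 mW, while P(1 MHz) = 5 V × 550 µA = 2.75 mW,
so running the part at a quarter of the clock rate cuts its power draw by more than half.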
In some applications, reducing the speed is not acceptable. This is why most new processors focus on
reducing power consumption in fully operational and standby modes. They do so by stopping transistor activity when a
particular block is not in use. To achieve that, such designs connect every register, flip-flop, or latch to the processor's
clock tree. The implementation of the clock therefore becomes crucial, and it often must be completely redesigned. (In
traditional microprocessor design, the clock signal is propagated from a single point throughout the chip.)
b) Effect of Reducing the Voltage on power consumption
Obviously, the intrinsic power consumption can be further reduced by supplying a lower voltage input to the microcontroller
(which may or may not be possible, depending on the circuitry attached to the microcontroller and the microcontroller
itself). A CPU core voltage of 1.8 V or even less is state-of-the-art processor technology. The problem here is that the
CPU core is no longer the most power-consuming part of the system, as it was in the past. The increasing
integration of power-consuming peripherals alongside embedded cores forces us to measure the power consumption of
the entire system. Overall power consumption differs a lot, depending on your system design and the degree of integration.
Increasingly, the processor core is only a small part of the entire system.
External and Internal Voltage Levels (Standard Voltage Levels and Motherboard Voltage Support)
Early processors had a single voltage level that was used by the motherboard and the processor, typically 5 volts. As
processors have increased in speed and size, designers have looked at using lower voltage levels. The first step was to
reduce the voltage level to 3.3 volts. Newer processors reduce voltage levels
even more by using what is called a dual voltage, or split rail design.
A split rail processor uses two different voltages. The external or I/O voltage is higher, typically 3.3V for compatibility with
the other chips on the motherboard. The internal or core voltage is lower: usually 2.5 to 2.9 volts. This design allows these
lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets etc. The voltage regulator
on the motherboard is what must be changed to supply the correct voltages to the processor socket.
There are several "industry standard" voltages in use in processors today. The phrase "industry standard" is put in quotes
because it seems that the number of different voltages being used continues to increase, and the new market presence of
AMD and Cyrix makes this even more confusing than when it was just Intel we had to worry about. Table-1.3 shows the
current standard voltages with their names and the typical range of voltages that is considered acceptable to run a
processor that uses that nominal voltage level:
Using this mode can reduce the power consumption of a microcontroller from milliwatts to microwatts. An excellent
example of what this means is taken from the Parallax BASIC Stamp manual in a question-and-answer section:
"Q. How long can the BASIC Stamp run on a 9-volt battery?
A. This depends on what you're doing with the BASIC Stamp. If your program never uses sleep mode and
has several LEDs connected to I/O lines, then the BASIC Stamp may only run for several hours. If, however,
sleep mode is used and I/O current draw is minimal, then the BASIC Stamp can run for weeks."
Using the sleep mode in a microcontroller will allow the use of a virtual "on/off" switch that is connected directly to the
microcontroller. This provides several advantages.
The first is cost and reliability; a simple momentary on/off switch is much cheaper and much less prone to failure than a
slide or toggle switch. Second is operational; while sleep mode is active, the contents of the variable RAM will not be lost or
changed.
There is one potential disadvantage of sleep mode for some applications, and that is the time required for the
microcontroller to "wake up" and restart its oscillator. As mentioned before, this can be as long as the initial start-up time,
which can be as long as ten milliseconds; that is too long for many applications, and in particular too slow for
interfacing with other computer equipment. If the main thing the microcontroller is interfacing to is a human, this wake-up
time will not be an issue at all.
One thing to remember with sleep mode is to make sure there is no current draw when it is active. A microcontroller sinking
current from an LED connected to the power rail while in sleep mode will result in extra power being consumed.
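On an ATmega328P-class AVR, for example, the pattern looks roughly like the sketch below using avr-libc's <avr/sleep.h>; the choice of a pin-change interrupt as the wake-up source (and the exact register names) is an assumption for illustration only.

#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>

ISR(PCINT0_vect)
{
    /* Nothing to do: waking the CPU is the only purpose of this interrupt. */
}

int main(void)
{
    PCICR  |= (1 << PCIE0);              /* enable pin-change interrupt group 0 */
    PCMSK0 |= (1 << PCINT0);             /* wake on a change of the PCINT0 pin  */
    sei();                               /* global interrupt enable             */

    for (;;) {
        /* ... do the application work, then go back to sleep ... */
        set_sleep_mode(SLEEP_MODE_PWR_DOWN);
        sleep_mode();                    /* microamp-level draw until the pin changes */
    }
}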
In this section we give guidelines for the designer who wants to use modeling to find initial estimates for the performance,
the area, and the power consumption of the processor he is designing (e.g. an adder) at an early stage. We base our
discussion on the assumption that the designer is using cell-based VLSI design techniques to design combinational circuits
of any degree of complexity.
Total circuit complexity (GEtotal) can be measured by the number of gate equivalents (1 GE corresponds to one 2-input NAND gate, i.e. 4 MOSFETs).
Circuit area (Acircuit) is occupied by logic cells and inter-cell wiring. In technologies with three and more metal
layers, over-the-cell routing capabilities allow the overlap of cell and wiring areas, as opposed to 2-metal
technologies. This means that most of the cell area can also be used for wiring, resulting in very low routing area
factors. (Acircuit = Acells + Awiring)
Total cell area (Acells) is roughly proportional to the number of transistors or gate equivalents (GEtotal) contained in
a circuit. This number is influenced by technology mapping, but not by physical layout. Thus, cell area can be
roughly estimated from a generic circuit description (e.g. logic equations or netlist with simple gates) and can be
precisely determined from a synthesized netlist. (Acells ∝ GEtotal)
Wiring area (Awiring) is proportional to the total wire length. The exact wire lengths, however, are not known prior to
physical layout. (Awiring ∝ Ltotal)
Total wire length (Ltotal) can be estimated from the number of nodes and the average wire length of a node
[Feu82, KP89] or, more accurately, from the sum of cell fan-out and the average wire length of cell-to-cell
connections (i.e. accounts for the longer wire length of nodes with higher fan-out). The wire lengths also depend
on circuit size, circuit connectivity (i.e., locality of connections), and layout topology, which are not known prior to
circuit partitioning and physical layout [RK92]. (Ltotal ∝ FOtotal)
Cell fan-out (FO) is the number of cell inputs a cell output is driving. Fan-in is the number of inputs to a cell
[WE93], which for many combinational gates is proportional to the size of the cell. Since the sum of cell fan-out
(FOtotal) of a circuit is equivalent to the sum of cell fan-in, it is also proportional to circuit size. (FOtotal ∝ GEtotal)
Therefore, in a first approximation, cell area as well as wiring area are proportional to the number of gate
equivalents. More accurate area estimations before performing actual technology mapping and circuit partitioning
are hardly possible. For circuit comparison purposes, the proportionality factor is of no concern. (Acircuit ∝ GEtotal ∝ FOtotal)
The designer is normally interested in an area estimation model that is simple to compute while being as accurate as
possible, and one that can be applied to logic equations or generic netlists (i.e. netlists composed of simple logic gates)
alone. By considering the above observations, possible candidates are:
Unit-gate area model: This is the simplest and most abstract circuit area model, which is often used in the literature
[Tya93]. A unit gate is a basic, monotonic 2-input gate (or logic operation, if logic equations are concerned), such as
AND, OR, NAND, and NOR. Basic, non-monotonic 2-input gates like XOR and XNOR are counted as two unit gates,
reflecting their higher circuit complexities. Complex gates as well as multi-input basic gates are built from 2-input basic
gates and their gate count equals the sum of gate counts of the composing cells.
Fan-in area model: In the fan-in model, the size of 2- and multi-input basic cells is measured by counting the
number of inputs (i.e., fan-in). Complex cells are again composed of basic cells with their fan-in numbers summed up,
while the XOR/XNOR-gates are treated individually. The obtained numbers basically differ from the unit-gate numbers
only by an offset of 1 (e.g., the AND-gate counts as one unit gate but has a fan-in of two).
Other area models: The two previous models do not account for transistor level optimisation possibilities in complex gates,
e.g., in multiplexers and full-adders. More accurate area models need individual gate count numbers for such complex
gates. However, some degree of abstraction is sacrificed and application on arbitrary logic equations is not possible
anymore. The same holds true for models which take wiring aspects into consideration. One example of a more
accurate area model is the gate-equivalents model (GE) mentioned above, which is based on gate transistor counts and
therefore is only applicable after synthesis and technology mapping.
Inverters and buffers are not accounted for in the above area models, which makes sense for pre-synthesis circuit
descriptions. Note that the biggest differences in buffering costs are found between low fan-out and high fan-out circuits.
With respect to area occupation however, these effects are partly compensated because high fan-out circuits need
additional buffering while low fan-out circuits usually have more wiring.
Investigations showed that using the unit-gate model approach for the area estimation of complex gates, such as
multiplexers and full-adders, does not introduce more inaccuracy than, for example, neglecting circuit connectivity in wiring
area estimation. With the XOR/XNOR being treated separately, the unit-gate model yields acceptable accuracy at the
given abstraction level.
Also, it perfectly reflects the structure of logic equations by modeling the basic logic operators individually and by regarding
complex logic functions as composed from basic ones. Investigations showed comparable performance for the fan-in and
the unit-gate models due to their similarity. After all, the unit-gate model is very commonly used in the literature.
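As a concrete illustration of the unit-gate area model, the sketch below counts unit gates for an n-bit ripple-carry adder built from full adders; the chosen decomposition (2 XOR + 2 AND + 1 OR per full adder) is one common choice, not the only possible one.

#include <stdio.h>

/* Unit-gate area model: AND/OR/NAND/NOR count as 1 unit, XOR/XNOR as 2. */
enum { UG_SIMPLE = 1, UG_XOR = 2 };

/* Full adder decomposed as: sum = a ^ b ^ cin (two XORs),
 * cout = (a & b) | (cin & (a ^ b)) (two ANDs and one OR, XOR shared with sum). */
static int full_adder_area(void)
{
    return 2 * UG_XOR + 3 * UG_SIMPLE;       /* = 7 unit gates */
}

/* Ripple-carry adder: n full adders in a chain. */
static int rca_area(int n)
{
    return n * full_adder_area();
}

int main(void)
{
    printf("32-bit ripple-carry adder: about %d unit gates\n", rca_area(32));  /* 224 */
    return 0;
}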
Maximum delay (tcritical path) of a circuit is equal to the sum of cell inertial delays, cell output ramp delays, and wire
delays on the critical path.
Cell delay ($t_{cell}$) depends on the transistor-level circuit implementation and the complexity of a cell. All simple gates have comparable delays. Complex gates usually contain tree-like circuit and transistor arrangements, resulting in logarithmic delay-to-area dependencies ( $t_{cell} \propto \log(A_{cell})$ ).
Ramp delay ($t_{ramp}$) is the time it takes for a cell output to drive the attached capacitive load, which is made up of interconnect and cell input loads. The ramp delay depends linearly on the capacitive load attached, which in turn depends linearly on the fan-out of the cell ( $t_{ramp} \propto FO_{cell}$ ).
Wire delay or interconnection delay ($t_{wire}$) is the RC-delay of a wire, which depends on the wire length. RC-delays, however, are negligible compared to cell and ramp delays for small circuits such as the adders investigated in this work ( $t_{wire} \approx 0$ ).
Thus, a rough delay estimation is possible by considering the sizes and, with a smaller weighting factor, the fan-out of the cells on the critical path:
$$t_{critical\,path} \propto \sum_{\text{critical path}} \bigl( \log(A_{cell}) + k \cdot FO_{cell} \bigr)$$
Possible delay estimation models are:
Unit-gate delay model: The unit-gate delay model is similar to the unit-gate area model. Again, the basic 2-input gates
(AND, OR, NAND, NOR) count as one gate delay with the exception of the XOR/XNOR-gates which count as two
gate delays [Tya93]. Complex cells are composed of basic cells using the fastest possible arrangement (i.e., tree
structures wherever possible) with the total gate delay determined accordingly.
Fan-in delay model: As for area modeling, fan-in numbers can be taken instead of unit-gate numbers. Again, no
advantages over the unit-gate model are observed.
Fan-out delay model: The fan-out delay model is based on the unit-gate model but incorporates fan-out numbers, thus accounting for gate fan-out and interconnection delays [WT90]. Individual fan-out numbers can be obtained from a generic circuit description. A proportionality factor has to be determined for appropriate weighting of fan-out with respect to unit-gate delay numbers.
Other delay models: Various delay models exist at other abstraction levels. At the transistor level, each transistor can be modeled as contributing one unit delay (τ-model [CSTO91]). At a higher level, complex gates like full-adders and multiplexers can again be modeled separately for higher accuracy [Kan91, CSTO91].
The impact of large fan-out on circuit delay is higher than on area requirements.
This is because high fan-out nodes lead to long wires and high capacitive loads and require additional buffering, resulting
in larger delays. Therefore, the fan-out delay model is more accurate than the unit-gate model. However, due to the much
simpler calculation of the unit-gate delay model and its widespread use, as well as for compatibility reasons with the
chosen unit-gate area model, this model will be used for the circuit comparisons in this work.
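A similarly minimal sketch of the unit-gate delay model is given below. It propagates gate delays through a generic netlist that is assumed to be in topological order; the data format and the full-adder example are assumptions chosen purely for illustration.

# Minimal sketch of the unit-gate delay model (illustrative assumptions).
# Each node is (gate_type, [input signal names], output signal name);
# primary inputs are assumed to arrive at time 0.

GATE_DELAY = {"AND": 1, "OR": 1, "NAND": 1, "NOR": 1, "XOR": 2, "XNOR": 2}

def unit_gate_delay(netlist, primary_inputs):
    arrival = {sig: 0 for sig in primary_inputs}
    for gate_type, inputs, output in netlist:       # netlist in topological order
        arrival[output] = max(arrival[i] for i in inputs) + GATE_DELAY[gate_type]
    return max(arrival.values())                    # critical-path delay

# Full-adder example: sum = a XOR b XOR cin, cout = a.b + cin.(a XOR b)
full_adder = [
    ("XOR", ["a", "b"], "p"),
    ("XOR", ["p", "cin"], "sum"),
    ("AND", ["a", "b"], "g"),
    ("AND", ["p", "cin"], "t"),
    ("OR",  ["g", "t"], "cout"),
]
print(unit_gate_delay(full_adder, ["a", "b", "cin"]))   # -> 4 unit gate delays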
Total power ($P_{total}$) in CMOS circuits is dominated by the dynamic switching of circuit elements (i.e., charging and discharging of capacitances), whereas dynamic short-circuit (or overlap) currents and static leakage are of less importance. Thus, power dissipation can be assumed proportional to the total capacitance to be switched, the square of the supply voltage, the clock frequency, and the switching activity in a circuit [CB95]:
$$P_{total} = \tfrac{1}{2}\,\alpha \cdot C_{total} \cdot V_{dd}^{2} \cdot f_{clk}$$
Total capacitance ($C_{total}$) in a CMOS circuit is the sum of the capacitances from transistor gates, sources, and drains and from wiring. Thus, total capacitance is proportional to the number of transistors and the amount of wiring, both of which are roughly proportional to circuit size ( $C_{total} \propto GE_{total}$ ).
Supply voltage ($V_{dd}$) and clock frequency ($f_{clk}$) can be regarded as constant within a circuit and are therefore not relevant in our circuit comparisons ( $V_{dd}$, $f_{clk}$ = constant).
The switching activity factor ($\alpha$) gives a measure of the number of transient nodes per clock cycle and depends on input patterns and circuit characteristics. In many cases, input patterns to data paths and arithmetic units are assumed to be random, which results in a constant average transition activity of 50% on all inputs (i.e., each input toggles every second clock cycle). Signal propagation through several levels of combinational logic may decrease or increase transition activities, depending on the circuit structure. Such effects, however, are of minor relevance in adder circuits and will be discussed later ( $\alpha$ = constant).
Therefore, for arithmetic units having constant input switching activities, power dissipation is approximately proportional to circuit size ( $P_{total} \propto GE_{total}$ ).
If average power dissipation of a circuit can be regarded as proportional to its size, the presented area models can
also be used for power estimation. Thus, the unit-gate model is chosen for the power comparisons of generic circuit
descriptions.
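As a small numerical illustration of the dynamic-power relation above, the following sketch evaluates it for one set of invented figures; none of the numbers are taken from these notes.

# Dynamic switching power: P_total = 1/2 * alpha * C_total * Vdd^2 * f_clk
# All numbers below are invented purely for illustration.

alpha   = 0.5        # average switching activity (random inputs)
C_total = 20e-12     # total switched capacitance: 20 pF
Vdd     = 1.2        # supply voltage in volts
f_clk   = 100e6      # clock frequency: 100 MHz

P_total = 0.5 * alpha * C_total * Vdd**2 * f_clk
print(f"P_total = {P_total * 1e6:.0f} uW")   # -> about 720 uW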
In the following we discuss some of the attributes (also called design metrics) used in many system effectiveness models.
Digital systems, like all other sophisticated equipment, undergo a cycle of repair, check-out, operational readiness, failure, and back to repair. When the cost of a machine not being in operation is high, methods must be applied to reduce these out-of-service, or downtime, periods. The cost of downtime is not simply the revenue lost while the system is not in use; it also includes the cost of rerunning programs that were interrupted by the ailing system (especially if the system is a computer), of retransmitting or asking other terminals to resend (if possible) real-time data that was lost, the loss of control of external processes, opportunity costs, and costs related to user inconvenience, dissatisfaction, and reduced confidence in the system. Other costs relate directly to the diagnosis and corrective repair actions and the associated logistics and bookkeeping.
Due to the complexity of digital systems, many users decide not to maintain the system (processors, memory, system software, peripherals) themselves, but rather to have a maintenance contract with the system manufacturer. The cost of a maintenance contract over the useful life of the system is quite high in relation to its capital cost. Some sources in the literature suggest that roughly 38% of the life cycle cost is directed toward maintainability issues. Such high costs due to unreliability and maintenance are a strong argument for designing for reliability, maintainability, and serviceability.
To understand the factors that must be considered by the system designer to ensure minimum downtime and minimum maintenance costs, the following sections define and explain the related terms: reliability, availability, maintainability, and serviceability. These attributes have a direct impact on both operational capability and life cycle costs, and from a life cycle cost perspective it is widely recognized that reliability, maintainability, and availability result in reduced life cycle costs.
3.9.1.1. Reliability
Reliability is an attribute of any computer-related component (software, hardware, or a network, for example) that
consistently performs according to its specifications. It has long been considered one of three related attributes that must
be considered when making, buying, or using a computer product or component. Reliability, availability, and maintainability (RAM, for short) are considered important aspects to design into any system. (Note: sometimes serviceability is used instead of maintainability, together with reliability and availability; in this case the acronym RAS is used instead of RAM.)
Quantitatively, reliability can be defined as "the probability that the system will perform its intended function over the stated duration of time in the specified environment for its usage". Therefore, the probability that a system successfully performs as designed is called "system reliability" or the "probability of survival". In theory, a reliable product is totally free of technical errors; in practice, however, vendors frequently express a product's reliability quotient as a percentage.
Evolutionary products (those that have evolved through numerous versions over a significant period of time) are usually
considered to become increasingly reliable, since it is assumed that bugs have been eliminated in earlier releases.
Software bugs, instruction sensitivity, and problems arising from the limited endurance of EEPROM and flash memories are some of the possible causes of embedded system failure. (The nature of the EEPROM architecture limits the number of updates that may be reliably performed on a single location; this limit is called the durability of the memory. At least 10,000 updates are typically possible for EEPROM and 100 updates for flash memory.)
The reliability of a system depends on the number of devices used to build it. As the number of units increases, the chance of system unreliability becomes greater, since the reliability of any system (or piece of equipment) depends on the reliability of its components. The relationship between part reliability and system reliability depends mainly on the system configuration, and the reliability function can be formulated mathematically to varying degrees of precision, depending on the scale of the modeling effort. To understand how system reliability depends on the system configuration, and how to calculate system reliability given the reliability of each component, we consider here two simple configurations. Both examples consider a system consisting of $n$ components, which can be hardware, software, or even human. Let $\Pr(A_i)$, $1 \le i \le n$, denote the probability of the event $A_i$ that component $i$ operates successfully during the intended period of time. Then the reliability of component $i$ is $r_i = \Pr(A_i)$. Similarly, let $\Pr(\bar{A}_i)$ denote the probability of the event $\bar{A}_i$ that component $i$ fails during the intended period. In the following calculations, we assume that the failure of any component is independent of that of the other components.
Case 1: Serial Configuration:
The series configuration is the simplest and perhaps one of the most common structures. The block diagram in
Fig.1.19 represents a series system consisting of n components.
Figure 1.19: Series system of n components (blocks 1, 2, ..., i, ..., n connected in series)
In this configuration, all n components must be operating to ensure system operations. In other words, the system fails
when any one of the n components fails.
Thus, the reliability of a series system, $R_s$, is:
$$R_s = \Pr(A_1 \cap A_2 \cap \cdots \cap A_n) = \prod_{i=1}^{n} \Pr(A_i)$$
The last equality holds since all components operate independently. Therefore, the reliability of a series system is:
$$R_s = \prod_{i=1}^{n} r_i \qquad (1.33)$$
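As an illustration with invented figures: a series system of $n = 3$ components, each with reliability $r_i = 0.99$, has $R_s = 0.99^3 \approx 0.970$, and with ten such components $R_s = 0.99^{10} \approx 0.904$, showing how quickly series reliability erodes as the part count grows.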
Case 2: Parallel Configuration:
In a parallel configuration consisting of n components, the system is successful if any one of the n components is successful. Thus, the reliability of a parallel system is the probability of the union of the n events $A_1, A_2, \ldots, A_n$, which can be written as:
$$R_s = \Pr(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - \Pr(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) = 1 - \prod_{i=1}^{n} \Pr(\bar{A}_i)$$
The last equality holds since the component failures are independent. Since $\Pr(\bar{A}_i) = 1 - \Pr(A_i) = 1 - r_i$, the reliability of a parallel system is therefore:
$$R_s = 1 - \prod_{i=1}^{n} (1 - r_i) \qquad (1.34)$$
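Equations (1.33) and (1.34) can be checked with a few lines of Python; the sketch below is purely illustrative and the component reliabilities are invented for the example.

from math import prod

def series_reliability(r):
    """Eq. (1.33): all n components must work."""
    return prod(r)

def parallel_reliability(r):
    """Eq. (1.34): the system works if at least one component works."""
    return 1 - prod(1 - ri for ri in r)

r = [0.99, 0.95, 0.90]            # invented component reliabilities
print(series_reliability(r))      # -> 0.84645
print(parallel_reliability(r))    # -> 0.99995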
Equations (1.33) and (1.34) show how the configuration of the components affects the system reliability. In addition, it is
possible to recognize two distinct and viable approaches to enhance system reliability; one on the level of the components
and the second on the level of the overall system organization.
i. Component technology: The first approach is based on component technology; i.e., manufacturing capability of
producing the component with the highest possible reliability, followed by parts screening, quality control,
pretesting to remove early failures (infant mortality effects), etc.
ii. System organization: The second approach is based on the organization of the system itself (e.g., fault-tolerant
architectures that make use of protective redundancy to mask or remove the effects of failure, and thereby
provide greater overall system reliability than would be possible by the use of the same components in a
simplex or nonredundant configuration).
Fault-tolerant and quasi fault-tolerant architectures: Fault tolerance is the capability of the system to perform its
functions in accordance with design specifications, even in the presence of hardware failures. If, in the event of faults,
the system functions can be performed, but do not meet the design specifications with respect to the time required to
complete the job or the storage capacity required for the job, then the system is said to be partially or quasi fault-
tolerant. Since the number of possible hardware failures can be very large, in practice it is necessary to restrict fault
tolerance to prespecified classes of faults from which the system is designed to recover.
Fault-tolerance is provided by application of protective redundancy, or the use of more resources so as to upgrade
system reliability. These resources may consist of more hardware, software, or time, or a combination of the three.
Extra time is required to retransmit messages or to execute programs, extra software is required to perform diagnosis
on the hardware, and extra hardware is required to provide replication of units.
When designing for reliability, the primary goal of the designer is to find the best way to increase system reliability.
Accepted principles for doing this include:
1. to keep the system as simple as is compatible with performance requirements;
2. to increase the reliability of the components in the system;
3. to use parallel redundancy for the less reliable components;
4. to use standby redundancy (hot standby) that can be switched in to replace active components when failures occur;
5. to use repair maintenance where failed components are replaced but not automatically switched in;
6. to use preventive maintenance such that components are replaced by new ones whenever they fail, or at
some fixed time interval, whichever comes first;
7. to use better arrangement for exchangeable components; and
8. to use large safety factors or management programs for product improvement.
Note: The Institute of Electrical and Electronics Engineers (IEEE) sponsors an organization devoted to reliability in
engineering, the IEEE Reliability Society (IEEE RS). The Reliability Society promotes industry-wide acceptance of a
systematic approach to design that will help to ensure reliable products. To that end, they promote reliability not just in
engineering, but in maintenance and analysis as well. The Society encourages collaborative effort and information sharing
among its membership, which encompasses organizations and individuals involved in all areas of engineering, including
aerospace, transportation systems, medical electronics, computers, and communications.
3.9.1.2. Maintainability
A qualitative definition of maintainability M is given by Goldman and Slattery (1979) as:
"…. The characteristics (both qualitative and quantitative) of material design and installation which make it possible to meet operational objectives with a minimum expenditure of maintenance effort (manpower, personnel skill, test equipment, technical data, and maintenance support facilities) under operational environmental conditions in which scheduled and unscheduled maintenances will be performed."
More recently, maintainability has been described in MIL-HDBK-470A, "Designing and Developing Maintainable Products and Systems" (4 August 1997), as:
"The relative ease and economy of time and resources with which an item can be retained in, or restored to, a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance and repair. In this context, it is a function of design."
Based on the qualitative definitions, maintainability can also be expressed quantitatively by means of probability theory. Thus, quantitatively, according to Goldman and Slattery,
"… maintainability is a characteristic of design and installation which is expressed as the probability that an item will be restored to specified conditions within a given period of time when maintenance action is performed in accordance with prescribed procedures and resources."
Mathematically, this can be expressed as:
$$M = 1 - e^{-t/\mathrm{MTTR}}$$
where t is the specified time to repair, and MTTR is the mean time to repair.
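For example, with invented figures: if MTTR = 2 hours, the probability that a repair is completed within t = 4 hours is $M = 1 - e^{-4/2} = 1 - e^{-2} \approx 0.86$, while within t = 1 hour it is only $1 - e^{-0.5} \approx 0.39$.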
The importance of focusing on maintainability is underlined by articles suggesting that roughly 38% of the life cycle cost is directed toward maintainability issues.
Design for maintainability requires a product that is serviceable (easily repaired) and supportable (cost-effectively kept in, or restored to, a usable condition); better yet, if the design also includes the durability feature called reliability (absence of failures), then you have the best of all worlds.
Supportability has a design subset, testability: a design characteristic that allows the status of an item to be verified and faults within the item to be isolated in a timely and effective manner, for example by means of built-in test equipment (BIT), so that the item can demonstrate its status (operable, inoperable, or degraded) both during routine troubleshooting and when verifying that the equipment has been restored to a usable condition following maintenance.
Maintainability is primarily a design parameter. The design for maintainability defines how long equipment will be down and unavailable. The downtime can be reduced by a highly trained workforce and a responsive supply system, which pace the speed of maintenance so as to achieve minimum downtime. Unavailability occurs when the equipment is down for periodic maintenance and for repairs. Unreliability is associated with failures of the system; these failures can be associated with planned or unplanned outages.
Maintainability is thus truly a design characteristic. Attempting to improve the inherent maintainability of a product or item after the design is frozen is usually expensive, inefficient, and ineffective, as demonstrated so often in manufacturing plants when the first maintenance effort requires a cutting torch just to gain access to the item that needs replacement. Poor maintainability results in equipment that is unavailable, expensive because of the cost of unreliability, and a source of irritation for everyone who operates the equipment or is responsible for it.
Availability: Availability refers to the probability that a system will be operative (up), and is expressed as:
$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
where MTBF is the mean time between failures and MTTR is the mean time to repair. This equation is sometimes called the inherent availability equation; inherent availability looks at availability from a design perspective.
Reliability and maintainability are considered complementary disciplines from the viewpoint of the inherent availability equation. If the mean time between failures, or mean time to failure (MTTF), is very large compared to the mean time to repair or replace, availability will be high. Likewise, if the mean time to repair or replace is minuscule, availability will be high. As reliability decreases (i.e., MTTF becomes smaller), better maintainability (i.e., shorter MTTR) is needed to achieve the same availability; conversely, as reliability increases, maintainability becomes less important for achieving the same availability. Thus, trade-offs can be made between reliability and maintainability to achieve the same availability, and the two disciplines must work hand in hand to achieve the objectives. $A_i$ is the largest availability value you can observe, namely the value obtained if the system never suffers any operational abuses.
The above quantitative definition of availability assumes a system model where all faults are immediately detected at the
time of their occurrence, and fault location and repair action are initiated immediately.
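As a numerical illustration with invented figures: with MTBF = 1000 hours and MTTR = 10 hours, $A_i = 1000/1010 \approx 0.990$, i.e. about 99.0% availability. Halving MTTR to 5 hours or doubling MTBF to 2000 hours each raise $A_i$ to about 99.5%, illustrating the trade-off between reliability and maintainability described above.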
In the operational world we speak of the operational availability equation. Operational availability looks at availability by collecting all of the abuses in a practical system:
$$A_o = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MDT}}$$
where MDT is the mean down time and MTBM is the mean time between maintenance.
The mean time between maintenance includes all corrective and preventive actions (whereas MTBF accounts only for failures). The mean down time includes all time associated with the system being down for corrective maintenance (CM), including delays (whereas MTTR addresses only repair time), as well as self-imposed downtime for preventive maintenance (PM), although it is preferable to perform most PM actions while the equipment is operating. $A_o$ is a smaller availability figure than $A_i$ because of the abuses that naturally occur in practical operation. The uptime and downtime concepts are illustrated in Figure 1.21 for constant values of availability. The figure shows the difficulty of increasing availability from 99% to 99.9% (MTBM must increase by one order of magnitude, or MDT must decrease by one order of magnitude) compared with improving availability from 85% to 90% (which requires improving MTBM by less than about half an order of magnitude or decreasing MDT by about three-quarters of an order of magnitude).
Figure 1.21: Availability Relationships
Operational availability includes issues associated with the inherent design, the availability of maintenance personnel, the availability of spare parts, the maintenance policy, and a host of other non-design issues (whereas inherent availability addresses only the inherent design); in short, all the abuses. Testability, the subset of maintainability/supportability, enters strongly into the MDT portion of the equation: it identifies the status of an item clearly, so that it is known whether a fault exists and whether the item is dead, alive, or deteriorated, all of which affects affordability. Operational availability depends upon operational maintainability, which includes factors entirely outside the design environment, such as an insufficient number of spare parts, slow procurement of equipment, poorly trained maintenance personnel, and a lack of proper tools and procedures to perform the maintenance actions. Achieving excellent operational maintainability requires sound planning, engineering, design, and test, excellent manufacturing conformance, and an adequate support system (logistics) for spare parts, people, training, etc., incorporating lessons learned from previous or similar equipment.
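The relationship summarised in Figure 1.21 can be reproduced with a short calculation. The Python sketch below, written for these notes purely as an illustration, prints the MTBM/MDT ratio required for a given availability and makes the 99% versus 99.9% comparison discussed above explicit.

# Availability A = MTBM / (MTBM + MDT)  =>  MTBM / MDT = A / (1 - A)
# Illustrative calculation of the uptime/downtime ratio needed for a target availability.

def required_ratio(availability):
    """MTBM/MDT ratio needed to reach the given availability."""
    return availability / (1 - availability)

for a in (0.85, 0.90, 0.99, 0.999):
    print(f"A = {a:.3f}  ->  MTBM/MDT = {required_ratio(a):7.1f}")

# Output:
# A = 0.850  ->  MTBM/MDT =     5.7
# A = 0.900  ->  MTBM/MDT =     9.0
# A = 0.990  ->  MTBM/MDT =    99.0
# A = 0.999  ->  MTBM/MDT =   999.0
# Going from 99% to 99.9% requires a ten-fold improvement in the MTBM/MDT ratio,
# whereas going from 85% to 90% needs only about a 1.6-fold improvement in the ratio.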