Report of Energy Efficient Design
By
G.SHRICHARAN
(09B91D3812)
M.Tech (DECS)
I-year I-Semester
1. SUMMARY
The distinctive features of our approach are the following: i) complete system-level
and component energy consumption estimates as well as battery lifetime estimates; ii)
ability to explore multiple architectural alternatives; and iii) easy estimation of the impact
of software changes both during and after the architectural exploration.
The system model and the methodology for cycle-accurate simulation of energy dissipation are presented. The simulated timing and energy dissipation obtained with this methodology are within 5% of the hardware measurements for the Dhrystone test case. Hardware architecture trade-offs for the SmartBadge's real-time MPEG video decoder design are explored using cycle-accurate energy simulation. The profiling support that was developed is also discussed. A full software design example, an MP3 audio decoder for the SmartBadge optimized with the help of the profiler, is presented as part of the methodology (Section 4).
3. RECENT WORK
As portable embedded systems have grown in importance in recent years, so has the
need for tools that enable energy consumption estimation for such systems.
Fig. 1. SmartBadge.
The system we use in this work to illustrate our methodology, the SmartBadge, has an ARM processor. As a result, we implemented the energy models as extensions to the cycle-accurate instruction-level simulator for the ARM processor family, called the ARMulator [1]. The ARMulator is normally used for functional and performance validation. Fig. 2 shows the simulator architecture. The typical sequence of steps needed to set up a system simulation can be summarized as follows:
1) The designer provides a simple functional model for each system component other than the processor;
2) The functional model is annotated with a cycle-accurate performance model;
3) Application software (written in C) is cross-compiled and loaded into specified locations of the system memory model; and
4) The simulator runs the code, and the designer can analyze execution using a cross-debugger or by collecting statistics.
A designer interested in using our methodology would only need to additionally provide cycle-accurate energy models for each component during step 2) of the simulation setup. Thus, the designer can obtain power estimates with little incremental effort.
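To make step 2) concrete, the Python sketch below shows one hypothetical way a component's functional model, its cycle-accurate performance model, and the added energy annotation can sit together. It is an illustration only: the real extensions are written against the ARMulator's own C interface, which is not reproduced here, and the class name `SRAMModel`, the helper `run_trace`, and every latency and energy value are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SRAMModel:
    """Hypothetical memory component for a system simulator.

    Layers: a functional model (the stored words), a cycle-accurate
    performance model (a fixed access latency), and the energy annotation
    added by the methodology. All numbers are illustrative placeholders.
    """
    size_words: int
    latency_cycles: int = 2            # assumed access time, in cycles
    energy_per_access_nj: float = 1.5  # assumed energy per access, in nJ
    data: dict = field(default_factory=dict)
    energy_nj: float = 0.0             # energy accumulated so far

    def access(self, addr, value=None):
        """Read (value is None) or write one word; return (word, cycles)."""
        if not 0 <= addr < self.size_words:
            raise IndexError(f"address {addr} out of range")
        self.energy_nj += self.energy_per_access_nj  # energy annotation
        if value is not None:
            self.data[addr] = value                  # functional write
        return self.data.get(addr, 0), self.latency_cycles


def run_trace(component, trace):
    """Replay (addr, value-or-None) requests; return total cycles spent."""
    cycles = 0
    for addr, value in trace:
        _, spent = component.access(addr, value)
        cycles += spent
    return cycles


mem = SRAMModel(size_words=1024)
cycles = run_trace(mem, [(0, 7), (0, None), (4, None)])
print(cycles, mem.energy_nj)  # prints: 6 4.5
```

Keeping the energy bookkeeping inside the same access method that models function and timing mirrors the point made above: the energy model is an incremental annotation, not a separate simulator.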
There are two main classes of processor cycles: processor active, during which active power is consumed, and processor idle, during which idle power is consumed. The processor is idle while it waits on an off-chip memory request. The number of cycles that the processor remains idle depends on the L2 cache and memory model access times. The L2 cache, when present, is always accessed before the main memory and so is active on every off-chip memory access request. On an L2 cache miss, the main memory is accessed. The memory model accounts for the energy spent during the memory access. The interconnect energy model calculates the energy consumed by the interconnect and pins based on the number of lines switched during the cycle on the data and address buses.
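The total energy consumed by the system per cycle is the sum of the energies consumed by the individual system components, such as the processor, the L2 cache, the main memory, and the interconnect and pins.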
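The per-cycle accounting described above can be sketched as follows, assuming an L2 cache is present. The event fields, the constant names, and every energy value are hypothetical placeholders. The sketch also simplifies timing by charging the L2 and main-memory access energy in the cycle the request reaches them, and it estimates interconnect energy from the number of address and data lines that toggle (the Hamming distance between consecutive bus values).

```python
from dataclasses import dataclass

# Hypothetical per-event energies in nJ (placeholders, not measured values).
E_CPU_ACTIVE = 1.0        # processor active cycle
E_CPU_IDLE = 0.3          # processor idle (stalled on an off-chip request)
E_L2_ACCESS = 0.8         # L2 cache access
E_MEM_ACCESS = 2.5        # main memory access
E_PER_LINE_SWITCH = 0.05  # one address or data bus line toggling


@dataclass
class CycleEvent:
    cpu_active: bool  # False means the processor is idle this cycle
    l2_access: bool   # an off-chip request reaches the L2 cache this cycle
    l2_miss: bool     # the L2 lookup missed, so main memory is accessed
    addr_bus: int     # value on the address bus this cycle
    data_bus: int     # value on the data bus this cycle


def lines_switched(prev: int, curr: int) -> int:
    """Number of bus lines that toggle between two consecutive values."""
    return bin(prev ^ curr).count("1")


def cycle_energy(ev: CycleEvent, prev_addr: int, prev_data: int) -> float:
    """E_cycle = E_cpu + E_l2 + E_mem + E_interconnect (sketch)."""
    e_cpu = E_CPU_ACTIVE if ev.cpu_active else E_CPU_IDLE
    e_l2 = E_L2_ACCESS if ev.l2_access else 0.0
    e_mem = E_MEM_ACCESS if ev.l2_access and ev.l2_miss else 0.0
    toggles = (lines_switched(prev_addr, ev.addr_bus)
               + lines_switched(prev_data, ev.data_bus))
    return e_cpu + e_l2 + e_mem + toggles * E_PER_LINE_SWITCH


def total_energy(events, addr0: int = 0, data0: int = 0) -> float:
    """Sum the per-cycle energies over a simulated trace."""
    total, prev_addr, prev_data = 0.0, addr0, data0
    for ev in events:
        total += cycle_energy(ev, prev_addr, prev_data)
        prev_addr, prev_data = ev.addr_bus, ev.data_bus
    return total
```

For example, an active cycle with no bus activity costs only `E_CPU_ACTIVE`, while an idle cycle whose request misses in L2 pays for the processor, the L2 lookup, main memory, and every toggled bus line.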
Peak energy consumption can be analyzed, and the architecture fine-tuned, by studying the energy consumption and the memory access patterns over a period of time. Peak energy consumption can reach twice the average consumption, so the thermal characteristics of the hardware design, the DC–DC converter, and the battery have to be specified accordingly.
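Studying consumption over time amounts to binning the per-cycle energy trace into time windows and comparing the peak window with the mean. A minimal sketch, assuming the simulator can emit per-cycle energy in joules; the function names and the choice of non-overlapping windows are illustrative, and the window length determines which peaks are visible.

```python
def windowed_power(energy_per_cycle_j, window_cycles, f_clk_hz):
    """Average power (W) over consecutive, non-overlapping windows.

    energy_per_cycle_j: per-cycle energies (J) from the simulator trace.
    A trailing partial window is ignored.
    """
    if window_cycles <= 0:
        raise ValueError("window_cycles must be positive")
    t_window_s = window_cycles / f_clk_hz
    n = len(energy_per_cycle_j) // window_cycles
    return [sum(energy_per_cycle_j[i * window_cycles:(i + 1) * window_cycles]) / t_window_s
            for i in range(n)]


def peak_to_average(powers):
    """Ratio of the highest window power to the mean window power."""
    return max(powers) / (sum(powers) / len(powers))


# Unit values and a 1 Hz clock, chosen only to keep the arithmetic readable.
p = windowed_power([1, 1, 1, 1, 3, 3], window_cycles=2, f_clk_hz=1)
print(p)  # prints: [1.0, 1.0, 3.0]
print(peak_to_average(p))  # roughly 1.8
```

Short windows expose brief bursts, such as back-to-back memory accesses, that long windows average away; a ratio near 2 would correspond to the peak-to-average figure quoted above.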
For best battery utilization, it is important to match the current consumption of the embedded system to the discharge characteristic of the battery. On the other hand, the more capacity a battery has, the heavier and more expensive it is.
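The text does not say which battery model is used, so the sketch below substitutes Peukert's law, a common empirical model in which usable capacity shrinks as the discharge current rises above the rated rate. It illustrates why current consumption should be matched to the battery's discharge characteristic. The function names, the converter efficiency parameter, and all numbers are hypothetical.

```python
def peukert_lifetime_h(capacity_ah, rated_hours, current_a, k):
    """Runtime (h) at a constant discharge current under Peukert's law.

    capacity_ah: rated capacity at the rated discharge time `rated_hours`.
    k: Peukert exponent; k = 1 models an ideal battery whose usable
       capacity does not depend on the discharge rate, real cells have k > 1.
    """
    if current_a <= 0:
        raise ValueError("current must be positive")
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


def average_current_a(avg_power_w, battery_voltage_v, converter_efficiency=1.0):
    """Battery current needed to deliver avg_power_w through a DC-DC converter."""
    return avg_power_w / (battery_voltage_v * converter_efficiency)


# Hypothetical 0.5 Ah cell rated over 10 h, loaded at 0.1 A.
ideal_h = peukert_lifetime_h(0.5, 10, 0.1, k=1.0)  # 5 h
real_h = peukert_lifetime_h(0.5, 10, 0.1, k=1.2)   # about 4.35 h
```

With k = 1 the estimate reduces to capacity divided by current; with k > 1, drawing more than the rated current costs disproportionately more runtime, which is the effect that matching the load to the battery avoids.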
The MPEG video decoder design exploration example illustrates how the methodology for cycle-accurate energy consumption simulation can be used to select and fine-tune the hardware configuration that gives the best tradeoff between performance and energy consumption. The main limitation of the cycle-accurate energy simulator is that the impact of code optimizations is not easily evaluated.
Profiling for energy and performance enables designers to identify those portions
of their source code that need to be further optimized in order to either decrease energy
consumption, increase performance, or both.
The profiler operates as follows. Source code (e.g., application or operating system code) is compiled using a compiler for the target processor. The compiler produces two outputs: the executable that the cycle-accurate simulator runs (i.e., the assembly code that is input to the simulator), and a map of the location of each procedure in the executable, which the profiler uses to gather statistics (the map is the correspondence between assembly code blocks and procedures in the C source code).
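One plausible way to use the procedure map is to look up the simulated program counter each cycle and charge that cycle's energy to the enclosing procedure. The sketch below assumes the map has already been parsed into (start, end, name) address ranges and that the simulator reports (pc, energy) per cycle; both formats, the addresses, and the function names are hypothetical. It yields only self energy; totals that include children also need call-stack information.

```python
import bisect
from collections import defaultdict


def build_map(symbols):
    """symbols: (start_addr, end_addr_exclusive, name) tuples, e.g. parsed
    from a linker map. Returns start addresses and the sorted symbols."""
    syms = sorted(symbols)
    return [s[0] for s in syms], syms


def procedure_for(pc, starts, syms):
    """Find the procedure whose address range contains pc (binary search)."""
    i = bisect.bisect_right(starts, pc) - 1
    if i >= 0 and pc < syms[i][1]:
        return syms[i][2]
    return "<unknown>"


def flat_profile(trace, symbols):
    """trace: (pc, energy) pairs, one per simulated cycle.
    Returns the energy attributed to each procedure."""
    starts, syms = build_map(symbols)
    energy = defaultdict(float)
    for pc, e in trace:
        energy[procedure_for(pc, starts, syms)] += e
    return dict(energy)


# Hypothetical addresses and per-cycle energies, for illustration only.
symbols = [(0x200, 0x280, "III_hybrid"), (0x100, 0x200, "main")]
trace = [(0x104, 1.0), (0x204, 2.0), (0x27C, 0.5), (0x300, 0.25)]
print(flat_profile(trace, symbols))
# prints: {'main': 1.0, 'III_hybrid': 2.5, '<unknown>': 0.25}
```

Binary search over sorted start addresses keeps the per-cycle lookup at O(log n) in the number of procedures, which matters because the lookup runs once per simulated cycle.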
5. RESULTS
A good example of profiler usage is shown in Table 1, which lists a portion of the energy profile for MP3 audio decoding. The first column gives the name of the top procedure, followed by its children. The next column gives the total energy spent in that procedure, including its descendants. For example, the total energy spent running the program (main) is 0.32 mWh. The final column gives the energy spent only in that particular procedure. Under main, III_hybrid and its descendants spend the most energy, 0.0671 mWh. Looking at the entry for III_hybrid, it is easy to see that the largest portion of its energy is consumed by its child, inv_mdctl. Therefore, the procedures to focus optimization on are inv_mdctl and SubBandSynthesis. Although this example shows a source code profile of total battery energy consumption, the profiler can report energy consumption for any system component, such as SRAM or the interconnect.
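The two energy columns are related by a simple rule: a procedure's total energy is its self energy plus the total energy of its children. The sketch below applies that rule to a call tree, follows the most expensive branch, and ranks procedures by self energy. The energy values are made up for illustration and are not the measured MP3 profile; the tree-shaped call graph and the function names are simplifying assumptions.

```python
def inclusive_energy(self_energy, children, proc):
    """Total energy of `proc`: its own (self) energy plus that of every
    descendant. Assumes a tree-shaped call graph (no recursion or shared
    callees); a real profiler must handle both."""
    return self_energy[proc] + sum(
        inclusive_energy(self_energy, children, c) for c in children.get(proc, ()))


def hottest_path(self_energy, children, root):
    """Follow the child with the largest total energy down from root."""
    path = [root]
    while children.get(path[-1]):
        path.append(max(children[path[-1]],
                        key=lambda c: inclusive_energy(self_energy, children, c)))
    return path


# Hypothetical self energies (mWh), not the values reported in the text.
self_energy = {"main": 0.01, "III_hybrid": 0.01,
               "inv_mdctl": 0.05, "SubBandSynthesis": 0.04}
children = {"main": ["III_hybrid", "SubBandSynthesis"],
            "III_hybrid": ["inv_mdctl"]}

print(hottest_path(self_energy, children, "main"))
# prints: ['main', 'III_hybrid', 'inv_mdctl']
top_self = sorted(self_energy, key=self_energy.get, reverse=True)[:2]
print(top_self)  # prints: ['inv_mdctl', 'SubBandSynthesis']
```

Total energy points at the expensive subtree, while self energy names the procedures whose own code is worth optimizing; reading both columns together is what leads to the optimization targets named above.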
The profiler allows for fast and accurate evaluation of software and hardware
architectures. Most importantly, it gives good guidance to the designer during the design
process without requiring manual intervention. In addition, the profiler accounts for all
embedded system components, not just the processor and the L1 cache as most general-
purpose profilers do. In the next section we present a real design example that uses the
profiler to guide the implementation of the source code optimizations described earlier
for the MP3 audio decoder running on the SmartBadge.
TABLE 1
Sample Energy Profiling
TABLE 2
PROFILING FOR MP3 IMPLEMENTATIONS
Profiling results in Table 2 show that the algorithmic optimizations considerably reduced the energy consumption of the SubBandSynthesis function: it no longer appears among the top three functions and accounts for only 3.2% of the total energy consumption. The final step is to combine the algorithmic changes with the data- and instruction-level changes, resulting in a decrease of SubBandSynthesis's share of energy consumption to 6% of the total.
TABLE 3
ENERGY FOR MP3 IMPLEMENTATIONS
System and component energy consumption for the different revisions of the optimized source code is shown in Table 3. Positive percentages show an energy decrease with respect to the original code.
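The sign convention can be stated as a one-line formula. A minimal sketch; the function name and the example numbers are hypothetical.

```python
def energy_saving_pct(e_original, e_new):
    """Percent change relative to the original code, using the table's sign
    convention: positive means energy decreased, negative means it increased."""
    return (e_original - e_new) * 100.0 / e_original


print(energy_saving_pct(100.0, 23.0))  # prints: 77.0 (a 77% decrease)
print(energy_saving_pct(10.0, 11.0))   # prints: -10.0 (energy grew)
```

A negative entry is what a component such as Flash would show when a code change makes it consume more energy than before.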
TABLE 4
PERFORMANCE FOR MP3 IMPLEMENTATIONS
Table 4 shows the corresponding results for performance. Positive percentages show a performance increase. Although the energy savings of the algorithmic optimizations and of the data- and instruction-level optimizations, relative to the original code, are comparable, the data- and instruction-level optimizations deliver a significant performance improvement. Note that the increase in energy consumption and the decrease in performance of the Flash memory are due to the larger code size resulting from the algorithmic change in the SubBandSynthesis procedure. The total improvement in system performance and energy consumption more than makes up for this degradation in Flash performance and energy consumption. The combined optimizations give real-time performance for MP3 audio decoding, which is a primary constraint for this project. In addition, lower energy consumption enables longer battery life. Note that a faster implementation that is also more energy efficient might still have higher power consumption, which can be an issue for the thermal design of the device. In the case presented here, it was critical to achieve real-time performance with longer battery lifetime. The average and peak power consumption constraints are met by the final design.
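The remark that a faster, more energy-efficient implementation can still draw more power follows from average power being energy divided by execution time. The ratios in the second line are hypothetical, chosen only to illustrate the effect; they are not the measured MP3 results.

```latex
\bar{P} = \frac{E}{t}
\qquad\Longrightarrow\qquad
\frac{\bar{P}_{\text{new}}}{\bar{P}_{\text{orig}}}
  = \frac{E_{\text{new}}/E_{\text{orig}}}{t_{\text{new}}/t_{\text{orig}}}

\frac{E_{\text{new}}}{E_{\text{orig}}} = 0.5,\quad
\frac{t_{\text{new}}}{t_{\text{orig}}} = 0.25
\quad\Longrightarrow\quad
\frac{\bar{P}_{\text{new}}}{\bar{P}_{\text{orig}}} = \frac{0.5}{0.25} = 2
```

Halving the energy while cutting execution time to a quarter doubles the average power, which is why the average and peak power constraints have to be checked separately from energy.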
6. CONCLUSION
We developed a methodology for cycle-accurate simulation of performance and energy consumption in embedded systems. Accuracy, modularity, and ease of integration with the instruction-level simulators widely used in industry make this methodology well suited to embedded system hardware and software design exploration. Simulation results are within 5% of the hardware measurements for the Dhrystone benchmark. We presented an MPEG video decoder embedded system design exploration as an example of how our methodology can be used in practice to aid in selecting the best hardware configuration. We have also developed a tool for profiling the energy consumption of software in embedded systems. Profiling results enabled us to quickly and easily target the redesign of the MP3 audio decoder software. Our final MP3 audio decoder is fully compliant with the MPEG standard and runs in real time with low energy consumption. Using our design tools, we were able to increase performance by 92% while decreasing energy consumption by 77%.
7. REFERENCES