High-Level Optimizations: Embedded System Optimization
High-Level Optimizations
Software optimization is the process of modifying a software system to make some aspects work more efficiently or use
fewer resources.
• Simple Loop Transformations
o Loop Permutations – Interchanging the order of two nested loops can increase parallelism, improve
spatial locality, or enable other transformations (a C sketch of these simple transformations follows this list).
o Interchange can have a positive effect on the reuse of array elements in the cache, since after the swap
the next iteration of the innermost loop accesses an adjacent location in memory.
▪ Caches are normally organized so that adjacent locations can be accessed significantly faster
than locations that are farther away from the previously accessed one; loop interchange
exploits this spatial locality.
o Loop Unrolling – a standard transformation that creates several instances of the loop body within a
single iteration.
▪ The number of copies of the loop body is called the unrolling factor; factors larger than two
are possible. Unrolling reduces the loop overhead (fewer branches and counter updates per
executed operation) and therefore typically improves speed.
o Loop Fusion and Fission – the merging or splitting of loop nests.
▪ There may be cases in which two separate loops can be merged (fusion), and cases in which a
single loop is split into two (fission).
▪ Depending on the access patterns, one of the two versions may lead to improved cache
behavior and increase the potential for parallel computation within the loop body.
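A minimal C sketch of the three transformations above, assuming an illustrative array size N; the transformed variants are written as separate functions rather than as compiler output.

```c
#include <stddef.h>

#define N 256  /* illustrative array size, not taken from the source text */

/* Loop permutation (interchange): keeping j in the innermost loop makes
 * consecutive iterations touch adjacent memory, exploiting spatial locality. */
void interchange(int a[N][N])
{
    /* before: for (j ...) for (i ...) a[i][j] = 0;  (strided accesses) */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)   /* innermost index is contiguous */
            a[i][j] = 0;
}

/* Loop unrolling with an unrolling factor of 4: fewer branches and counter
 * updates per processed element. N is assumed to be divisible by 4. */
void unroll(int a[N], int b[N])
{
    for (size_t i = 0; i < N; i += 4) {
        a[i]     = b[i]     + 1;
        a[i + 1] = b[i + 1] + 1;
        a[i + 2] = b[i + 2] + 1;
        a[i + 3] = b[i + 3] + 1;
    }
}

/* Loop fusion: two loops over the same index range merged into one, so each
 * element of b is consumed while it is likely still in the cache. */
void fuse(int a[N], int b[N], int c[N])
{
    for (size_t i = 0; i < N; i++) {
        b[i] = a[i] * 2;   /* body of the first original loop  */
        c[i] = b[i] + 3;   /* body of the second original loop */
    }
}
```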
• Loop Tiling/Blocking – restructures nested loops so that they operate on small chunks (tiles) of the data at a
time (a tiled matrix multiplication sketch follows). This optimization targets memory hierarchies that include
caches and scratchpad memories: a significant reuse factor for the information held in those memories is
required, otherwise the memory hierarchy cannot be exploited.
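A minimal sketch of loop tiling applied to matrix multiplication; the matrix dimension N and tile size TILE are placeholder values and would in practice be tuned to the cache or scratchpad size.

```c
#include <stddef.h>

#define N    512   /* illustrative matrix dimension */
#define TILE 32    /* illustrative tile size; tune to the fast memory */

/* Tiled (blocked) matrix multiplication: the three outer loops walk over
 * TILE x TILE chunks, so the elements of each chunk are reused many times
 * while they are still resident in the cache or scratchpad.
 * The caller is expected to zero-initialize c. */
void matmul_tiled(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t kk = 0; kk < N; kk += TILE)
                /* the inner loops stay inside one tile of each matrix */
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t j = jj; j < jj + TILE; j++)
                        for (size_t k = kk; k < kk + TILE; k++)
                            c[i][j] += a[i][k] * b[k][j];
}
```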
• Loop Splitting – splits the iteration range of a loop so that each resulting loop can be optimized separately
(see the sketch below). Performing loop splitting manually is difficult and error-prone. Published algorithms are
based on a sophisticated analysis of the accesses to array elements in loops, and optimized splits are generated
using genetic algorithms. Run-times can be reduced by loop splitting for various applications and architectures.
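The published algorithms mentioned above are beyond a short example, but the basic idea can be sketched in C: the iteration range is split at a known point so that a condition tested in every iteration disappears from the hot path. The split point BORDER below is purely illustrative.

```c
#define N      1024  /* illustrative iteration count */
#define BORDER 16    /* illustrative split point, e.g. the margin of an image */

/* Before splitting: a condition is evaluated in every iteration. */
void process_unsplit(int a[N])
{
    for (int i = 0; i < N; i++) {
        if (i < BORDER)
            a[i] = 0;              /* special handling of the margin */
        else
            a[i] = a[i - BORDER];  /* regular case */
    }
}

/* After splitting: two loops, each free of the per-iteration test. */
void process_split(int a[N])
{
    for (int i = 0; i < BORDER; i++)
        a[i] = 0;
    for (int i = BORDER; i < N; i++)
        a[i] = a[i - BORDER];
}
```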
• Array Folding – overlaps the storage of different arrays (or parts of an array) to avoid wasting memory,
exploiting the fact that at any time only a subset of the array elements is needed (a sketch of inter-array
folding follows).
o The maximum number of elements needed at any time is called the address reference window. Without
folding, each array is allocated the maximum amount of space it requires during the entire execution time.
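A hedged sketch of one simple form of array folding (inter-array folding) in C: two arrays whose lifetimes do not overlap are mapped onto the same storage, so memory is reserved for the larger of the two rather than for both. The sizes and the union-based mapping are illustrative assumptions, not the analysis-driven algorithm from the literature.

```c
#define N_IN  256   /* illustrative sizes */
#define N_OUT 128

/* Without folding, both arrays would occupy memory for the whole run:
 *   static int input[N_IN];
 *   static int output[N_OUT];
 * With folding, they share one region, because input[] is no longer
 * needed once output[] starts being written. */
static union {
    int input[N_IN];
    int output[N_OUT];
} folded;

void produce(void)
{
    for (int i = 0; i < N_IN; i++)
        folded.input[i] = i;        /* lifetime of input[] */
}

void consume(void)
{
    for (int i = 0; i < N_OUT; i++)
        folded.output[i] = i * i;   /* input[] is dead by this point */
}
```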
• Processes are imperative programs with their own memory spaces. These programs cannot refer to each other’s
variables and do not exhibit the same difficulties as threads.
o To achieve concurrency, processes need to be able to communicate.
▪ Operating systems provide a variety of mechanisms, such as creating shared memory spaces,
which reintroduces the potential difficulties of multithreaded programming.
▪ A file system is simply a way to create a body of data that is persistent in the sense that it
outlives the process that creates it.
• One process can create data and write it to a file, and another process can read data
from the same file.
▪ Message Passing – One process creates a chunk of data, deposits it in a carefully controlled
section of memory that is shared, and then notifies other processes that the message is ready.
Those other processes can block waiting for the data to become ready (see the pipe-based
sketch after this list).
• Semaphores are named after mechanical signals traditionally used on railroad tracks
to signal that a section of track has a train on it.
o It is possible to use a single section of track for trains traveling in both directions; the
semaphore implements mutual exclusion, preventing two trains from being on the same
section of track at the same time.
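A minimal POSIX sketch of message passing between two processes, assuming a Unix-like OS that provides fork() and pipe(); one process writes a message into the pipe and the other blocks reading it, along the lines described above. Error handling is reduced to the bare minimum.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];                       /* fd[0]: read end, fd[1]: write end */
    if (pipe(fd) == -1)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                  /* child: the receiving process */
        char buf[32] = {0};
        close(fd[1]);                /* child only reads */
        read(fd[0], buf, sizeof buf - 1);  /* blocks until data is ready */
        printf("child received: %s\n", buf);
        close(fd[0]);
        return 0;
    }

    /* parent: the sending process */
    close(fd[0]);                    /* parent only writes */
    const char msg[] = "hello";
    write(fd[1], msg, sizeof msg);
    close(fd[1]);
    wait(NULL);                      /* wait for the child to finish */
    return 0;
}
```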
• A preemptive scheduler may make a scheduling decision during the execution of a task, assigning a new task to
the same processor. That is, a task may be in the middle of executing when the scheduler decides to stop that
execution and begin the execution of another task.
o The interruption of the first task is called preemption.
• A non-preemptive scheduler always lets tasks run to completion before assigning another task to execute on
the same processor.
• A priority-based scheduler assumes each task is assigned a number called a priority, and the scheduler will
always choose to execute the task with the highest priority.
o A fixed priority is a priority that remains constant over all executions of a task.
o A dynamic priority is allowed to change during execution.
o A preemptive priority-based scheduler supports the arrival of tasks and at all times executes the
enabled task with the highest priority.
o A non-preemptive priority-based scheduler uses priorities to determine which task to execute
next after the current task execution completes, but never interrupts a task during execution
to schedule another task (a minimal scheduler sketch follows this list).
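A minimal sketch of a non-preemptive, fixed-priority scheduler in C, assuming a small static task table; the task bodies and priority values are placeholders. Each scheduling decision picks the enabled task with the highest priority and lets it run to completion, as described above; a preemptive scheduler would additionally need a way to interrupt the running task, which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    void (*run)(void);   /* task body, runs to completion when invoked   */
    int  priority;       /* fixed priority; a larger value is more urgent */
    bool enabled;        /* set when the task is ready to execute         */
} task_t;

static void task_sensor(void) { /* placeholder task body */ }
static void task_logger(void) { /* placeholder task body */ }

static task_t tasks[] = {
    { task_sensor, 2, false },
    { task_logger, 1, false },
};

#define NUM_TASKS (sizeof tasks / sizeof tasks[0])

/* One scheduling decision: run the enabled task with the highest priority,
 * then clear its enabled flag. Returns false when no task is ready. */
bool schedule_once(void)
{
    int best = -1;
    for (size_t i = 0; i < NUM_TASKS; i++)
        if (tasks[i].enabled &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;

    if (best < 0)
        return false;

    tasks[best].enabled = false;
    tasks[best].run();               /* non-preemptive: runs to completion */
    return true;
}
```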
6. More and more embedded systems are used in mobile convergence applications; for example, they are key
platforms for web browsing, video streaming, and similar services. The sheer number of such devices makes
their total power consumption very high.
7. Data processing itself consumes power. Many corporate IT departments now follow the trend of green
computing, meaning that they try to reduce the environmental impact of their activities. Efficient power
management is very important for achieving these goals.