mastering-stm32June2016
Carmine Noviello
This book is for sale at http://leanpub.com/mastering-stm32
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Why did I write the book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Who is this book for? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
How to integrate this book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
How is the book organized? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
About the author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Errata and suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Book support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
How to help the author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Copyright disclaimer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Credits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
I Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1. Introduction to STM32 MCU portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1 Introduction to ARM based processors . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Cortex and Cortex-M based processors . . . . . . . . . . . . . . . . . . . . . 4
1.1.1.1 Core registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1.2 Memory Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1.3 Bit-banding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.1.4 Thumb-2 and memory alignment . . . . . . . . . . . . . . . . . . . . 11
1.1.1.5 Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.1.6 Interrupts and exceptions handling . . . . . . . . . . . . . . . . . . . 14
1.1.1.7 SysTimer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.1.8 Power modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.1.9 CMSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.1.10 Effective implementation of Cortex-M features in the STM32 portfolio 19
1.2 Introduction to STM32 microcontrollers . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.1 Advantages of the STM32 portfolio…. . . . . . . . . . . . . . . . . . . . . . . 21
1.2.2 ….and its drawbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3 A quick look to STM32 subfamilies . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3.1 F0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.3.2 F1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.3 F2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.4 F3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.5 F4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.6 F7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.3.7 L0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.3.8 L1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.3.9 L4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.10 W and J STM32 MCUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.3.11 How to select the right MCU for you? . . . . . . . . . . . . . . . . . . . . . 37
1.4 The Nucleo development board . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3. Hello, Nucleo! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.1 Get in touch with the Eclipse IDE . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.2 Create a project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.3 Flashing the Nucleo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.3.1 Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.3.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.3.3 Mac OSX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4 Understanding the generated code . . . . . . . . . . . . . . . . . . . . . . . . . . 106
11.3.1.3 Using CubeMX to configure the source clock of a general purpose timer 332
11.3.2 Master/slave synchronization modes . . . . . . . . . . . . . . . . . . . . . . 333
11.3.2.1 Enable trigger-related interrupts . . . . . . . . . . . . . . . . . . . . . 338
11.3.2.2 Using CubeMX to configure the master/slave synchronization . . . . . 338
11.3.3 Generate timer-related events by software . . . . . . . . . . . . . . . . . . . 339
11.3.4 Counting modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
11.3.5 Input capture mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
11.3.5.1 Using CubeMX to configure the input capture mode . . . . . . . . . . 349
11.3.6 Output compare mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
11.3.6.1 Using CubeMX to configure the output compare mode . . . . . . . . . 355
11.3.7 Pulse-width generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
11.3.7.1 Generating a sinusoidal wave using PWM . . . . . . . . . . . . . . . 359
11.3.7.2 Using CubeMX to configure the PWM mode . . . . . . . . . . . . . . 364
11.3.8 One Pulse Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.3.8.1 Using CubeMX to configure the OPM mode . . . . . . . . . . . . . . 367
11.3.9 Encoder mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
11.3.9.1 Using CubeMX to configure the Encoder mode . . . . . . . . . . . . . 373
11.3.10 Other features available in general purpose and advanced timers . . . . . . . 374
11.3.10.1 Hall sensor mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
11.3.10.2 Combined three-phase PWM mode and other motor-control related features . . . . . . . 375
11.3.10.3 Break input and locking of timer registers . . . . . . . . . . . . . . . . 375
11.3.10.4 Preloading of auto-reload register . . . . . . . . . . . . . . . . . . . . 375
11.3.11 Debugging and timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
11.4 SysTick timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
11.4.1 Use another timer as system timebase source . . . . . . . . . . . . . . . . . . 378
11.5 A case study: how to precisely measure microseconds with STM32 MCUs . . . . 379
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
A. Miscellaneous HAL functions and STM32 features . . . . . . . . . . . . . . . . . . . . . 532
Force MCU reset from the firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
STM32 96-bit Unique CPU ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
Nucleo-F303RE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Nucleo-F302R8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Nucleo-F103RB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
Nucleo-F091RC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
Nucleo-F072RB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Nucleo-F070RB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
Nucleo-F030R8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Nucleo-L476RG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
Nucleo-L152RE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Nucleo-L073R8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
Nucleo-L053R8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Arduino compatible headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Morpho headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Preface
strange. Compared to software, hardware has a much greater longevity. For example, all
STM32 MCUs have a guaranteed longevity of ten years starting from January 2015 (ST has kept updating
this “starting date” every year so far). This means that a book on this subject may have a potential
longevity of ten years, and this is really uncommon in computer science. Apart from some really
important titles, most technical books do not exceed two years of life.
I think that there are several reasons why this happens. First of all, in the electronics industry know-how
is still a great value to protect. Unlike the software world, hardware requires years
of field experience. Every mistake has a cost, and it is highly dependent on the product stage (if
the device is already on the market, an issue may have a dramatic cost). So, electronics engineers
and firmware developers tend “to protect” their know-how, and this may be one of the reasons that
discourage really experienced users from writing books on these topics.
I think that another reason is that if you want to write a book about an MCU, you must be able to
range from aspects of electronics to higher-level programming topics. This requires a lot of time
and effort, and it is really hard especially when things change day by day (while writing
the first few chapters of this book, ST released more than ten versions of its HAL). In the electronics
industry, hardware engineers and firmware developers are traditionally two different figures, and
sometimes they do not know what the other is doing.
Finally, another important reason is that electronic design is a sort of niche compared to
the software world (software programmers in general far outnumber
electronics designers), and the STM32 is in turn a niche within the niche. I think that it is also
really hard to find a publisher willing to publish a book covering these topics.
For these and other minor reasons, I decided to write this book using a self-publishing platform like
LeanPub that allows you to build a book progressively. I think that the idea behind LeanPub is perfect
for books about niches, and it gives authors the time and tools to write about topics as complex as
they want. I do not know if this book will be a good reference for those interested in programming
with the STM32 platform or if it will remain only a small pamphlet on how to set up a toolchain
to develop STM32 applications. What I know is that I have all the time to write about a topic I like,
without the pressure of a publisher that would require me to complete everything in a short time.
how to use the HAL_DMA module both in polling and interrupt mode. Finally, a performance analysis
of memory-to-memory transfers is presented.
Chapter 10 introduces the clock tree of an STM32 microcontroller, showing main functional blocks
and how to configure them using the HAL_RCC module. Moreover, the CubeMX Clock configuration
view is presented, explaining how to change its settings to generate the right clock configuration.
Chapter 11 is a walkthrough of timers, one of the most advanced and highly customizable
peripherals implemented in all STM32 microcontrollers. The chapter will guide the reader step-by-step
into this subject, introducing the most fundamental concepts of basic, general purpose and
advanced timers. Moreover, several advanced usage modes (master/slave, external trigger, input
capture, output compare, PWM, etc.) are illustrated with practical examples.
Chapter 12 introduces the reader to the power management capabilities offered by STM32F
and STM32L microcontrollers. It starts by showing how Cortex-M cores handle low-power modes,
introducing WFI and WFE instructions. Then it explains how these modes are implemented in STM32
MCUs. The corresponding HAL_PWR module is also described.
Chapter 13 analyzes the activities involved during the compilation and linking processes, which
define the memory layout of an STM32 application. A really bare-bones application is shown, and a
complete and working linker script is designed from scratch, showing how to organize the STM32
memories. Moreover, the usage of CCM RAM is presented, as well as other important Cortex-M
functionalities like the vector table relocation.
Chapter 14 is dedicated to the FreeRTOS Real-Time Operating System. It introduces the reader to
the most relevant concepts underlying an RTOS and shows how to use the main FreeRTOS functionalities
(like threads, semaphores, mutexes, and so on) using the CMSIS-RTOS layer developed
by ST on top of the FreeRTOS API. Moreover, some advanced techniques, like the tickless mode
in low-power design, are shown.
Chapter 15 shows how to start a new custom PCB design using an STM32 MCU. This chapter is
mainly focused on hardware related aspects, like decoupling, signal routing techniques and so on.
Moreover, it shows how to use CubeMX during the PCB design process and how to generate the
application skeleton when the board design is complete.
Throughout the book you will find some horizontal rulers with “badges”, like the above one. This means
that the instructions in that part of the book are specific to a given family of STM32 microcontrollers.
Sometimes, you may find a badge with a specific MCU type: this means that the instructions relate
only to that MCU. A black horizontal ruler (like the one below) closes the specific section. This means
that the text applies again to the whole STM32 platform.
You will also find several asides, each one starting with an icon on the left. Let us explain them.
This is a warning box. The text contained explains important aspects or gives important
instructions. It is strongly recommended to read the text carefully, and follow the instructions.
This is an information box. The text contained clarifies some concepts introduced before.
This is a tip box. It contains suggestions to the reader that could simplify the learning
process.
This is a discussion box, and it is used to talk about the subject in a broader way.
This is a bug-related box, used to report some specific and unresolved bug (both hardware
and software).
I found myself catapulted into a world I had always considered obscure: electronics. I started
first developing firmware on low-cost MCUs, then designing custom PCBs. Meanwhile I founded
a company together with other crazy colleagues. The company is called AirQ Networks and it
produces wireless sensors and control boards used for small automations. In 2013 I was introduced
to the STM32 world during a presentation day at the ST headquarters in Naples. Since then, I have
successfully used STM32 in several products I have designed.
Book support
I have set up a small forum on my personal website as a support site for the topics presented in
this book. For any question, please subscribe here: http://www.carminenoviello.com/en/mastering-stm32/⁴.
It is impossible for me to answer questions sent privately by e-mail, since they are often
variants of the same themes. I hope you understand.
• give me feedback about unclear things or errors contained both in the text and examples;
• write a small review about what you think⁵ of this book in the feedback section⁶.
• use your favorite social network to spread the word. The suggested hashtag for this book on
Twitter is #MasteringSTM32⁷.
Copyright disclaimer
This book contains references to several products and technologies whose copyright is owned by
other companies, organizations or individuals.
ART™ Accelerator, STM32, ST-LINK, STM32Cube and the STM32 logo with the white butterfly on
the cover of this book are copyright ©ST Microelectronics NV.
ARM, Cortex, Cortex-M, CoreSight, CoreLink, Thumb, Thumb-2, AMBA, AHB, APB, Keil are
registered trademarks of ARM Holdings.
GCC, GDB and other tools from the GNU Compiler Collection mentioned in this book are copyright
©Free Software Foundation.
Eclipse is copyright of the Eclipse community and all its contributors.
During the rest of the book, I will mention the copyright of the tools and libraries I will introduce.
If I have forgotten to attribute copyrights for products and software used in this book, and you think I
should add them here, please e-mail me through the LeanPub platform.
Credits
The cover of this book was designed by Alessandro Migliorato (AleMiglio⁸)
1. Introduction to STM32 MCU portfolio
This chapter gives a brief introduction to the whole STM32 portfolio. The aim is to introduce the
reader to this quite complex family of microcontrollers, subdivided into 9 different sub-families, each
one with its main characteristics common to all members and some other ones specific to a given
series. Moreover, a quick introduction to the Cortex-M architecture is given. This chapter will not
be a complete reference for either the Cortex-M architecture or STM32 microcontrollers. The goal is to
guide the readers in choosing the microcontroller that best suits their development requirements,
given that, with more than 500 MCUs to choose from, it is not easy to decide which one fits the
needs.
Figure 1: the relation between a Cortex-M core and a Cortex based MCU
ARM Holdings is a British company that develops the instruction set and architecture for ARM-
based products, but does not manufacture devices. This is a really important aspect of the ARM
world, and this is the reason why there are a lot of silicon manufacturers that develop, produce
and sell microcontrollers based on the ARM architectures and cores. ST Microelectronics is one of
these, and it is currently the only manufacturer that sells a complete portfolio of Cortex-M based
processors¹.
ARM Holdings neither manufactures nor sells CPU devices based on its own designs, but rather
licenses the processor architecture to interested parties. ARM offers a variety of licensing terms,
varying in cost and deliverables. When referring to Cortex-M cores, it is also common to talk about
Intellectual Property (IP) cores, that is a chip layout design that is considered the intellectual property
of one party, namely ARM Holdings.
Thanks to this business model, and thanks to really interesting features like low power capabilities,
low production costs of some architectures and so on, ARM is the most widely used instruction set
architecture in terms of quantity. ARM based products have become very popular. More than 50
billion ARM processors have been produced as of 2014, of which 10 billion were produced in 2013.
ARM based processors equip about 75 percent of the world’s mobile devices. A lot of mainstream
and popular 64-bit and multicore CPUs, used in devices that have become icons in the electronics
industry (for example, the iPhone from Apple), are based on an ARM architecture (ARMv8-A).
Being a sort of widespread standard, there are a lot of compilers and tools, as well as Operating
Systems (Linux is the most used OS on Cortex-A processors) which support these architectures,
offering developers many opportunities to build their applications.
¹At the time of writing this chapter, October 2015, ST Microelectronics is the first manufacturer in the market that provides
Cortex-M7 based processors, the most advanced and performing Cortex-M core architecture introduced by ARM in 2014, which is
based on the ARMv7E-M instruction set architecture.
The next sections will introduce the main features of Cortex-M processors, especially from the
embedded developer point of view.
Like all RISC architectures, Cortex-M processors are load/store machines, which perform operations
only on CPU registers except² for two categories of instructions: load and store, used to transfer
²This is not entirely true, since there are other instructions available in the ARMv6/7 architecture that access memory locations,
but for the purpose of this discussion it is best to consider that sentence to be true.
For example, consider the following C code using the local variables “a”, “b”, “c”:
...
uint8_t a,b,c;
a = 3;
b = 2;
c = a * b;
...
³That assembly code was generated by compiling in Thumb mode with all optimizations disabled, invoking GCC in the following
way: $ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -fverbose-asm -save-temps -O0 -g -c file.c
As we can see, all the operations always involve a register. Instructions at lines 1-2 move the number
3 into the register r3 and then store its content (that is, the number 3) inside the memory location
given by the register r7 (which is the frame pointer, as we will see in a following chapter) plus an
offset of 7 memory locations - that is the place where the variable a is stored. The same happens for
the variable b at lines 3-4. Then, lines 5-7 load the content of variables a and b and perform the
multiplication. Finally, line 8 stores the result in the memory location of variable c.
ARM defines a standardized memory address space common to all Cortex-M cores, which ensures
code portability among different silicon manufacturers. The address space is 4GB wide, and it is
organized in several sub-regions with different logical functionalities. Figure 3 shows the memory
layout of a Cortex-M processor ⁴.
The first 512MB are dedicated to the code area. STM32 devices further divide this area into some sub-regions
as shown in Figure 4. Let us briefly introduce them.
All Cortex-M processors map the code area starting from the 0x0000 0000 address⁵. This area also
includes the pointer to the beginning of the stack (usually placed in SRAM) and the vector table,
as we will see in Chapter 7. The position of the code area is standardized among all Cortex-M
vendors, even if the core architecture is sufficiently flexible to allow manufacturers to arrange
this area in a different way. In fact, for all STM32 devices an area starting from the 0x0800 0000
address is bound to the internal MCU FLASH memory, and it is the area where program code resides.
However, thanks to a specific boot configuration we will explore in a subsequent chapter, this area
is also aliased from the 0x0000 0000 address. This means that it is perfectly possible to refer to the
content of the FLASH memory both starting from the 0x0800 0000 address and the 0x0000 0000
one (for example, a routine placed at the address 0x0800 16dc can be also accessed from the 0x0000
16dc address).
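The aliasing described above amounts to a fixed offset between the two views of the FLASH memory. The following is only a minimal sketch of that arithmetic: the helper name flash_to_alias and the flash_size parameter are assumptions made here for illustration, and the code merely computes addresses without touching any hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses taken from the text above. */
#define FLASH_BASE_ADDR  0x08000000u  /* internal FLASH on STM32 devices  */
#define ALIAS_BASE_ADDR  0x00000000u  /* start of the Cortex-M code area  */

/*
 * Hypothetical helper: translate an address inside the FLASH region into
 * the equivalent address in the aliased area starting at 0x0000 0000.
 * flash_size is the amount of FLASH in bytes (device specific).
 */
uint32_t flash_to_alias(uint32_t flash_addr, uint32_t flash_size)
{
    assert(flash_addr >= FLASH_BASE_ADDR);
    assert(flash_addr - FLASH_BASE_ADDR < flash_size);
    return ALIAS_BASE_ADDR + (flash_addr - FLASH_BASE_ADDR);
}
```

Called with the routine address from the example and an assumed 128KB FLASH, flash_to_alias(0x080016dc, 128 * 1024) yields 0x000016dc.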
⁴Although the memory layout and the size of sub-regions (and therefore also their addresses) are standardized between all
Cortex-M cores, some functionalities may differ. For example, Cortex-M7 does not provide bit-band regions, and some peripherals
in the Private Peripheral Bus region differ. Always consult the reference manual for the architecture you are considering.
⁵To increase readability, all 32-bit addresses in this book are written splitting the upper two bytes from the lower ones. So,
every time you see an address expressed in this way (0x0000 0000) you have to interpret it just as one common 32-bit address
(0x00000000). This rule does not apply to C and assembly source code.
The last two sections are dedicated to System memory and Option bytes. The first one is a ROM
region reserved to boot loaders. Each STM32 family (and their sub-families - low density, medium
density, and so on) provides a variable number of boot loaders pre-programmed into the chip during
its production. As we will see in a subsequent chapter, these boot loaders can be used to load code
from several peripherals, including USARTs, USB and CAN Bus. The Option bytes region contains
a series of bit flags that can be used to configure several aspects of the MCU (like FLASH read
protection, and so on), and they are related to the specific STM32 microcontroller.
Going back to the whole 4GB address space, the next main region is the one bound to the internal
MCU SRAM. It starts from the 0x2000 0000 address and can potentially extend up to 0x3FFF FFFF.
However, the actual end address depends on the effective amount of internal SRAM. For example,
assuming an STM32F103RB MCU with 20KB of SRAM, we have that the final address is 0x2000
4FFF⁶. Trying to access a location outside of this area will cause a Bus Fault exception (more
about this later).
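Since Cortex-M memory is byte-addressable, the last valid SRAM address is simply the region base plus the SRAM size in bytes, minus one. A minimal sketch of this computation follows; the function name sram_last_address is an assumption introduced here, not something provided by ST or ARM.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE_ADDR   0x20000000u  /* start of the SRAM region        */
#define SRAM_REGION_END  0x3FFFFFFFu  /* last address the region may use */

/*
 * Hypothetical helper: given the amount of internal SRAM in bytes,
 * return the address of its last byte.
 */
uint32_t sram_last_address(uint32_t sram_size_bytes)
{
    assert(sram_size_bytes > 0u);
    assert(sram_size_bytes - 1u <= SRAM_REGION_END - SRAM_BASE_ADDR);
    return SRAM_BASE_ADDR + (sram_size_bytes - 1u);
}
```

The function only does arithmetic, so it can be checked on a PC before being reused, for example, to validate a linker script.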
The next 0.5GB of memory are dedicated to the mapping of peripherals. Every peripheral provided
by the MCU (timers, I2C and SPI interfaces, USARTs, and so on) is aliased to this region. It is up to
the specific MCU to organize this memory space.
The next 2GB are dedicated to external SRAM and flash memories. Cortex-M devices can execute code
and load/store data from external memories, which extend the internal memory resources, through
the EMI/FSMC interface. Some STM32 devices, like the STM32F7, are able to execute code from
external memories without performance penalty, thanks to an L1 cache and the ART™ Accelerator.
The final 0.5GB of memory are allocated to the internal (core) Cortex processor peripherals, plus
a reserved area for future enhancements to Cortex processors. All Cortex processor registers are
at fixed locations for all Cortex-based microcontrollers. This allows code to be more easily ported
between different STM32 variants and indeed other vendors’ Cortex-based microcontrollers.
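The region sizes listed above (0.5GB of code, 0.5GB of SRAM, 0.5GB of peripherals, 2GB of external memories and a final 0.5GB for the core) translate into fixed address boundaries. The sketch below classifies an address accordingly; the enum, the function names and the region labels are illustrative assumptions, and the finer sub-regions of a specific MCU are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Top-level regions of the 4GB Cortex-M address space described above. */
typedef enum {
    REGION_CODE,        /* 0x0000 0000 - 0x1FFF FFFF (0.5GB) */
    REGION_SRAM,        /* 0x2000 0000 - 0x3FFF FFFF (0.5GB) */
    REGION_PERIPHERAL,  /* 0x4000 0000 - 0x5FFF FFFF (0.5GB) */
    REGION_EXTERNAL,    /* 0x6000 0000 - 0xDFFF FFFF (2GB)   */
    REGION_SYSTEM       /* 0xE000 0000 - 0xFFFF FFFF (0.5GB) */
} mem_region_t;

/* Hypothetical helper: return the region an address falls into. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xE0000000u) return REGION_EXTERNAL;
    return REGION_SYSTEM;
}

/* Human-readable label for a region, useful when logging. */
const char *region_name(mem_region_t region)
{
    static const char *const names[] = {
        "Code", "SRAM", "Peripherals", "External memories", "System"
    };
    assert((unsigned)region < sizeof names / sizeof names[0]);
    return names[region];
}
```

For instance, the FLASH address 0x0800 16dc is classified as code, while the GPIO register at 0x4002 0014 discussed later falls in the peripheral region.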
1.1.1.3 Bit-banding
In embedded applications, it is quite common to work with single bits of a word using bit masking.
For example, suppose that we want to set or clear the 3rd bit (bit 2) of an unsigned byte. We can
simply do this using the following C code:
...
uint8_t temp = 0;
temp |= 0x4;
temp &= ~0x4;
...
⁶The final address is computed in the following way: 20KB is equal to 20 * 1024 = 20480 bytes, which in base 16 is 0x5000.
Cortex-M memory is byte-addressable, so every address identifies one byte, and the SRAM spans 0x5000 addresses.
But addresses start from 0, hence the final address is 0x2000 0000 + 0x4FFF.
Bit masking is used when we want to save space in memory (using one single variable and assigning a
different meaning to each of its bits) or we have to deal with internal MCU registers and peripherals.
Considering the previous C code, we can see that the compiler will generate the following ARM
assembly code⁷:
#temp |= 0x4;
a: 79fb ldrb r3, [r7, #7]
c: f043 0304 orr.w r3, r3, #4
10: 71fb strb r3, [r7, #7]
#temp &= ~0x4;
12: 79fb ldrb r3, [r7, #7]
14: f023 0304 bic.w r3, r3, #4
18: 71fb strb r3, [r7, #7]
As we can see, such a simple operation requires three assembly instructions (fetch, modify, save).
This leads to two types of problems. First of all, there is a waste of CPU cycles related to those three
instructions. Second, that code works fine if the CPU is working in single task mode, and we have just
one execution stream. But, if we are dealing with concurrent execution, another task (or simply an
interrupt routine) may affect the content of the memory before we complete the “bit mask” operation
(that is, for example, an interrupt occurs between instructions at lines 0xC-0x10 or 0x14-0x18 in the
above assembly code).
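The second problem can be reproduced on a PC with a small simulation. The sketch below is not real interrupt code: fake_isr is an ordinary function standing in for an interrupt routine, and the enum values are names invented here to select when it "fires" relative to the load, modify and store steps.

```c
#include <assert.h>
#include <stdint.h>

/* Byte shared between the main code and the (simulated) interrupt routine. */
static uint8_t shared_flags;

/* Stand-in for an interrupt routine: it sets bit 0 of the shared byte. */
static void fake_isr(void)
{
    shared_flags |= 0x1;
}

/* When the simulated interrupt fires (names are illustrative). */
typedef enum {
    ISR_NONE,
    ISR_BETWEEN_MODIFY_AND_STORE,
    ISR_AFTER_STORE
} isr_timing_t;

/* Performs "shared_flags |= 0x4" as three explicit steps, like the assembly. */
uint8_t set_bit2(isr_timing_t when)
{
    assert(when == ISR_NONE || when == ISR_BETWEEN_MODIFY_AND_STORE ||
           when == ISR_AFTER_STORE);

    shared_flags = 0;
    uint8_t r3 = shared_flags;   /* load   (ldrb) */
    r3 |= 0x4;                   /* modify (orr)  */
    if (when == ISR_BETWEEN_MODIFY_AND_STORE)
        fake_isr();              /* the ISR sets bit 0 in memory...        */
    shared_flags = r3;           /* store  (strb): ...and it is overwritten */
    if (when == ISR_AFTER_STORE)
        fake_isr();
    return shared_flags;
}
```

When the simulated interrupt runs after the store, both bits survive (0x5); when it runs between the modify and store steps, the stale register value is written back and the bit set by the interrupt routine is lost (0x4).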
Bit-banding is the ability to map each bit of a given area of memory to a whole word in the aliased
bit-banding memory region, allowing atomic access to this bit. Figure 5 shows how the Cortex CPU
aliases the content of memory address 0x2000 0000 to the bit-banding region 0x2200 0000-1c. For
example, if we want to modify bit 2 (the third bit) of the 0x2000 0000 memory location we can simply
access the 0x2200 0008 memory location.
Figure 5: memory mapping of SRAM address 0x2000 0000 in bit-banding region (first 8 of 32 bits shown)
⁷That assembly code was generated by compiling in Thumb mode with all optimizations disabled, invoking GCC in the following
way: $ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -fverbose-asm -save-temps -O0 -g -c file.c
ARM defines two bit-band regions for Cortex-M based MCUs⁸, each one is 1MB wide and mapped
to a 32MB bit-band alias region. Each consecutive 32-bit word in the “alias” memory region refers
to each consecutive bit in the “bit-band” region (which explains that size relationship: 1MB <->
32MB). The first one starts at 0x2000 0000 and ends at 0x200F FFFF, and it is aliased from 0x2200
0000 to 0x23FF FFFF. It is dedicated to the bit access of SRAM memory locations. Another bit-banding
region starts at 0x4000 0000 and ends at 0x400F FFFF, as shown in Figure 6.
This other region is dedicated to the memory mapping of peripherals. For example, ST maps the
GPIO Output Data Register (GPIO->ODR) of GPIOA peripheral from 0x4002 0014. This means that
each bit of the word addressed at 0x4002 0014 allows modifying the output state of a GPIO (from
LOW to HIGH and vice versa). So if we want to modify the status of PIN5 of GPIOA port⁹, using the
previous formula we have:
alias_region_base = 0x42000000
region_base_offset = 0x40020014 - 0x40000000 = 0x20014
bit_band_address = 0x42000000 + 0x20014*32 + (0x5 x 0x4) = 0x42400294
We can define two macros in C to compute bit-band alias addresses easily. Still using the above
example, we can then quickly modify the state of PIN5 of the GPIOA port by writing to its alias
address.
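A minimal sketch of such macros is given below. The macro names, the GPIOA_ODR_ADDR constant and the gpioa_pin5_write() helper are illustrative assumptions, not taken from any ST header, and the macros take plain numeric addresses rather than pointers. The GPIOA->ODR address is the one used in the example above.

```c
#include <stdint.h>

/* Base addresses of the two bit-band regions and of their alias regions. */
#define BITBAND_SRAM_BASE  0x20000000u
#define ALIAS_SRAM_BASE    0x22000000u
#define BITBAND_PERI_BASE  0x40000000u
#define ALIAS_PERI_BASE    0x42000000u

/* alias = alias_base + (byte offset inside the region x 32) + (bit number x 4) */
#define BITBAND_SRAM(addr, bit) \
    (ALIAS_SRAM_BASE + (((uint32_t)(addr) - BITBAND_SRAM_BASE) * 32u) + ((uint32_t)(bit) * 4u))
#define BITBAND_PERI(addr, bit) \
    (ALIAS_PERI_BASE + (((uint32_t)(addr) - BITBAND_PERI_BASE) * 32u) + ((uint32_t)(bit) * 4u))

/* GPIOA->ODR address used in the text (assumed here, it depends on the MCU). */
#define GPIOA_ODR_ADDR 0x40020014u

/* Drives PA5 HIGH (1) or LOW (0) with a single word write to the alias region.
   Only meaningful on the target MCU: the address is not valid on a host PC. */
static inline void gpioa_pin5_write(uint32_t level)
{
    volatile uint32_t *pin5 =
        (volatile uint32_t *)(uintptr_t)BITBAND_PERI(GPIOA_ODR_ADDR, 5);
    *pin5 = level & 1u;
}
```

On the target, calling gpioa_pin5_write(1) would set the pin without the read-modify-write sequence needed when working on the whole ODR word.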
Historically, ARM processors provide a 32-bit instruction set. This not only allows for a rich set
of instructions, but also guarantees the best performance when executing instructions that
involve arithmetic operations and memory transfers between core registers and SRAM. However,
a 32-bit instruction set has a cost in terms of the memory footprint of the firmware. This means that a
program written with a 32-bit Instruction Set Architecture (ISA) requires a higher number of bytes
of flash storage, which impacts the power consumption and overall cost of the MCU (silicon wafers
are expensive, and manufacturers constantly shrink chip sizes to reduce their cost).
To address these issues, ARM introduced the Thumb 16-bit instruction set, which is a subset of the
most commonly used 32-bit ARM instructions. Thumb instructions are each 16 bits long, and are
automatically “translated” to the corresponding 32-bit ARM instruction that has the same effect on
the processor model. This means that 16-bit Thumb instructions are transparently expanded (from
the developer's point of view) to full 32-bit ARM instructions in real time, without performance loss.
Thumb code is typically 65% of the size of ARM code, and provides 160% of the performance of ARM
code when running from a 16-bit memory system. However, in Thumb, the 16-bit opcodes have less
functionality. For example, only branches can be conditional, and many opcodes are restricted to
accessing only half of the CPU’s general-purpose registers.
Subsequently, ARM introduced the Thumb-2 instruction set, which is a mix of 16-bit and 32-bit
instructions in one operation state. Thumb-2 is a variable-length instruction set, and offers a lot
more instructions compared to the Thumb instruction set, achieving similar code density.
Cortex-M3/4/7 were designed to support the full Thumb and Thumb-2 instruction sets, and some
of them support additional instructions dedicated to Floating Point operations (Cortex-M4/7) and
SIMD operations (the DSP extension of Cortex-M4/7).
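To make the idea of a variable-length instruction set more concrete, the following sketch shows how the length of a Thumb-2 instruction can be derived from its first halfword: in the ARMv7-M encoding, a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, while any other value is a complete 16-bit instruction. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when hw1 is the first halfword of a 32-bit Thumb-2 instruction. */
static inline bool thumb_is_32bit(uint16_t hw1)
{
    uint16_t top5 = (uint16_t)(hw1 >> 11);   /* bits [15:11] of the halfword */
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Size in bytes of the instruction that starts with halfword hw1. */
static inline unsigned thumb_instruction_size(uint16_t hw1)
{
    return thumb_is_32bit(hw1) ? 4u : 2u;
}
```

For example, 0x4770 (BX LR) is a 16-bit instruction, while 0xF000 is the first halfword of a 32-bit BL instruction.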
Another interesting feature of Cortex-M3/4/7 cores is the ability to perform unaligned memory access.
ARM based CPUs are traditionally capable of accessing byte (8-bit), half word (16-bit) and word
(32-bit) signed and unsigned variables, without increasing the number of assembly instructions as
happens on 8-bit MCU architectures. However, early ARM architectures were unable to perform
unaligned memory access, causing a waste of memory locations.
To understand the problem, consider the left diagram in Figure 7. Here we have eight variables.
Memory aligned access means that to access word variables (variables 1 and 4 in the
diagram), we need to use addresses that are multiples of 4 bytes (32 bits). That is, a word variable
can be stored only at 0x2000 0000, 0x2000 0004, 0x2000 0008 and so on. Every attempt to access
a location that is not a multiple of 4 causes a UsageFault exception. So, the following ARM
pseudo-instruction is not correct:
STR R2, 0x20000002
The same applies to half word access: it is only possible to access memory locations stored at multiples
of 2 bytes: 0x2000 0000, 0x2000 0002, 0x2000 0004 and so on. This limitation causes fragmentation
inside the RAM memory. To solve this issue, Cortex-M3/4/7 based MCUs are able to perform
unaligned memory access, as shown in the right diagram in Figure 7. As we can see, variable 4
is stored starting from address 0x2000 0007 (which in early ARM architectures was possible only
for single byte variables). This allows us to store variable 5 at memory location 0x2000 000b, causing
variable 8 to be stored at 0x2000 000e. Memory is now packed, and we have saved 4 bytes of
SRAM.
However, unaligned access is supported only by the following instructions:
• LDR, LDRT
• LDRH, LDRHT
• LDRSH, LDRSHT
• STR, STRT
• STRH, STRHT
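In C, the effect of alignment on memory usage can be observed by comparing a regular structure with a packed one. The sketch below assumes a GCC-compatible compiler (the packed attribute is a compiler extension) and a typical ABI where a 32-bit word is aligned to 4 bytes. The structure names are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the compiler inserts padding so that 'counter'
   starts at a multiple of 4 bytes and the struct size is a multiple of 4. */
struct sample_aligned {
    uint8_t  flag;      /* offset 0, followed by 3 padding bytes */
    uint32_t counter;   /* offset 4 */
    uint16_t id;        /* offset 8, followed by 2 padding bytes */
};

/* Packed layout: no padding, so 'counter' sits at an unaligned address. */
struct __attribute__((packed)) sample_packed {
    uint8_t  flag;      /* offset 0 */
    uint32_t counter;   /* offset 1 (unaligned word) */
    uint16_t id;        /* offset 5 (unaligned half word) */
};
```

With these assumptions the aligned structure takes 12 bytes and the packed one 7. On cores that support unaligned access the packed fields can be read with ordinary load instructions, while on cores without this support the compiler has to fall back to byte-wise accesses.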
1.1.1.5 Pipeline
Whenever we talk about instruction execution we are assuming a series of non-trivial details. Before
an instruction is executed, the CPU has to fetch it from memory and decode it. This procedure
consumes a number of CPU cycles, depending on the memory and core CPU architecture, and it is
a cost separate from the effective instruction cost (that is, the number of cycles required to execute
the given instruction).
Modern CPUs introduce a way to parallelize these operations to increase their instruction throughput
(the number of instructions that can be executed in a unit of time). The basic instruction cycle is
broken up into a series of steps, as if the instructions traveled along a pipeline. Rather than processing
each instruction sequentially (one at a time, finishing one instruction before starting the next), each
instruction is split up into a sequence of stages so that different steps can be executed in parallel.
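The gain can be estimated with a simple idealized model: we assume that every stage takes exactly one cycle and that there are no stalls or branches. Without pipelining, N instructions on a k-stage core need k x N cycles. With pipelining, the first instruction needs k cycles and each following one completes one cycle later, for a total of k + N - 1 cycles. The function names below are illustrative.

```c
#include <stdint.h>

/* Cycles needed when each instruction goes through all stages
   before the next one starts (no pipelining). */
static inline uint32_t sequential_cycles(uint32_t stages, uint32_t instructions)
{
    return stages * instructions;
}

/* Cycles needed with an ideal pipeline: 'stages' cycles to fill it,
   then one instruction completes per cycle. */
static inline uint32_t pipelined_cycles(uint32_t stages, uint32_t instructions)
{
    if (instructions == 0u) {
        return 0u;
    }
    return stages + instructions - 1u;
}
```

With a 3-stage pipeline and 10 instructions, this model gives 12 cycles instead of 30.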
All Cortex-M based microcontrollers implement a form of pipelining. The most common one is the
3-stage pipeline, as shown in Figure 8. The 3-stage pipeline is supported by Cortex-M0/3/4. Cortex-M0+
cores, which are dedicated to low-power MCUs, provide a 2-stage pipeline (even if pipelining
reduces the time cost related to instruction fetch/decode/execution, it introduces an energy cost
which has to be minimized in low-power applications). Cortex-M7 cores provide a 6-stage pipeline.
When dealing with pipelines, branching is an issue to be addressed. Program execution is all about
taking different paths, and this is achieved through branching (if equal goto). Unfortunately,
branching causes the invalidation of the pipeline stream, as shown in Figure 9. The last two instructions
have been loaded into the pipeline but they are discarded because the branch is taken (we usually
refer to them as the branch shadow).
Even in this case there are several techniques to minimize the impact of branching. They are often
referred to as branch prediction techniques. The idea behind these techniques is that the CPU starts
fetching and decoding both the instructions following the branch and the instruction that
would be reached if the branch is taken (in Figure 9 both the MOV and ADD instructions). However, there
are other ways to implement a branch prediction scheme. If you want to look deeper into this matter,
this post¹⁰ from the official ARM support forum is a good starting point.
Interrupt and exception management is one of the most powerful features of Cortex-M based
processors. Interrupts and exceptions are asynchronous events that alter the program flow. When an
exception or an interrupt occurs, the CPU suspends the execution of the current task, saves its context
(that is, it pushes the relevant core registers onto the stack) and starts the execution of a routine
designed to handle the interrupting event. This routine is called Exception Handler in case of
exceptions and Interrupt Service Routine (ISR) in case of an interrupt. After the exception or interrupt
has been handled, the CPU resumes the previous execution flow, and the previous task can continue its
execution¹¹.
In the ARM architecture, interrupts are one type of exception. Interrupts are usually generated from
on-chip peripherals (e.g. a timer) or external inputs (e.g. a tactile switch connected to a GPIO), and in
some cases they can be triggered by software. Exceptions are, instead, related to software execution,
and the CPU itself can be a source of exceptions. These could be fault events (e.g. an attempt to
access an invalid memory location) or events generated by the Operating System, if any.
Each exception (and, hence, interrupt) has an exception number which uniquely identifies it. Table
1 shows the fixed exceptions common to all Cortex-M cores, plus a variable number of user-defined
exceptions related to interrupt management. The exception number reflects the position of the
exception handler routine inside the vector table, where the effective address of the routine is stored.
For example, position 15 contains the memory address of the code area containing the exception handler
of the SysTick interrupt, generated when the SysTick timer reaches zero.
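Since each entry of the vector table is a 32-bit word, the position of a handler address inside the table follows directly from the exception number. The sketch below assumes the standard Cortex-M layout, where entry 0 holds the initial stack pointer and external interrupts (IRQ0, IRQ1, ...) start at exception number 16. The macro and function names are illustrative.

```c
#include <stdint.h>

#define EXC_SYSTICK    15u   /* exception number of the SysTick interrupt */
#define EXC_FIRST_IRQ  16u   /* exception number of external interrupt IRQ0 */

/* Byte offset, from the start of the vector table, of the word that
   stores the handler address for the given exception number. */
static inline uint32_t vector_entry_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}

/* Exception number of the n-th external interrupt (IRQn). */
static inline uint32_t irq_exception_number(uint32_t irq)
{
    return EXC_FIRST_IRQ + irq;
}
```

With this layout the SysTick handler address is stored at offset 0x3C, and the handler address of IRQ0 at offset 0x40.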
¹⁰http://bit.ly/1k7ggh6
¹¹With the term task we refer to a series of instructions which constitute the main flow of execution. If our firmware is based on
an OS, the scenario could be a little more complex. Moreover, in case of low-power sleep mode, the CPU may be configured
to go back to sleep after an interrupt management routine is executed.
Except for the first three exceptions, it is possible to assign to each exception a priority level, which
defines the processing order of exceptions in case of concurrent interrupts. The lower the priority
number is, the higher the priority is. For example, suppose that we have two interrupt routines
related to external inputs, A and B. We can assign a higher priority (lower number) to the interrupt
of the A input. If the interrupt related to A arrives while the processor is serving the interrupt from
the B input, the execution of B is suspended, allowing the higher priority interrupt service routine to
be executed immediately.
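The preemption rule can be summarized with a one-line comparison. This is only a simplified model: it ignores priority grouping and sub-priorities, and the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an incoming exception preempts the one being served.
   A lower priority number means a higher priority; an exception with the
   same priority number does not preempt the running one. */
static inline bool preempts(uint8_t incoming_priority, uint8_t running_priority)
{
    return incoming_priority < running_priority;
}
```

In the example above, if A has priority 1 and B has priority 2, preempts(1, 2) is true, so the routine of B is suspended when A arrives, while B can never interrupt A.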
Both exceptions and interrupts are processed by a dedicated unit called Nested Vectored Interrupt
Controller (NVIC). The NVIC has the following features:
• Flexible exception and interrupt management: NVIC is able to process both interrupt
signals/requests coming from peripherals and exceptions coming from the processor core, allowing
1.1.1.7 SysTimer
Cortex-M based processors can optionally provide a System Timer, also known as SysTick. The
good news is that all STM32 devices provide one, as shown in Table 3.
SysTick is a 24-bit down-counting timer used to provide a system tick for Real Time Operating
Systems (RTOS), like FreeRTOS. It is used to generate periodic interrupts to schedule tasks.
Programmers can define the update frequency of the SysTick timer by setting its registers. The SysTick
timer is also used by the STM32 HAL to generate precise delays, even if we aren’t using an RTOS. More
about this timer in Chapter 11.
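Because the counter is only 24 bits wide, not every tick frequency can be obtained from a given clock. The following sketch computes the reload value for a desired tick rate and rejects values that do not fit. The function name and the way errors are reported are illustrative assumptions. On the target, the CMSIS-CORE function SysTick_Config() performs a similar range check before programming the timer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* largest value of a 24-bit counter */

/* Computes the reload value that makes a down-counter clocked at clock_hz
   reach zero tick_hz times per second. Returns false when the requested
   rate cannot be represented with a 24-bit reload value. */
static inline bool systick_reload(uint32_t clock_hz, uint32_t tick_hz,
                                  uint32_t *reload)
{
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t ticks = clock_hz / tick_hz;   /* clock cycles per tick */
    if (ticks == 0u || (ticks - 1u) > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = ticks - 1u;                  /* counter runs from reload down to 0 */
    return true;
}
```

For a 48 MHz clock and a 1 kHz tick the reload value is 47999, while a 1 Hz tick cannot be obtained directly, since it would need a counter wider than 24 bits.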
The current trend in the electronics industry, especially in mobile device design, is all about power
management. Reducing power consumption to a minimum is the main goal of all hardware designers and
programmers involved in the development of battery-powered devices. Cortex-M processors provide
several levels of power management, which can be divided into two main groups: intrinsic features
and user-defined power modes.
With the term intrinsic features we refer to those native capabilities related to power consumption
defined during the design of both the Cortex-M core and the whole MCU. For example, Cortex-M0+
cores define only two pipeline stages, to reduce the power consumption related to instruction
prefetch. Another native behavior connected to power management is the high code density of the
Thumb-2 instruction set, which allows developers to choose MCUs with smaller flash memory to
reduce power.
¹²Also the Reset exception cannot be disabled, even if it is improper to talk about the Reset exception disabling, since it is the
first exception generated after the MCU resets. As we will see in Chapter 7, the Reset exception is the actual entry point of every
STM32 application.
Traditionally, Cortex-M processors provide user programmable power modes, through the control
of the System Control Register (SCR). The first mode is the Run mode (see Figure 10), in which the CPU
runs at its full capabilities. In Run mode the power consumption depends on the clock frequency
and on the peripherals in use.
Sleep mode is the first low-power mode available to reduce power consumption. In this mode most
of the functionalities are suspended, the CPU frequency is lowered, and its activities are reduced to
those that are necessary for it to wake up.
In Deep sleep mode all clock signals are stopped, and the CPU needs an external event to wake up
from this mode.
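At register level, the choice between Sleep and Deep sleep is made through the SLEEPDEEP bit (bit 2) of the SCR, while the SLEEPONEXIT bit (bit 1) makes the core go back to sleep when an ISR returns. The sketch below only computes the new register value, so it does not touch any hardware. The macro and function names are illustrative. On the target the value would be written to the SCR and followed by a wait-for-interrupt instruction (for example the CMSIS __WFI() intrinsic).

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_SLEEPONEXIT (1u << 1)   /* re-enter sleep when an ISR returns */
#define SCR_SLEEPDEEP   (1u << 2)   /* select Deep sleep instead of Sleep */

/* Returns the SCR value that selects Deep sleep (deep_sleep == true)
   or Sleep (deep_sleep == false), leaving all other bits untouched. */
static inline uint32_t scr_select_sleep(uint32_t scr, bool deep_sleep)
{
    return deep_sleep ? (scr | SCR_SLEEPDEEP) : (scr & ~SCR_SLEEPDEEP);
}
```

How the Sleep and Deep sleep schemes map to the real power modes depends on the specific STM32 family.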
However, these power modes are only general schemes, whose implementation details depend on the
actual MCU. For example, consider Figure 11, which shows the power consumption of an STM32F2 MCU when
running at 80MHz @30°C¹³. As we can see, the maximum power consumption is reached in
Run mode (that is, the Active mode) with the ART™ accelerator disabled. By enabling the ART™
accelerator we can save up to 10mA, while also achieving better computing performance. This clearly
shows that the real MCU implementation can introduce different power levels.
¹³Source ST AN3430
STM32Lx families provide several further intermediate power levels, allowing us to precisely select the
preferred power mode and hence the MCU performance and power consumption.
We will go into more depth about this topic in a subsequent chapter.
1.1.1.9 CMSIS
One of the key advantages of the ARM platform (both for silicon vendors and application developers)
is the existence of a complete set of development tools (compilers, run-time libraries, debuggers, and
so on) that are reusable among several vendors.
ARM is also actively working on a way to standardize the software infrastructures among MCUs
vendors. Cortex Microcontroller Software Interface Standard (CMSIS) is a vendor-independent
hardware abstraction layer for the Cortex-M processor series and specifies debugger interfaces. The
CMSIS consists of the following components:
• CMSIS-CORE: API for the Cortex-M processor core and peripherals. It provides a standard-
ized interface for Cortex-M0/3/4/7.
• CMSIS-Driver: defines generic peripheral driver interfaces for middleware making them
reusable across supported devices. The API is RTOS independent and connects microcontroller
peripherals with middleware that implements, for example, communication stacks, file
systems, or graphical user interfaces.
• CMSIS-DSP: DSP Library Collection with over 60 Functions for various data types: fixed-point
(fractional q7, q15, q31) and single precision floating-point (32-bit). The library is available for
Cortex-M0, Cortex-M3, and Cortex-M4. The Cortex-M4 implementation is optimized for the
SIMD instruction set.
• CMSIS-RTOS API: Common API for Real-Time Operating Systems. It provides a standardized
programming interface that is portable to many RTOS and enables therefore software
templates, middleware, libraries, and other components that can work across supported RTOS
systems. We will talk about this API layer in a following chapter.
• CMSIS-Pack: describes with an XML-based package description file (named PDSC) the user and
device relevant parts of a file collection (called software pack) that includes source, header,
and library files, documentation, Flash programming algorithms, source code templates, and
example projects. Development tools and web infrastructures use the PDSC file to extract
device parameters, software components, and evaluation board configurations.
• CMSIS-SVD: System View Description (SVD) for Peripherals. Describes the peripherals of a
device in an XML file and can be used to create peripheral awareness in debuggers or header
files with peripheral registers and interrupt definitions.
• CMSIS-DAP: Debug Access Port. Standardized firmware for a Debug Unit that connects to
the CoreSight Debug Access Port. CMSIS-DAP is distributed as a separate package and is well
suited for integration on evaluation boards.
However, this initiative from ARM is still evolving, and ST's support for all these components is
still really bare-bones. The official ST HAL is the main way to develop applications for the STM32
platform, which presents a lot of peculiarities between MCUs of different families. Moreover, it is
quite clear that the main objective of silicon vendors is to retain their customers, and to avoid their
migration to other MCU platforms (even if they are based on the same ARM Cortex core). So, we are
really far from having a complete and portable layer that works on all ARM based MCUs available
on the market.
Some of the features presented in the previous paragraphs are optional and may not be available in
a given MCU. Tables 2 and 3 summarize the Cortex-M instructions and components available in
the STM32 Portfolio. These could be useful during the selection process of an STM32 MCU.
The remaining paragraphs in this chapter will introduce the reader to STM32 microcontrollers,
giving a complete overview of all STM32 subfamilies.
• They are Cortex-M based MCUs: this may still not be clear to those of you who are new
to this platform. Being Cortex-M based microcontrollers ensures that you have several tools
available on the market to develop your applications. ARM has become a sort of standard in
the embedded world (we have to say that this is especially true for Cortex-A processors; in the
Cortex-M market segment there are still several good alternatives - PIC, MSP430, etc.) and 50
billion devices sold by 2014 is a strong guarantee that investing in this platform is a good
deal.
• Free ARM based tool-chain: thanks to the diffusion of ARM based processors, it is possible
to work with completely free tool-chains, without investing a lot of money to start working
with this platform. And this is extremely important if you are a hobbyist or a student.
• Learning curve: the STM32 learning curve can be very steep, especially for inexperienced users.
If you are totally new to embedded development, the process of learning how to develop
STM32 applications can be really frustrating. Even if ST is doing a great job trying to improve
the overall documentation and the official libraries, it is still hard to deal with this platform.
And this is a shame. Historically, ST documentation has not been the best for inexperienced
people, being too cryptic and lacking clear examples.
• Lack of official tools: this book will guide the reader in the process of setting up a full tool-chain
for the STM32 platform. But the fact that ST does not provide its own official development
environment (like, for example, Microchip does for its MCUs) causes a lot of people to
simply stay away from this platform. And this is a strategic aspect that people from ST should
seriously take into consideration.
• Fragmented and dispersive documentation: ST is actively working on improving its official
documentation about the STM32 platform. You can find a lot of really huge datasheets on ST’s
website, but there is still a lack of good documentation, especially for its HAL. Recent versions
of the CubeHAL provide one or more “CHM” files¹⁵, which are automatically generated from
the documentation inside the CubeHAL source code. However, those files are not sufficient
to start programming with this framework, especially if you are new to the STM32 ecosystem
and the Cortex-M world.
• Buggy HAL: unfortunately, the official HAL from ST contains several bugs, and some of them
are really severe and confusing for novices. For example, during the development
of this book I have found errors in several linker scripts¹⁶, which are supposed to be the
foundation blocks of the HAL, and in some critical routines¹⁷ that should work seamlessly.
Every day at least one new post regarding HAL bugs appears in the official ST forum¹⁸, and
this is really annoying. ST is actively working on fixing the HAL bugs, but it seems that
we are still far from a “stable release”. Moreover, their software release lifecycle is outdated
and not appropriate for the times we are living in: bug fixes are released after several months,
and sometimes the fix is worse than the broken code. ST should seriously consider
investing LESS in designing the next development kit and MORE in the development of a
decent STM32 HAL, which is currently not adequate to the hardware development. I would
respectfully suggest placing the whole HAL on a community platform for developers like GitHub, and
letting the community help fix the bugs. This would also greatly simplify the bug reporting process,
which is currently delegated to scattered posts on the ST forum. It is a shame.
• Lack of MCUs for the IoT: the Internet of Things is the current trend in electronics, and I think
that an STM32 with a 2.4GHz front end, ∼100k of SRAM and 512/1024k of flash is mandatory¹⁹.
An STM32 with an integrated wireless network processor would be great. In short, an STM32
like the TI CC3200. Hey ST guys, can you hear me? :-)
be even less than 1€/pc and space is a strong constraint. In this group we can find Cortex-M0/3/4
based MCUs, with a maximum clock frequency ranging from 48MHz (F0) to over 72MHz (F1/F3).
The Ultra Low-Power group contains those STM32 families of MCUs addressed to low-power applications,
which are used in battery-powered devices that need to reduce the total power consumption to
low values, ensuring longer battery life. In this group we can find both Cortex-M0+ based MCUs, for
cost-sensitive applications, and Cortex-M4 based microcontrollers with Dynamic Voltage Scaling
(DVS), a technology that allows optimizing the internal CPU voltage according to its frequency.
The following paragraphs give a brief description of each STM32 family, introducing its main
features. The most important ones will be summarized inside tables. Tables were arranged by the
1.3.1 F0
The STM32F0-series is the famous 32-cents for 32-bit line of MCUs from the STM32 portfolio. It is
designed to have a street price able to compete with 8/16-bit MCUs from other vendors, while offering a
more advanced and powerful platform.
The most important features of this series are:
• Core:
– ARM Cortex-M0 core at a maximum clock rate of 48 MHz.
– Cortex-M0 options include the SysTick Timer.
• Memory:
– Static RAM from 4 to 32 KB.
– Flash from 16 to 256 KB.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each F0-series includes various peripherals that vary from line to line (see Table 4 for a
quick overview).
• Oscillator source consists of internal RC (8 MHz, 40 kHz), optional external HSE (4 to 32 MHz),
LSE (32.768 to 1000 kHz).
• IC packages: LQFP, TSSOP20²¹, UFBGA, UFQFPN, WLCSP (see Table 4 for more about this).
• Operating voltage range is 2.0V to 3.6V with the possibility to go down to 1.8V ±8%.
²¹F0 is the only STM32 family that provides this convenient package.
1.3.2 F1
The STM32F1-series was the first ARM based MCU from ST. Introduced in the market in 2007, it is
still the most widespread MCU from the STM32 portfolio. There are a lot of development boards on
the market, produced by ST and other vendors, and you will find tons of examples on the web for
F1 microcontrollers. If you are new to the STM32 world, the F1 line is probably the best choice
for learning this platform.
The F1-series has evolved over time by increasing speed, size of internal memory, and variety of
peripherals. There are five F1 lines: Connectivity (STM32F105/107), Performance (STM32F103), USB
Access (STM32F102), Access (STM32F101), Value (STM32F100).
The most important features of this series are:
• Core:
– ARM Cortex-M3 core at a maximum clock rate ranging from 24 to 72 MHz.
• Memory:
– Static RAM from 4 to 96 KB.
– Flash from 16 to 256 KB.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each F1-series includes various peripherals that vary from line to line (see Table 5 for a
quick overview).
• Oscillator source consists of internal RC (8 MHz, 40 kHz), optional external HSE (4 to 24 MHz (F100),
4 to 16 MHz (F101/2/3), 3 to 25 MHz (F105/7)), LSE (32.768 to 1000 kHz).
• IC packages: LFBGA, LQFP, UFBGA, UFQFPN, WLCSP (see Table 5 for more about this).
1.3.3 F2
Figure 14: the first Pebble watch with STM32F205 MCU inside
• Core:
– ARM Cortex-M3 core at a maximum clock rate of 120 MHz.
• Memory:
– Static RAM from 64 to 128 KB.
* 4 KB battery-backed, 80 bytes battery-backed with tamper-detection erase.
– Flash from 128 to 1024 KB.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each F2-series includes various peripherals that vary from line to line (see Table 6 for a
quick overview).
• Oscillators consist of internal RC (16 MHz, 32 kHz), optional external HSE (1 to 26 MHz), LSE
(32.768 to 1000 kHz).
• IC packages: BGA, LQFP, UFBGA, WLCSP (see Table 6 for more about this).
• Operating voltage range is 1.8V to 3.6V.
1.3.4 F3
The STM32F3-series is the most powerful series of MCU in the Mainstream segment, based on the
ARM Cortex-M4F core. It is designed to be almost pin-to-pin compatible with the STM32 F1-series,
even if it does not offer the same variety of peripherals. STM32F3 was the MCU chosen by the
developers of BB-8 droid toy²² by Sphero²³.
²²http://cnet.co/1M2NyJS
²³http://www.sphero.com/
The distinguishing feature for this series is the presence of integrated analog peripherals leading to
cost reduction at application level and simplifying application design, including:
Another interesting feature of this series is the presence of a Core Coupled Memory (CCM), a specific
memory architecture that couples some regions of memory to the CPU core, allowing 0-wait states.
This can be used to boost time-critical routines (which are placed inside this memory area),
improving performance by up to 43%. For example, OS routines for context switching can be stored
in this area to speed up RTOS activities.
The most important features of this series are:
• Core:
– ARM Cortex-M4F core at a maximum clock rate of 72 MHz.
• Memory:
– Static RAM from 16 to 80 KB general-purpose with hardware parity check.
1.3.5 F4
The STM32F4-series is the most widespread group of Cortex-M4F based MCUs in the High-performance
segment. The F4-series is also the first STM32 series to have DSP and single-precision Floating Point
instructions. The F4 is pin-to-pin compatible with the STM32 F2-series and adds higher clock speed,
64K CCM static RAM, full duplex I²S, improved real-time clock, and faster ADCs. The STM32F4-series
is also targeted to multimedia applications, and some MCUs offer dedicated support for LCD-TFT
displays.
The most important features of this series are:
• Core:
– ARM Cortex-M4F core at a maximum clock ranging from 84 to 180 MHz.
• Memory:
– Static RAM from 128 to 384 KB.
* 4 KB battery-backed, 80 bytes battery-backed with tamper-detection erase.
– 64 KB core coupled memory (CCM).
– Flash from 256 to 2048 KB.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each F4-series includes various peripherals that vary from line to line (see Table 8 for a
quick overview).
• Oscillators consist of internal RC (16 MHz, 32 kHz), optional external HSE(4 to 26 MHz),
LSE(32.768 to 1000 kHz).
• IC packages: BGA, LQFP, TFBGA, UFBGA, UFQFPN, WLCSP (see Table 8 for more about
this).
• Operating voltage range is 1.8V to 3.6V.
1.3.6 F7
as well as an L1 cache, STM32F7 devices deliver the maximum theoretical performance of the
Cortex-M7 no matter whether code is executed from embedded Flash or external memory: 1082
CoreMark/462 DMIPS at 216 MHz. STM32F7 is clearly targeted to heavy multimedia embedded
applications. Thanks to the STM32 longevity program (10 years) it is possible to develop powerful
embedded applications without worrying about the MCU availability on the market in the far
future. Cortex-M7 is backwards compatible with the Cortex-M4 instruction set, and the STM32F7 series
is pin-to-pin compatible with the STM32F4 series.
The most important features of this series are:
• Core:
– ARM Cortex-M7 core at a maximum clock of 216 MHz.
• Memory:
– Static RAM of 384 KB.
– L1 cache (I/D 4 KB + 4 KB).
– Flash from 512 to 1024 KB.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each F7-series includes various peripherals that vary from line to line (see Table 9 for a
quick overview).
• Oscillators consist of internal RC (16 MHz, 32 kHz), optional external HSE(4 to 26 MHz),
LSE(32.768 to 1000 kHz).
• IC packages: LQFP, TFBGA, UFBGA, WLCSP (see Table 9 for more about this).
• Operating voltage range is 1.7V to 3.6V.
1.3.7 L0
STM32L0-series is the cost-effective solution of the Ultra Low-Power segment. The combination
of an ARM Cortex-M0+ core and ultra-low-power features makes STM32L0 the best fit for
applications operating on battery or supplied by energy harvesting, offering the world’s lowest
power consumption at 125°C. The STM32L0 offers dynamic voltage scaling, an ultra-low-power
clock oscillator, LCD interface, comparator, DAC and hardware encryption. Current consumption
reference values:
• Core:
– ARM Cortex-M0+ core at a maximum clock rate of 32 MHz.
• Memory:
– Static RAM of 8 KB.
* 20-byte battery-backed with tamper-detection erase.
– Flash sizes from 32 to 64 KB.
– EEPROM sizes of 2 KB (with ECC).
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each L0-series includes various peripherals that vary from line to line (see Table 10 for
a quick overview).
• Oscillators consist of internal RC (16 MHz, 37 kHz), optional external HSE(1 to 24 MHz),
LSE(32.768kHz).
• IC packages are LQFP, TFBGA, UFQFPN, WLCSP (see Table 10 for more about this).
• Operating voltage range is 1.65V to 3.6V, including a programmable brownout detector.
1.3.8 L1
STM32L1-series is the mid-range solution of the Ultra Low-Power segment. The combination of
an ARM Cortex-M3 core and ultra-low-power features makes the STM32L1 optimal
for applications operating on battery that also demand sufficient computing power. Like the
L0-series, the STM32L1 offers dynamic voltage scaling, an ultra-low-power clock oscillator, plus LCD
interface, comparator, DAC and hardware encryption. Current consumption reference values:
• Core:
– ARM Cortex-M3 core at a maximum clock rate of 32 MHz.
• Memory:
– Static RAM from 4 to 80 KB.
* 20 bytes battery-backed with tamper-detection erase.
– Flash sizes from 32 to 512 KB.
– EEPROM sizes of 2 KB (with ECC).
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each L1-series includes various peripherals that vary from line to line (see Table 11 for
a quick overview).
• Oscillators consist of internal RC (16 MHz, 37 kHz), optional external HSE(1 to 24 MHz),
LSE(32.768kHz).
• IC packages are LQFP, TFBGA, UFBGA, UFQFPN, WLCSP (see Table 11 for more about this).
• Operating voltage range is 1.65V to 3.6V, including a programmable brownout detector.
1.3.9 L4
STM32L4-series is the new best-in-class MCU portfolio of the Ultra Low-Power segment. The
combination of an ARM Cortex-M4 core with FPU and ultra-low-power features makes the
STM32L4 the best fit for applications demanding high performance while operating on battery or
supplied by energy harvesting. Like the L1-series, the STM32L4 offers dynamic voltage scaling and an
ultra-low-power clock oscillator. Current consumption reference values:
• Ultra-low-power mode: 30 nA with backup registers without real-time clock (5 wakeup pins).
• Ultra-low-power mode + RTC: 330 nA with backup registers (5 wakeup pins).
• Ultra-low-power mode + 32 Kbytes of RAM: 360 nA.
• Ultra-low-power mode + 32 Kbytes of RAM + RTC: 660 nA.
• Dynamic run mode: down to 100 μA/MHz.
• Wake-up time: 5 μs.
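These reference values can be turned into a rough battery life estimate. The sketch below assumes a device that alternates between Run mode and an ultra-low-power mode, and it ignores wake-up time, peripherals and battery self-discharge, so it only gives an order of magnitude. The function names, the 1% duty cycle and the 220 mAh battery capacity used in the example are arbitrary assumptions.

```c
/* Average current in microamps for a device that spends run_duty
   (0.0 to 1.0) of its time in Run mode and the rest in a sleep mode. */
static inline double average_current_ua(double run_ua_per_mhz, double clock_mhz,
                                        double sleep_ua, double run_duty)
{
    double run_ua = run_ua_per_mhz * clock_mhz;   /* current while running */
    return run_duty * run_ua + (1.0 - run_duty) * sleep_ua;
}

/* Estimated battery life in hours for a given capacity (mAh)
   and average current (microamps). */
static inline double battery_life_hours(double capacity_mah, double average_ua)
{
    return (capacity_mah * 1000.0) / average_ua;
}
```

For example, with 100 μA/MHz at 80 MHz, 660 nA (0.66 μA) in ultra-low-power mode with RAM retention and RTC, and the CPU running 1% of the time, the average current is about 80.65 μA, which corresponds to roughly 2700 hours on a 220 mAh battery.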
• Core:
– ARM Cortex-M4F core with FPU at a maximum clock rate of 80 MHz.
• Memory:
– Static RAM of 128 KB.
* 20 bytes battery-backed with tamper-detection erase.
– Flash sizes from 256 to 1024 KB.
– Support for SDMMC and FSMC interfaces.
– Each chip has a factory-programmed 96-bit unique device identifier number.
• Peripherals:
– Each L4-series includes various peripherals that vary from line to line (see Table 12 for
a quick overview).
• Oscillators consist of internal RC (16 MHz, 37 kHz), optional external HSE (1 to 24 MHz),
LSE (32.768 kHz).
• IC packages are LQFP, UFBGA, WLCSP (see Table 12 for more about this).
• Operating voltage range is 1.7V to 3.6V.
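To put the STM32L4 current consumption reference values listed above into perspective, the following host-side C sketch estimates battery life from an average current draw. The function names, the 225 mAh capacity (roughly a coin cell) and the 1% duty cycle used in the note below are assumptions made only for illustration; the estimate ignores battery self-discharge, regulator losses and peripheral consumption.

```c
/* Rough battery-life estimate (host-side sketch, not firmware). */

/* Average current of a device that runs for a fraction `duty`
 * (0.0 to 1.0) of the time and sleeps for the rest.
 * All currents are in microamperes. */
double average_current_ua(double run_ua, double sleep_ua, double duty)
{
    return duty * run_ua + (1.0 - duty) * sleep_ua;
}

/* Hours of operation for a battery of `capacity_mah` milliampere-hours
 * at a constant draw of `current_ua` microamperes. */
double battery_life_hours(double capacity_mah, double current_ua)
{
    /* Convert microamperes to milliamperes before dividing. */
    return capacity_mah / (current_ua / 1000.0);
}
```

For example, taking the best-case dynamic run figure of 100 μA/MHz at 80 MHz (8000 μA) and the 330 nA ultra-low-power mode with RTC, average_current_ua(8000.0, 0.33, 0.01) gives about 80.3 μA, and battery_life_hours(225.0, 80.33) gives roughly 2800 hours.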
As we have seen in the previous paragraphs, the STM32 is a truly extensive portfolio. We can
choose an MCU from more than 500 devices, if we consider package variants too. So, where to start?
In an ideal world, the first step of the selection process is understanding how much
computing power we need. If we are going to develop a CPU-intensive application, focused on multimedia
or graphics, then we have to shift our attention to the High-performance group of STM32
microcontrollers. If, instead, computing power is not the main requirement of our electronic
device, then we can focus on the Mainstream segment, taking a close look at the STM32F1 series,
which offers the most extensive portfolio to choose from.
The next step is about connectivity requirements. If we need to interact with the external world through
an Ethernet connection or industrial protocols like CAN bus, and our application has to be
responsive and able to deal with several Internet protocols, then the STM32F4 portfolio is probably
the best choice. Otherwise, the STM32F105/7 connectivity line is the best choice.
If we are going to develop a battery-powered device (maybe the new bestseller in the wearable
market), then we have to look at the STM32L portfolio, choosing among the various sub-families
according to the computing power we need.
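The ideal-world steps above can be summarized as a small decision function. This is only a sketch: the type names, the enumerators and, above all, the order in which the requirements are checked are assumptions made for illustration, not a rule stated by ST.

```c
#include <stdbool.h>

/* Requirements considered by the "ideal world" selection steps. */
typedef struct {
    bool battery_powered;     /* the device runs on battery */
    bool cpu_intensive;       /* e.g. multimedia or graphics */
    bool needs_connectivity;  /* Ethernet, CAN bus and similar */
} requirements_t;

typedef enum {
    SUGGEST_STM32L,           /* ultra-low-power portfolio */
    SUGGEST_HIGH_PERFORMANCE, /* High-performance group, e.g. STM32F4 */
    SUGGEST_STM32F105_107,    /* connectivity line */
    SUGGEST_STM32F1           /* Mainstream segment */
} suggestion_t;

/* Returns a starting point for the search, not a final choice. */
suggestion_t suggest_portfolio(const requirements_t *req)
{
    if (req->battery_powered)
        return SUGGEST_STM32L;
    if (req->cpu_intensive)
        return SUGGEST_HIGH_PERFORMANCE;
    if (req->needs_connectivity)
        return SUGGEST_STM32F105_107;
    return SUGGEST_STM32F1;
}
```

A device that needs a CAN bus but little computing power would map to SUGGEST_STM32F105_107, while the same device with a demanding network stack would map to SUGGEST_HIGH_PERFORMANCE.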
As said at the beginning of this paragraph, this is the selection process as it happens in the ideal
world. But what about the real world? In the everyday development process, we probably have to answer
the following questions before we start selecting the right MCU for our project:
• Is this device targeted for mass-market or a niche? If you are developing a device that will
be produced in small quantities, then the price difference among the STM32 microcontrollers
does not impact your project too much. You may also consider the brand new STM32F7 and
pay little attention to software optimization (when dealing with low-performance MCUs you
have to do your best to optimize the code, and keep in mind that this is also a cost that
increases the total investment). But if you are going to build a mass-market device, then the
price of a single IC is really important: what you will save during production outweighs the
initial investment.
• What is the allowed budget for the total BOM? This is a corollary to the previous point. If
you already have the target price of your board, then you must carefully select the right MCU
in the early stage.
• What about space constraints? Does your board have to fit the latest wearable device, or
do you have sufficient room to use the IC package you want? The answer to this question
deeply affects the selection of an MCU and what we can demand from it in terms of
performance and peripheral capabilities.
• What production technology can my company afford? This is another non-trivial question.
LQFP packages are still really popular in the MCU market because they do not require
complex production processes, and they can be easily assembled even on old production lines. BGA
and WLCSP packages require X-ray inspection machines, and this could affect your selection
process.
• Is time-to-market critical for you? Time-to-market is always a critical point for anybody
doing business. But sometimes you need to finish the firmware the day before you started the
development process. This could lead to non-optimized firmware, at least in an early stage.
And this means that an MCU with more computing power is probably the best choice for you.
• Can you reuse board layouts or code? Every embedded developer has a personal portfolio of
libraries and well-known ICs. Software development is a complex task that involves several
stages before we can consider our firmware stable and ready for production. Sometimes (this
happens really frequently nowadays), you have to deal with undocumented hardware
bugs or, at least, with strange hardware behavior. This implies that you have to be really careful when
deciding to switch to another architecture or even another MCU in the same series.
One of the key features of the STM32 platform can help a lot during the selection process:
pin-to-pin compatibility. This allows you to choose a more powerful or a cheaper MCU during the
selection process, giving you the freedom to change it at a more advanced development stage.
For example, for a recent board I developed, I started by choosing an STM32F1 MCU, but I
downgraded it to a cheaper STM32F0 when I concluded that it would fit my needs.
However, keep in mind that this process always involves adapting the code to the different
sub-family.
Figure 16: the STM32 selection tool available on the ST web site
ST offers two convenient tools to help you in the MCU selection process. The first one is available on
the ST web site²⁷, in the STM32 section. It is a parametric search tool, which allows you to choose
the features you are interested in. The tool automatically filters for those MCUs that fit your
requirements.
²⁷http://www.st.com/web/en/catalog/mmc/FM141/SC1169
The second tool is a useful mobile app available for iOS²⁸, Android²⁹ and Windows Mobile³⁰.
²⁸http://apple.co/Uf20WR
²⁹http://bit.ly/1Pvo8EV
³⁰http://bit.ly/1Gf4YBd
For example, the new STM32L0538DISCOVERY board (Figure 18) allows you to test both the STM32L053
MCU and an e-paper display. You can find a lot of tutorials on the Internet covering boards
from the Discovery line.
ST has recently introduced a completely new range of development boards: the Nucleo. The Nucleo
line-up is divided into three main groups: Nucleo-32, Nucleo-64 and Nucleo-144 (see Figure 19). The
name of each group comes from the MCU package type used: Nucleo-32 uses an STM32 in an
LQFP-32 package; Nucleo-64 uses an LQFP-64; Nucleo-144 an LQFP-144. The Nucleo-64 was the
first line introduced to the market, and there are 16 different boards, each one with a given STM32
microcontroller. The Nucleo-144 was introduced in January 2016, and it is the first low-cost kit
equipped with the powerful STM32F746. It also provides an Ethernet PHY and a LAN port. Since the
Nucleo-64 is the most complete range, this book will cover only this type of board. In the remaining
part of the book we refer to the Nucleo-64 simply with the term “Nucleo”.
The Nucleo is composed of two parts, as shown in Figure 20. The part with the mini-USB connector
is an ST-LINK 2.1 integrated debugger, which is used to upload the firmware on the target MCU and
to do step-by-step debugging. The ST-LINK interface also provides a Virtual COM Port (VCP), which
can be used to exchange data and messages with the host PC. One key feature of Nucleo boards is
that the ST-LINK interface can be easily separated from the rest of the board (two red scissors in
Figure 19 show where to break). In this way it can be used as a stand-alone ST-LINK programmer
(a stand-alone ST-LINK programmer costs about 25€). However, the ST-LINK provides an optional
SWD interface that can be used to program another board without detaching the ST-LINK interface
from the Nucleo (as already happens with Discovery boards) by removing the two jumpers labeled
ST-LINK. The rest of the board contains the target MCU, that is, the microcontroller that we will use
to develop our applications, a RESET button, a user-programmable tactile button and an LED. The
board also contains one pad to mount an external high-speed crystal (HSE). All recent Nucleo boards
already provide a low-speed crystal. Finally, the board has several pin headers we will see in a while.
The reason why ST introduced this new kit is not clear, given that Discovery boards are valid
development tools. I think that the main reason is to attract people from the Arduino world. In fact,
Nucleo boards provide pin headers to accept the Arduino shields, expansion boards specifically built
to expand the Arduino UNO and all other Arduino boards. Figure 21³¹ shows the STM32 peripherals
and GPIOs associated with the Arduino compatible connector.
To be honest, the Nucleo boards have other interesting advantages compared to the Discovery ones.
³¹Figures 21 and 22 are taken from the mbed.org website and they refer to the Nucleo-F401RE board. Please refer to Appendix C for
the right pin-out of your Nucleo board.
First of all, ST sells them at a really aggressive price (probably for the same reason as before). A
Nucleo costs from 10€ to 15€, depending on where you buy it. And if you think about what you can
do with this architecture, you have to agree that it is really underpriced compared to the Arduino DUE
board (which is also equipped with a 32-bit processor from ATMEL). Another interesting feature is
that Nucleo boards are designed to be pin-to-pin compatible with each other. This means that you can
develop the firmware for the STM32Nucleo-F103RB board (equipped with the popular STM32F103
MCU) and you can adapt it to a more powerful Nucleo (e.g. STM32Nucleo-F401RE) if you need more
computing power.
In addition to the Arduino compatible pin headers, the Nucleo provides its own expansion connectors.
They are two 2x19 male pin headers with 2.54 mm spacing. They are called Morpho connectors and are a
convenient way to access most of the MCU pins. Figure 22 shows the STM32 peripherals and GPIOs
associated with the Morpho connector.
As far as I know there aren’t expansion boards that use the Morpho connector yet. Even ST is
releasing several expansion shields for the Nucleo that are compatible with the Arduino UNO. For
example, Figure 23 shows a Nucleo board with an X-NUCLEO-IDB04A1 expansion board, a shield
equipped with the BlueNRG monolithic Bluetooth Low Energy 4.0 network processor.
There are sixteen Nucleo boards available at the time of writing this chapter (September 2015). Table
13 summarizes their main features, together with those common to all Nucleo boards.
However, keep in mind that the whole book is designed to give the reader all the necessary
tools to start working with any board, even with custom ones. It will be really easy to adapt
the examples to your needs.
2. Setting-up the tool-chain
Before we can start developing applications for the STM32 platform, we need a complete tool-chain.
A tool-chain is a set of programs, compilers and tools that allows us:
• to write down our code and to navigate inside source files of our application;
• to navigate inside the application code, allowing us to inspect variables, function
definitions/declarations, and so on;
• to compile the source code using a cross-platform compiler;
• to upload and debug our application on the target development board (or a custom board we
have made).
There are several complete tool-chains for the STM32 Cortex-M family, both free and commercial.
IAR for Cortex-M¹ and Keil² are two of the most used commercial tool-chains for Cortex-M
microcontrollers. They are complete solutions for developing applications for the STM32 platform,
but being commercial products they have a street price that may be too high for small
companies or students (they may cost more than €5.000, depending on the features we need).
However, this book does not cover commercial IDEs and, if you already have a license for one
of these environments, you can skip this chapter, but you will need to adapt the instructions
contained in this book to your tool-chain.
CooCox³ and System Workbench for STM32⁴ (shortened as SW4STM32) are two free development
environments for the STM32 platform. These IDEs are essentially based on Eclipse and GCC (like
the majority of free environments for ARM Cortex-M - for example, Atollic True Studio⁵ is an
Eclipse/GCC based IDE that is not free). They do a good job trying to provide support for the
¹http://bit.ly/1Qxtkql
²http://www.keil.com/arm/mdk.asp
³http://www.coocox.org/
⁴http://www.openstm32.org/
⁵http://atollic.com/
STM32 family, and they work out of the box in most cases. However, there are several things to
consider while evaluating these tools. First of all, their support for operating systems other
than Windows is limited: CooCox IDE currently supports
only Windows, while SW4STM32 provides limited support for Linux (no installer is available for
Linux at the time of writing this chapter) and no support for Mac. Moreover, they already come with all
needed tools preinstalled and configured. While this could be an advantage if you are totally new to
the development process for Cortex-M processors, it can be a strong limitation as soon as you start
doing serious work. It is really important to have full control over the tools needed to develop
your firmware, especially when dealing with Open Source software. So, the best choice is to set
up a complete tool-chain from scratch. This allows you to become familiar with the programs and their
configuration procedures, giving you full control over your development environment. This could
be annoying, especially the first time, but it is the only way to learn which piece of software is
involved in a given development stage.
In this chapter I will show the required steps to set up a complete tool-chain for the STM32 platform
on Windows, Mac OSX and Linux. The tool-chain is based on two main tools, Eclipse and GCC,
plus a series of external tools and Eclipse plug-ins that allow us to build STM32 programs efficiently.
Although the instructions are essentially the same for the three platforms, I will adapt them for each OS,
showing dedicated screen captures and commands. This will greatly simplify the installation procedure,
and will allow you to set up a complete tool-chain in less time. This will also give us the opportunity
to study in detail every component of our tool-chain. In the next chapter, I will show you how to
set up a minimal application (a blinking LED - the Hello World application in electronics), which will
allow us to test our tool-chain.
• It is GCC based: GCC is probably the best compiler on the earth, and it gives excellent results
even with ARM based processors. ARM is nowadays the most widespread architecture (thanks
to the embedded systems becoming widespread in the recent years), and many hardware and
software manufacturers use GCC as the base tool for their platform.
• It is cross-platform: if you have a Windows PC, the latest sexy Mac or a Linux server you
will be able to successfully develop, compile and upload the firmware on your development
board with no difference. Nowadays, this is a mandatory requirement.
• Eclipse is widespread: a lot of commercial IDEs for STM32 (like TrueSTUDIO and others) are
also based on Eclipse, which has become a sort of standard. There are a lot of useful plug-ins
for Eclipse that you can download with just one click. And it is a product that evolves day by
day.
• It is Open Source: OK, I agree. For such giant pieces of software it is really hard to try to
understand their internals and modify the code, especially if you are a hardware engineer
committed to transistors and interrupts management. But if you get in trouble with your tool,
it is simpler to try to understand what goes wrong with an open source tool than a closed one.
• Large and growing community: these tools have by now a great international community,
which continuously develops new features and fixes bugs. You will find tons of examples and
blogs, which can help you during your work. Moreover, many companies that have adopted
this software as their official tools contribute financially to the main development. This
guarantees that the software will not suddenly disappear.
• It is free: Yep. I placed this as the last point, but it is not the least. As said before, a commercial
IDE can cost a fortune for a small company or a hobbyist/student. And the availability of free
tools is one of the key advantages of STM32 platform.
an assembler, a linker, a debugger (also known as GDB), several tools for binary files inspection,
disassembly and optimization. Moreover, GCC is also equipped with the run-time environment for
the C language, customized for the target architecture.
In recent years, several companies, even in the embedded world, have adopted GCC as official
compiler. For example, ATMEL uses GCC as cross-compiler for its AVR Studio development
environment.
What is a cross-compiler?
We usually use the term compiler for a tool able to generate machine code for the processor
equipping our PC. A compiler is just a “language translator” from a given programming
language (C in our case) to a low-level machine language, also known as assembly. For
example, if we are working on an Intel x86 machine, we use a compiler to generate x86
assembly code from the C programming language. For the sake of completeness, we have
to say that nowadays a compiler is a more complex tool that addresses both the specific
target hardware processor and the Operating System we are using (e.g. Windows 7).
A cross-platform compiler is a compiler able to generate machine code for a hardware
machine different from the one we are using to develop our applications. In our case,
the GCC ARM Embedded compiler generates machine code for Cortex-M processors while
compiling on an x86 machine with a given OS (e.g. Windows or Mac OSX).
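The difference between a native compiler and a cross-compiler can be observed from inside the code itself, because GCC predefines macros that describe the target architecture. The sketch below uses a few of those macros (`__arm__`, `__aarch64__`, `__x86_64__`, `__i386__`); the function names are hypothetical and chosen only for this illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Name of the architecture this file is being compiled FOR, based on
 * macros that GCC predefines for the selected target. */
const char *target_architecture(void)
{
#if defined(__arm__)
    return "32-bit ARM";
#elif defined(__aarch64__)
    return "64-bit ARM";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}

/* Width of a pointer on the target, in bits. */
size_t target_pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

Built with the native GCC of a 64-bit PC, target_architecture() reports the host architecture. Built with the ARM cross-compiler (usually invoked as arm-none-eabi-gcc), the same source takes the `__arm__` branch, even though the compiler itself still runs on the PC.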
In the ARM world, GCC is the most used compiler, mainly because it is used as the
main development tool for Linux based Operating Systems for ARM Cortex-A processors (the ARM
application processors that equip almost every mobile device). ARM engineers actively contribute to the
development of ARM GCC. ST Microelectronics does not provide its own development environment, but
explicitly supports GCC based tool-chains. For this reason, it is really simple to set up a complete
and working tool-chain to develop embedded applications with GCC.
The next three paragraphs, and their sub-paragraphs, are almost identical. They only differ
on those parts specific for the given OS (Windows, Linux or Mac OS). So, jump to the
paragraph you are interested in, and skip the remaining ones.
• A Windows based PC with sufficient hardware resources (I suggest having at least 4 GB of
RAM and 5 GB of free space on the Hard Disk); the screen captures in this section are based
on Windows 7, but the instructions have been tested successfully on Windows XP, 7, 8.1 and
the latest Windows 10.
• Java 8 Update 60 or later. If you do not have this version, you can download it for free from
the official Java support page⁸.
The Eclipse IDE is distributed as a ZIP archive. Extract the content of the archive as-is inside the
folder C:\STM32Toolchain. At the end of the process you will find the folder C:\STM32Toolchain\eclipse
containing the whole IDE.
Now we can run the Eclipse IDE for the first time. Go inside the C:\STM32Toolchain\eclipse
folder and run the eclipse.exe file. After a while, Eclipse will ask you for the preferred folder where
all Eclipse projects are stored (this is called the workspace), as shown in Figure 2.
You are free to choose the folder you prefer, or leave the suggested one. In this book we will assume
that the Eclipse workspace is located inside the C:\STM32Toolchain\projects folder. Adapt the
instructions accordingly if you choose another location.
What is a plug-in?
A plug-in is an external software module that extends Eclipse functionalities. A plug-in
must adhere to a standard API defined by Eclipse developers. In this way, it is possible for
third party developers to add features to the IDE without changing the main source code.
We will install several plug-ins in this book to adapt Eclipse to our needs.
The first plug-in we need to install is the C/C++ Development Tools SDK, also known as Eclipse CDT.
CDT provides a fully functional C and C++ Integrated Development Environment based on Eclipse
platform. Features include: support for project creation and managed build for various tool-chains,
standard make build, source navigation, various source knowledge tools, such as type hierarchy, call
graph, include browser, macro definition browser, code editor with syntax highlighting, folding and
hyperlink navigation, source code refactoring and code generation, visual debugging tools, including
memory, registers, and disassembly viewers.
To install CDT we have to follow this procedure. Go to Help->Install new software… as shown in
Figure 3.
In the plug-ins install window, we need to enable other plug-in repositories by clicking on the Available
Software Sites link. In the Preferences window, select the “Install/Update->Available Software Sites”
entry on the left and then check the “CDT” entry as shown in Figure 4. Click on the OK button.
Now, from the “work with” drop-down menu choose the “CDT” repository, as shown in Figure 5, and
then select “CDT Main Features->C/C++ Development Tools SDK” as shown in Figure 6. Click on the
“Next” button and follow the instructions to install the plug-in. At the end of the installation process (the
installation takes a while depending on your Internet connection speed), restart Eclipse when requested.
Now we have to install the GNU ARM plug-ins for Eclipse¹⁰. These plug-ins add a rich set of features
to Eclipse CDT to interface with the GCC ARM tool-chain. Moreover, they provide specific functionalities
for the STM32 platform. The plug-ins are developed and maintained by Liviu Ionescu, who did
excellent work in providing support for the GCC ARM tool-chain. Without these plug-ins it is almost
impossible to develop and run code with Eclipse for the STM32 platform. To install the GCC ARM plug-ins
go to Help->Install New Software… and click on the “Add…” button. Fill in the text fields in the
following way (see Figure 7):
Name: GNU ARM Eclipse Plug-ins
Location: http://gnuarmeclipse.sourceforge.net/updates
and click the “OK” button. After a while, the complete list of available plug-ins will be shown. Select
plug-ins to install as shown in Figure 8.
¹⁰http://gnuarmeclipse.github.io/
Click on the “Next” button and follow the instructions to install the plug-ins. At the end of the installation
process, restart Eclipse when requested.
Eclipse is now essentially configured to start developing STM32 applications. We will install
additional plug-ins later, in a subsequent chapter dedicated to debugging. Now we need the cross-
compiler suite to generate the firmware for the STM32 family.
The installer, by default, suggests a destination folder that is related to the GCC version
we are going to install (5.2 2015q4). This is not convenient, because when GCC is updated
to a newer version we need to change settings for each Eclipse project we have made.
Once the installation is complete, the installer will show us a form with four different checkboxes.
Leave only the last one checked (Add registry information), as shown in Figure 10.
¹¹https://launchpad.net/gcc-arm-embedded
¹²https://launchpad.net/gcc-arm-embedded/+download
¹³http://bit.ly/1UjU3uD
If this is the only GCC installed in your system, it is safe to check the entry Add path
to environment variable too. Otherwise, I suggest leaving that option unchecked, and
handling the PATH environment variable using Eclipse on a per-project basis. So, once
you have created a new project (we will see this in a while), go inside the Eclipse
properties (File->Properties), then go to C/C++ Build->Environment. Ensure that the path
C:\STM32Toolchain\gcc-arm\bin; is inside the PATH environment variable. If it is not
there, add it manually, as shown below.
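Independently of the Eclipse dialog, the check described in this note (is a given folder one of the entries of a PATH-like list?) can be sketched in plain C. The function name is hypothetical, and the comparison is an exact, case-sensitive match, which is a simplification because Windows paths are case-insensitive.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `dir` is one of the entries of `path_list`, a string
 * of directories separated by `sep` (';' on Windows, ':' on Linux and
 * Mac). Entries are compared exactly; a trailing separator is allowed. */
bool path_list_contains(const char *path_list, const char *dir, char sep)
{
    size_t dir_len = strlen(dir);
    const char *entry = path_list;

    if (dir_len == 0)
        return false;

    while (*entry != '\0') {
        const char *end = strchr(entry, sep);
        size_t len = (end != NULL) ? (size_t)(end - entry) : strlen(entry);

        if (len == dir_len && strncmp(entry, dir, len) == 0)
            return true;
        if (end == NULL)
            break;          /* last entry reached */
        entry = end + 1;    /* skip the separator */
    }
    return false;
}
```

On a real system the list would typically come from getenv("PATH"); here it is passed in as a parameter so that the function can be tried with any string.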
Once again, this ensures that we do not need to change Eclipse settings when a new
version of OpenOCD is released: we only need to replace the content of the
C:\STM32Toolchain\openocd folder with the new software release.
Nucleo-F401RE, which is based on the STM32F401RE MCU, and we want to use its user LED (marked
as LD2 on the board), then STM32CubeMX will automatically generate all source files containing the
C code required to configure the MCU (clock, peripheral ports, and so on) and the GPIO connected
to the LED (pin 5 of port A on almost all Nucleo boards). You can download STM32CubeMX
from the official ST page¹⁸ (the download link is at the bottom of the page), and follow the installation
instructions.
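To make the idea of "GPIO 5 on port A" more concrete, the following host-side sketch shows how a pin number maps to a bit inside a 16-pin port value, and how toggling the LED pin amounts to flipping that bit. It is not the code generated by STM32CubeMX: the function names are hypothetical, and an ordinary variable stands in for the real port output register.

```c
#include <stdint.h>

/* Bit mask selecting pin `pin` (0 to 15) inside a 16-pin GPIO port. */
uint16_t gpio_pin_mask(unsigned pin)
{
    return (uint16_t)(1u << pin);
}

/* Returns the port output value with the given pin toggled.
 * `port_output` stands in for a real output data register. */
uint16_t gpio_toggle(uint16_t port_output, unsigned pin)
{
    return (uint16_t)(port_output ^ gpio_pin_mask(pin));
}
```

For the user LED on pin 5, gpio_pin_mask(5) is 0x0020, so toggling it on a port whose outputs are all low yields 0x0020, and toggling again returns to 0x0000.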
Another useful tool is the ST-LINK Utility¹⁹. It is a piece of software that downloads firmware to the MCU
using the ST-LINK interface of our Nucleo, or a dedicated ST-LINK programmer. We will use it in
the next chapter. You can download ST-LINK Utility from the official ST page²⁰ (the download link
is at the bottom of the page), and follow the installation instructions.
Warning
Before installing drivers, disconnect the board from the USB port!
It is important to note that the Nucleo board provides a more recent version of ST-LINK, which is
different from the commercial ST-LINK programmer: the 2.1 version. This means that you have to
update your drivers even if you already have an ST-LINK programmer. You can download the latest
drivers from the ST web site²¹.
Drivers come as a ZIP file. Extract the file in a convenient place. You will find two files inside the
package: dpinst_x86 and dpinst_amd64. Choose the first one if your PC (and OS) is 32-bit, the second
one if it is 64-bit.
¹⁸http://bit.ly/1RLCa4G
¹⁹http://www.st.com/web/en/catalog/tools/PF258168
²⁰http://www.st.com/web/en/catalog/tools/PF258168
²¹http://bit.ly/1PwwHiS
Figure 11: Two new devices appear once ST-LINK drivers are installed
Install the drivers and check that everything works correctly. When you connect your board to the
PC, you should see two new devices inside the Windows Device Manager (Figure 11): the STLink
dongle under USB devices and the STLink Virtual COM Port under Ports entry. If everything works
correctly, you can go to the next step.
Warning
Read this paragraph carefully. Do not skip this step!
I have bought several Nucleo boards and I saw that all boards come with an old ST-LINK firmware.
In order to use the Nucleo with OpenOCD, the firmware must be updated to the latest version.
Once the ST-LINK drivers are installed, we can download the latest ST-LINK firmware update from
the ST website²². The firmware is distributed as a ZIP file. Extract it in a convenient place. Connect
your Nucleo board using a USB cable, go inside the Windows sub-folder and execute the file
ST-LINKUpgrade. Click on the Device Connect button.
²²http://bit.ly/1RLDp3H
After a while, ST-LINK Upgrade will show if your Nucleo firmware needs to be updated (pointing out
a different version, as shown in Figure 12). If so, click on the Yes >>>> button and follow the instructions.
Congratulations. The tool-chain is now complete, and you can jump to the next chapter.
• A PC running Ubuntu Linux 14.04 LTS Desktop (aka Trusty Tahr) with sufficient hardware
resources (I suggest having at least 4 GB of RAM and 5 GB of free space on the Hard Disk); the
instructions should be easily adapted to other Linux distributions.
• Java 8 Update 60 or later. Read the next paragraph dedicated to Java installation if it is not
installed yet.
After successfully installing the JDK, check that everything works by running the java -version command
at the command line:
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
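The version string in the first line of this output can be compared with the "Java 8 Update 60 or later" requirement mechanically. The sketch below parses strings in the 1.8.0_60 style shown above; the function name is hypothetical, and treating any string whose first number is greater than 1 (for example 11.0.2) as recent enough is an assumption about newer version numbering.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if `version` (e.g. "1.8.0_60") is Java 8 Update 60 or
 * later. Strings that do not start with at least two dot-separated
 * numbers are rejected. */
bool java_version_is_sufficient(const char *version)
{
    int major = 0, minor = 0, patch = 0, update = 0;
    int fields = sscanf(version, "%d.%d.%d_%d",
                        &major, &minor, &patch, &update);

    if (fields < 2)
        return false;   /* not in the expected format */
    if (major > 1)
        return true;    /* assumed newer numbering, e.g. "11.0.2" */
    if (minor != 8)
        return minor > 8;
    /* Java 8: the update number after '_' must be present and >= 60. */
    return fields == 4 && update >= 60;
}
```

With the output shown above, java_version_is_sufficient("1.8.0_60") returns true, while "1.8.0_45" and "1.7.0_80" are both rejected.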
The Eclipse IDE is distributed as a .tar.gz archive. Extract the content of the archive as-is
inside the folder ~/STM32Toolchain. At the end of the process you will find the folder
~/STM32Toolchain/eclipse containing the whole IDE.
Now we can run the Eclipse IDE for the first time. Go inside the ~/STM32Toolchain/eclipse
folder and run the eclipse file. After a while, Eclipse will ask you for the preferred folder where all
Eclipse projects are stored (this is called the workspace), as shown in Figure 14.
You are free to choose the folder you prefer, or leave the suggested one. In this book we will assume
that the Eclipse workspace is located inside the ~/STM32Toolchain/projects folder. Adapt the
instructions accordingly if you choose another location.
What is a plug-in?
A plug-in is an external software module that extends Eclipse functionalities. A plug-in
must adhere to a standard API defined by Eclipse developers. In this way, it is possible for
third party developers to add features to the IDE without changing the main source code.
We will install several plug-ins in this book to adapt Eclipse to our needs.
The first plug-in we need to install is the C/C++ Development Tools SDK, also known as Eclipse CDT.
CDT provides a fully functional C and C++ Integrated Development Environment based on Eclipse
platform. Features include: support for project creation and managed build for various tool-chains,
standard make build, source navigation, various source knowledge tools, such as type hierarchy, call
graph, include browser, macro definition browser, code editor with syntax highlighting, folding and
hyperlink navigation, source code refactoring and code generation, visual debugging tools, including
memory, registers, and disassembly viewers.
To install CDT we have to follow this procedure. Go to Help->Install new software… as shown in
Figure 15.
In the plug-ins install window, we need to enable other plug-in repositories by clicking on the Available
Software Sites link. In the Preferences window, select the “Install/Update->Available Software Sites”
entry on the left and then check the “CDT” entry as shown in Figure 16. Click on the OK button.
Now, from the “work with” drop-down menu choose the “CDT” repository, as shown in Figure 17, and
then select “CDT Main Features->C/C++ Development Tools SDK” as shown in Figure 18. Click on the
“Next” button and follow the instructions to install the plug-in. At the end of the installation process (the
installation takes a while depending on your Internet connection speed), restart Eclipse when requested.
Now we have to install the GNU ARM plug-ins for Eclipse²⁴. These plug-ins add a rich set of features
to Eclipse CDT to interface with the GCC ARM tool-chain. Moreover, they provide specific functionalities
for the STM32 platform. The plug-ins are developed and maintained by Liviu Ionescu, who did
excellent work in providing support for the GCC ARM tool-chain. Without these plug-ins it is almost
impossible to develop and run code with Eclipse for the STM32 platform. To install the GCC ARM plug-ins
go to Help->Install New Software… and click on the “Add…” button. Fill in the text fields in the
following way (see Figure 19):
Name: GNU ARM Eclipse Plug-ins
Location: http://gnuarmeclipse.sourceforge.net/updates
and click the “OK” button. After a while, the complete list of available plug-ins will be shown. Select
plug-ins to install as shown in Figure 20.
²⁴http://gnuarmeclipse.github.io/
Click on the “Next” button and follow the instructions to install the plug-ins. At the end of the installation
process, restart Eclipse when requested.
Eclipse is now essentially configured to start developing STM32 applications. We will install
additional plug-ins later, in a subsequent chapter dedicated to debugging. Now we need the cross-compiler
suite to generate the firmware for the STM32 family.
Warning
Read this paragraph carefully. Do not skip this step!
On Linux, we do not need to install Nucleo drivers from ST, but we need to install libusb-1.0 with
the following command:
Warning
Read this paragraph carefully. Do not skip this step!
I have bought several Nucleo boards, and all of them came with an old ST-LINK firmware.
In order to use the Nucleo with OpenOCD, the firmware must be updated to the latest version.
²⁵https://launchpad.net/gcc-arm-embedded
²⁶https://launchpad.net/gcc-arm-embedded/+download
²⁷http://bit.ly/1Px0pAH
We can download the latest ST-LINK firmware update from the ST website²⁸. The firmware is distributed
as a ZIP file. Extract it in a convenient place. Connect your Nucleo board using a USB cable, go inside
the AllPlatforms subfolder and execute the file STLinkUpgrade.jar. Click on the Open in update mode
button (see Figure 21).
After a while, ST-LINK Upgrade will show whether your Nucleo firmware needs to be updated (in that
case the two versions shown are different). If so, click on the Upgrade button and follow the instructions.
$ cd ~/STM32Toolchain
$ tar -xvjf ~/Downloads/openocd-0.9.0.tar.bz2
²⁸http://bit.ly/1RLDp3H
²⁹http://openocd.org/
³⁰http://sourceforge.net/projects/openocd/files/latest/download?source=files
$ mv openocd-0.9.0 openocd
Once again, this ensures that we do not have to change the Eclipse settings when a new
release of OpenOCD comes out: we only need to replace the content of the
~/STM32Toolchain/openocd folder with the new software release.
$ cd openocd
$ ./configure --enable-stlink
...
$ make -j4
...
Now we need one more step. By default, Linux does not allow unprivileged users to access a
USB device using libusb. So, to start a connection between OpenOCD and the ST-LINK interface,
we would need to run OpenOCD with root privileges. This is not convenient, because it would cause
trouble with the Eclipse configuration. So, we have to configure the Universal DEVice manager (aka udev)
to grant unprivileged users access to the ST-LINK interface. To do so, let us create a file named
stlink.conf inside the /etc/udev/rules.d directory and add this line inside it:
Save the file and restart the udev service in the following way:
Now we are ready to test our Nucleo board. Plug it into your PC using a USB cable. After a few seconds,
type the following commands:
$ cd ~/STM32Toolchain/openocd/tcl
$ ../src/openocd -f board/<nucleo_conf_file>.cfg
where <nucleo_conf_file>.cfg must be substituted with the config file that fits your Nucleo board,
according to Table 1. For example, if your Nucleo is the Nucleo-F401RE, then the proper config file to
pass to OpenOCD is st_nucleo_f4.cfg.
If everything went the right way, you should see these messages on the console:
At the same time, the LED LD1 on the Nucleo board should start blinking GREEN and RED
alternately.
click on the STM32CubeMX icon. After a while, STM32CubeMX will appear on the screen, as shown
in Figure 23.
To test if it works well, launch the program with the following command:
$ qstlink2
Then connect your Nucleo to the USB port of your PC and click on the “Connect” button. If all works
well, the program should show the type of MCU that equips your Nucleo board, as shown in Figure 24.
Congratulations. The tool-chain is now complete, and you can jump to the next chapter.
• A Mac running Mac OSX 10.9 (aka Mavericks) or higher with sufficient hardware resources (I
suggest having at least 4 GB of RAM and 5 GB of free space on the hard disk); the instructions
are based on Mac OSX 10.9, but they also work well on Mac OSX 10.7.
• You have already installed the Xcode release that fits your Mac OSX version (you can
download it using the App Store) and its corresponding command line tools. You will find
several tutorials on the web describing how to install Xcode and command line tools if you
are completely new to this topic.
• You have already installed MacPorts³³ and upgraded it by issuing the command sudo port
selfupdate at the terminal command line. You are free to use another package manager for Mac
OSX, but adapt the following instructions accordingly.
• Java 8 Update 60 or later. If you do not have this version, you can download it for free from
the official Java support page³⁴.
The Eclipse IDE is distributed as a ZIP archive. Extract the content of the archive as-is inside the
folder ~/STM32Toolchain. At the end of the process you will find the folder ~/STM32Toolchain/eclipse
containing the whole IDE.
Now we can run the Eclipse IDE for the first time. Go inside the ~/STM32Toolchain/eclipse
folder and run the Eclipse file. After a while, Eclipse will ask you for the preferred folder where all
Eclipse projects are stored (this is called the workspace), as shown in Figure 26.
You are free to choose the folder you prefer, or leave the suggested one. In this book we will assume
that the Eclipse workspace is located inside the ~/STM32Toolchain/projects folder. Adapt the
instructions accordingly if you choose another location.
What is a plug-in?
A plug-in is an external software module that extends Eclipse functionalities. A plug-in
must adhere to a standard API defined by Eclipse developers. In this way, it is possible for
third party developers to add features to the IDE without changing the main source code.
We will install several plug-ins in this book to adapt Eclipse to our needs.
The first plug-in we need to install is the C/C++ Development Tools SDK, also known as Eclipse CDT.
CDT provides a fully functional C and C++ Integrated Development Environment based on the Eclipse
platform. Features include: support for project creation and managed build for various tool-chains;
standard make build; source navigation; various source knowledge tools, such as type hierarchy, call
graph, include browser and macro definition browser; a code editor with syntax highlighting, folding and
hyperlink navigation; source code refactoring and code generation; and visual debugging tools, including
memory, registers, and disassembly viewers.
To install CDT we have to follow this procedure. Go to Help->Install new software….
In the plug-ins install window, we need to enable other plug-in repositories by clicking on the Available
Software Sites link. In the Preferences window, select the “Install/Update->Available Software Sites”
entry on the left and then check the “CDT” entry as shown in Figure 26. Click on the OK button.
Now, from the “Work with” drop-down menu choose the “CDT” repository, as shown in Figure 27, and
then select “CDT Main Features->C/C++ Development Tools SDK” as shown in Figure 28. Click on
the “Next” button and follow the instructions to install the plug-in. At the end of the installation process
(the installation takes a while, depending on your Internet connection speed), restart Eclipse when requested.
Now we have to install the GNU ARM plug-ins for Eclipse³⁶. These plug-ins add a rich set of features
to Eclipse CDT to interface with the GCC ARM tool-chain. Moreover, they provide specific functionalities
for the STM32 platform. The plug-ins are developed and maintained by Liviu Ionescu, who did really
excellent work in providing support for the GCC ARM tool-chain. Without these plug-ins it is almost
impossible to develop and run code with Eclipse for the STM32 platform. To install the GCC ARM
plug-ins go to Help->Install New Software… and click on the “Add…” button. Fill in the text fields in the
following way:
Name: GNU ARM Eclipse Plug-ins
Location: http://gnuarmeclipse.sourceforge.net/updates
and click the “OK” button. After a while, the complete list of available plug-ins will be shown. Select
plug-ins to install as shown in Figure 29.
³⁶http://gnuarmeclipse.github.io/
Click on the “Next” button and follow the instructions to install the plug-ins. At the end of the installation
process, restart Eclipse when requested.
Eclipse is now essentially configured to start developing STM32 applications. We will install
additional plug-ins later, in a subsequent chapter dedicated to debugging. Now we need the cross-
compiler suite to generate the firmware for the STM32 family.
Warning
Read this paragraph carefully. Do not skip this step!
On Mac, we do not need to install Nucleo drivers from ST, but we need to install libusb-1.0 with
the following command:
Warning
Read this paragraph carefully. Do not skip this step!
I have bought several Nucleo boards, and all of them came with an old ST-LINK firmware.
In order to use the Nucleo with OpenOCD, the firmware must be updated to the latest version.
We can download the latest ST-LINK firmware update from the ST website⁴⁰. The firmware is
distributed as a ZIP file. Extract it in a convenient place. Connect your
Nucleo board using a USB cable, go inside the AllPlatforms subfolder and execute the file
STLinkUpgrade.jar. Click on the Open in update mode button.
³⁸https://launchpad.net/gcc-arm-embedded/+download
³⁹http://bit.ly/20SkQht
⁴⁰http://bit.ly/1RLDp3H
After a while, ST-LINK Upgrade will show whether your Nucleo firmware needs to be updated (pointing out
a different version, as shown in Figure 30). If so, click on the Upgrade button and follow the instructions.
$ cd ~/STM32Toolchain
$ tar -xvjf ~/Downloads/openocd-0.9.0.tar.bz2
$ mv openocd-0.9.0 openocd
Once again, this ensures that we do not have to change the Eclipse settings when a
new release of OpenOCD comes out: we only need to replace the content of the
~/STM32Toolchain/openocd folder with the new software release.
$ cd openocd
$ LDFLAGS=-L/opt/local/lib CPPFLAGS=-I/opt/local/include ./configure --enable-stlink
...
$ make -j4
...
Ok. We are ready to test our Nucleo board. Plug it into your Mac using a USB cable. After a few seconds,
type the following commands:
$ cd ~/STM32Toolchain/openocd/tcl
$ ../src/openocd -f board/<nucleo_conf_file>.cfg
where <nucleo_conf_file>.cfg must be substituted with the config file that fits your Nucleo board,
according to Table 1. For example, if your Nucleo is the Nucleo-F401RE, then the proper config file to
pass to OpenOCD is st_nucleo_f4.cfg.
If everything went the right way, you should see these messages on the console:
At the same time, the LED LD1 on the Nucleo board should start blinking GREEN and RED
alternately.
Congratulations. The tool-chain is now complete, and you can jump to the next chapter.
⁴⁴https://github.com/texane/stlink
$ cd ~/STM32Toolchain
$ git clone https://github.com/texane/stlink
$ cd stlink
$ ./autogen.sh
$ ./configure
$ make
To test if it works well, connect your Nucleo to a USB port of your Mac and type this command:
$ ./st-info --descr
F4 device (Dynamic Efficency)
If all went well, the st-info command should print a string that identifies the target MCU of our
Nucleo.
Congratulations. The tool-chain is now complete, and you can jump to the next chapter.
3. Hello, Nucleo!
There is no programming book that does not begin with the classic “Hello world!” program. And
this book will follow the tradition. In the previous chapter we configured the development
environment needed to program STM32 based boards. So, we are now ready to start coding.
In this chapter we will create a really basic program: a blinking LED. We will use the GNU ARM
Eclipse plug-in to create a complete application in a few steps without dealing, in this phase, with
aspects related to the ST Hardware Abstraction Layer (HAL). I am aware that not all details presented
in this chapter will be clear from the beginning, especially if you are totally new to embedded
programming.
However, this first example will allow us to become familiar with the development environment.
The following chapters, especially the next one, will clarify a lot of obscure things. So I suggest you
be patient and try to get the most out of the following paragraphs.
If you are totally new to Eclipse IDE, it is a good idea to take a break before using it. The next
paragraph will briefly explain its main functionalities.
Figure 1: the Eclipse interface once started for the first time¹
Eclipse is a multi-view IDE, organized so that all the functionalities are displayed in one window,
but the user is free to arrange the interface to suit their needs. The first time Eclipse starts, a welcome
screen is presented. The content of that welcome tab is called a view.
To close the welcome view, click on the cross icon, as shown in Figure 2. Once the welcome view
goes away, the C/C++ perspective appears, as shown in Figure 3.
In Eclipse a perspective is a way to arrange views in a manner that is related to the functionalities
of the perspective. The C/C++ perspective is dedicated to coding, and it presents all aspects related
to the editing of the source code and its compiling. It is divided into four views.
The view on the left, named Project Explorer, shows all projects inside the workspace.
If you recall from the previous chapter, the first time we started Eclipse we had to
choose the workspace directory. The workspace is the place where a group of projects is
stored. Please note that we say a group of projects and not all the projects. This means that
we can have several workspaces (that is, directories) where different groups of projects are
stored. However, a workspace also contains IDE configurations, and we can have different
configurations for every workspace.
The central view, which is the largest one, is the C/C++ editor. Each source file is shown as a tab, and
it is possible to have many tabs open at the same time.
The view at the bottom of the Eclipse window is dedicated to several activities related to coding and
compiling, and it is subdivided into tabs. For example, the Console tab shows the output from the
compiler; the Problems tab organizes all messages coming from the compiler in a convenient way
to inspect them; the Search tab contains the search results.
The view on the right contains several other tabs. For example, the Outline tab shows the content of
each source file (functions, variables, and so on), allowing quick navigation inside the file content.
There are other views available (and many more are provided by custom plug-ins). You
can show them by going to the Window->Show View->Other… menu.
Sometimes it happens that a view is “minimized” and it seems to disappear from the IDE.
When you are new to Eclipse, this might lead to frustration trying to understand where
it went. For example, looking at Figure 4 it seems that the Project Explorer view has
disappeared, but it is simply minimized and you can restore it by clicking on the icon circled
in red. However, sometimes the view has really been closed. This happens when there is
only one tab active in that view and we close it. In this case you can enable the view again
by going to the Window->Show View->Other… menu.
To switch between different perspectives you can use the specific toolbar available in the top-right side
of Eclipse (see Figure 5).
By default, the other available perspective is Debug, which we will see in more depth later. You can
enable other perspectives by going to the Window->Perspective->Open Perspective->Other… menu.
We stop here for the moment. As we go forward with the topics of this book, we will have a chance
to see other features of Eclipse.
In the Project name field write hello-nucleo (you are totally free to choose the project name
you like). The important part, indeed, is the Project type section. Here we have to choose the
STM32 family of our Nucleo board. For example, if we have a NUCLEO-F401RE we have to choose
STM32F4xx C/C++ Project, or if we have a NUCLEO-F103RB we have to choose STM32F1xx C/C++
Project.
Unfortunately, Liviu Ionescu still has not implemented project templates for the
STM32L0/1/4 families. If your Nucleo is based on one of these series, you have to jump
to the next chapter, where we will see a more general way to generate projects for the
STM32 platform. However, it could be that by the time you read this chapter, the plug-in
has been updated with new templates.
Now click on the Next button. In this step of the wizard it is really important to select the right
size of RAM and FLASH memory (if those fields do not match the quantity of RAM and FLASH of
the MCU equipping your Nucleo, it will be impossible to start the example application)². Use Table
1 to choose the correct values for your Nucleo board³.
Table 1: RAM and FLASH size to select according to the given Nucleo
So, fill in the fields of the second step in the following way (see Figure 7 for reference):
Chip Family: select the Cortex-M core of the MCU equipping your Nucleo (see Table 1).
Flash size: select the right value from Table 1.
RAM size: select the right value from Table 1.
External clock(Hz): it is ok to leave this field as is at the moment.
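Why do wrong sizes prevent the application from starting? A likely reason is that the linker script derives the memory regions, and in particular the initial stack pointer, from those two numbers. The following is only a minimal host-side sketch of that arithmetic and not the code generated by the wizard: the type memory_layout_t and the functions make_layout() and image_fits() are hypothetical names, and the base addresses are the usual STM32 ones (FLASH at 0x08000000, SRAM at 0x20000000).

```c
#include <stdint.h>

/* Assumed base addresses: on STM32 devices the main FLASH is mapped at
 * 0x08000000 and the SRAM at 0x20000000. */
#define FLASH_ORIGIN 0x08000000u
#define RAM_ORIGIN   0x20000000u

/* Hypothetical helper type, not part of the generated project. */
typedef struct {
    uint32_t flash_end;   /* first address past the end of FLASH */
    uint32_t ram_end;     /* first address past the end of RAM   */
    uint32_t initial_sp;  /* initial stack pointer (top of RAM)  */
} memory_layout_t;

/* Compute the memory layout from the sizes (in KB) typed in the wizard. */
memory_layout_t make_layout(uint32_t flash_kb, uint32_t ram_kb)
{
    memory_layout_t m;
    m.flash_end  = FLASH_ORIGIN + flash_kb * 1024u;
    m.ram_end    = RAM_ORIGIN + ram_kb * 1024u;
    m.initial_sp = m.ram_end;  /* the stack grows downward from the top of RAM */
    return m;
}

/* Return 1 if a firmware image of image_bytes bytes fits in FLASH. */
int image_fits(const memory_layout_t *m, uint32_t image_bytes)
{
    return image_bytes <= (m->flash_end - FLASH_ORIGIN);
}
```

For a NUCLEO-F401RE (512 KB of FLASH, 96 KB of RAM), make_layout(512, 96) places the initial stack pointer at 0x20018000. If we declare more RAM than the MCU really has, the stack would start at an address where no memory exists, and the firmware would fault as soon as it uses the stack.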
²Owners of STM32F4 and STM32F7 development boards will not find the entry to specify the RAM size. Do not worry about
this, since the project wizard is designed to configure the right amount of RAM if you choose the right Chip family type.
³In case you are using a different development board (e.g. a Discovery kit), check the ST web site for the right values of RAM
and FLASH.
Those of you having an STM32F3 Nucleo will find an additional field in the wizard step. It is
named CCM RAM Size (KB), and it is related to the Core Coupled Memory (CCM). If you have
a NUCLEO-F334 or a NUCLEO-F303 board, fill the field with the value from Table 1. For other
STM32F3 based boards place a zero in that field.
Now click on the Next button. In the next two wizard steps, leave all parameters as default.
Finally, in the last step you have to select the GCC tool-chain path. In the previous chapter,
we installed GCC inside the ~/STM32Toolchain/gcc-arm folder (in Windows the folder
was C:\STM32Toolchain\gcc-arm). So, select that folder as shown in Figure 8 (either typing the
pathname or using the Browse button), and ensure that the Toolchain name field contains GNU
Tools for ARM Embedded Processors (arm-none-eabi-gcc), otherwise select it from the drop-down
menu. Click on the Finish button.
Our test project is almost complete. We only need to modify one thing to make it work on the Nucleo.
However, before we complete the example, it is better to take a look at what has been generated by
the GNU ARM plug-in.
Figure 9 shows what appears in the Eclipse IDE after the project has been generated. The Project
Explorer view shows the project structure. This is the content of the first-level folders (going from
top to bottom):
Includes: this folder shows all folders that are part of the GCC Include Folders⁴.
src: this Eclipse folder contains the .c files⁵ that make up our application. One of these files is main.c,
which contains the int main(int argc, char* argv[]) routine.
system: this Eclipse folder contains header and source files of many relevant libraries (like, among
⁴Every C/C++ compiler needs to be aware of where to look for include files (files ending with .h). These folders are called
include folders and their path must be specified to GCC using the -I parameter. However, as we will see later, Eclipse is able to do
this for us automatically.
⁵The exact type and number of files in this folder depends on the STM32 family. Do not worry if you see files other than
the ones shown in Figure 9, and focus your attention exclusively on the main.c file.
others, the ST HAL and the CMSIS package). We will see them more in depth in the next chapter.
include: this folder contains the header files of our main application.
ldscripts: this folder contains some relevant files that make our application work on the MCU. These
are LD (the GNU Link eDitor) script files, and we will study them in depth in a following chapter.
As said before, we need to modify one more thing to make the example project work on our Nucleo
board. The GNU ARM plugin generates an example project that fits the Discovery hardware layout.
This means that the LED is routed to a different MCU I/O pin. We need to modify this.
How can we know to which pin the LED is connected? ST provides schematics⁶ of the Nucleo board.
Schematics are made using the Altium Designer CAD, a really expensive piece of software used in
the professional world. However, luckily for us, ST provides a convenient PDF with schematics.
Looking at page 4, we can see that the LED is connected to the PA5 pin ⁷, as shown in Figure 10.
⁶http://bit.ly/1FAVXSw
⁷Except for the Nucleo-F302R8, where LD2 is connected to the PB13 pin. More about this later.
PA5 is shorthand for PIN5 of GPIOA port, which is the standard way to indicate a GPIO in the
STM32 world.
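To make the naming convention concrete, here is a small sketch that splits a pin name into its port index and pin number. The function parse_pin_name() is a hypothetical helper written for this illustration (it is not part of the ST HAL nor of the generated project), and the accepted port letters (A to K) are an assumption, since the number of ports depends on the MCU.

```c
#include <stddef.h>

/* Hypothetical helper: split a pin name such as "PA5" or "PC13" into a
 * zero-based port index (A=0, B=1, ...) and a pin number (0..15).
 * Returns 0 on success, -1 if the name is malformed. */
int parse_pin_name(const char *name, int *port, int *pin)
{
    if (name == NULL || name[0] != 'P')
        return -1;
    if (name[1] < 'A' || name[1] > 'K')   /* assumed port letter range */
        return -1;
    if (name[2] < '0' || name[2] > '9')   /* at least one digit is required */
        return -1;

    int value = 0;
    for (const char *p = &name[2]; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9')
            return -1;
        value = value * 10 + (*p - '0');
        if (value > 15)                   /* each GPIO port has 16 pins */
            return -1;
    }

    *port = name[1] - 'A';
    *pin = value;
    return 0;
}
```

With this helper, "PA5" gives port 0 and pin 5, while "PC13" gives port 2 and pin 13.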
We can now proceed to modify the source code. Open the include/BlinkLed.h file and go to line 19.
Here we find the macro definitions for the GPIO associated with the LED. We need to change the code
in the following way:
Filename: include/BlinkLed.h
BLINK_PORT_NUMBER defines the GPIO port (in our case GPIOA=0), and BLINK_PIN_NUMBER the pin
number.
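The two macros can be sketched as follows, together with the way a port number and a pin number are typically turned into a register block address and a bit mask. Take this as an illustration under stated assumptions and not as the exact content of the generated file: the values come from the description above (GPIOA=0, pin 5), while GPIO_BASE_F4, GPIO_PORT_STEP and the two functions are hypothetical names, and the addresses are valid for the STM32F4 memory map only.

```c
#include <stdint.h>

/* Values described in the text for LD2 on PA5 (GPIOA = 0, pin 5). */
#define BLINK_PORT_NUMBER (0)
#define BLINK_PIN_NUMBER  (5)

/* Assumption: STM32F4 memory map, where GPIOA starts at 0x40020000 and
 * consecutive ports are 0x400 bytes apart. Other families differ. */
#define GPIO_BASE_F4   0x40020000u
#define GPIO_PORT_STEP 0x400u

/* Bit mask selecting one pin inside a 16-bit port register. */
uint32_t blink_pin_mask(unsigned pin_number)
{
    return 1u << pin_number;
}

/* Base address of the register block of a given port (F4 assumption). */
uint32_t blink_port_base(unsigned port_number)
{
    return GPIO_BASE_F4 + port_number * GPIO_PORT_STEP;
}
```

For PA5 the mask is 0x20 and, on an STM32F4, the port base address is 0x40020000. A pin on port B would use port number 1, and pin 13 corresponds to the mask 0x2000.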
Nucleo-F302R8 is the only Nucleo board that has a different hardware configuration regarding the
pin used for LED LD2, because it is connected to pin PB13, as you can see in schematics. This means
that the right pin configuration is:
Ok. We can now compile the project. Go to the Project->Build Project menu. After a while, we
should see something similar to this in the output console.
3.3.1 Windows
ST provides a really practical tool to flash firmware on the target board: ST-LINK Utility. We
installed it in Chapter 2 and now we are going to use it. Launch the program (you will find it in the
Windows Start menu, under the STMicroelectronics folder) and connect your Nucleo to the PC
using the USB cable. Once Windows has identified the board, go to the Target->Connect menu. After
a while you will see the content of the flash memory, as shown in Figure 11.
Figure 11: the ST-LINK Utility interface once connected to the board
In the top-right side of the window you can also see a brief summary regarding your Nucleo board.
Ok, let us upload the example firmware to the board. Go to File->Open file… menu and select the
⁸This is the only time we will use a different flashing procedure for each of the three Operating Systems. In chapter 5 we will
set up a cross-platform debugging environment.
Pay attention to a behavior of ST-LINK Utility that causes a lot of headaches every time
someone starts working with it. If you change something in the firmware and recompile it,
ST-LINK Utility does not automatically reload the binary file. This means that the previous
version is still in memory, and it will continue to upload that version to the Nucleo MCU.
So, you need to manually reload the file every time it changes. You can do that by simply
clicking with the right mouse button on the file name tab, and choosing Open file entry,
as shown below (you may have to click on the file name with the left mouse button first,
to select it).
3.3.2 Linux
To flash the firmware on the target board with Linux, we have configured the QSTlink2 utility in
Chapter 2. But before we can flash the board, we need to convert the binary file from the ELF format
to the RAW binary format, which is the format accepted by the QSTlink2 tool to upload firmware
to the target board. We can set Eclipse to do it automatically for us.
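The difference between the two formats can be seen by looking at the first bytes of the file. An ELF file starts with a well-known magic number, while a raw binary for a Cortex-M MCU starts directly with the vector table (initial stack pointer first, reset handler address next). The following sketch shows the idea; the function names are hypothetical and the raw-image check is just a heuristic I assume for illustration, not something QSTlink2 or Eclipse performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the buffer starts with the ELF magic number (0x7F 'E' 'L' 'F'). */
int is_elf_image(const uint8_t *data, size_t len)
{
    return len >= 4 && data[0] == 0x7F && data[1] == 'E' &&
           data[2] == 'L' && data[3] == 'F';
}

/* Read a 32-bit little-endian word at a byte offset. */
uint32_t read_le32(const uint8_t *data, size_t offset)
{
    return (uint32_t)data[offset] |
           ((uint32_t)data[offset + 1] << 8) |
           ((uint32_t)data[offset + 2] << 16) |
           ((uint32_t)data[offset + 3] << 24);
}

/* Heuristic (an assumption, not a tool feature): a raw Cortex-M image starts
 * with the vector table, whose first word is the initial stack pointer
 * (an SRAM address, 0x2xxxxxxx on STM32) and whose second word is the
 * reset handler address with the Thumb bit (bit 0) set. */
int looks_like_raw_cortex_m(const uint8_t *data, size_t len)
{
    if (len < 8 || is_elf_image(data, len))
        return 0;
    uint32_t sp    = read_le32(data, 0);
    uint32_t reset = read_le32(data, 4);
    return (sp >> 28) == 0x2u && (reset & 1u) == 1u;
}
```

A raw binary carries no information about where it must be loaded, which is why flashing tools need to be told the destination address separately.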
Go to the Project->Properties menu, then go to C/C++ Build->Settings. Select the Cross ARM GNU
Create Flash Image->General entry and select the entry Raw binary in the Output file format
(-O) field, as shown in Figure 12. Click on the OK button and rebuild the project.
Now, connect the Nucleo to the PC using a USB cable and launch the QSTlink2 tool. Click on
the Connect button to start the connection with the ST-LINK interface. If the board is identified
correctly, QSTlink2 will show its target MCU, as shown in Figure 13. To flash the firmware, click on
the Send… button and select the file ~/STM32Toolchain/projects/hello-nucleo/Debug/hello-nucleo.bin.
Now the LD2 LED of your Nucleo board blinks⁹. Congratulations: welcome to the STM32 world ;-)
$ cd ~/STM32Toolchain/stlink
$ ./st-flash write ../projects/hello-nucleo/Debug/hello-nucleo.bin 0x08000000
where 0x0800 0000 is the starting address of FLASH memory, as we will see in the next chapter. If
the flashing procedure goes well⁷, you should see the following messages at command line:
⁹Unfortunately, I have to admit that it happens quite often that both the QSTLink2 and stlink tools do not work very well. When
this happens, try to reset the board and repeat the upload procedure. However, in chapter 5 we will install OpenOCD, a tool
that has become a sort of standard in the embedded world.
Now the LD2 LED of your Nucleo board blinks. Congratulations: welcome to the STM32 world ;-)
Filename: src/main.c
¹⁰Experienced STM32 programmers know that it is improper to say that the main() function is the entry point of an STM32
application. The execution of the firmware begins much earlier, with the calling of some important setup routines that create the
execution environment for the firmware. However, from the application point of view, its start is inside the main() function. A
subsequent chapter will show in detail the bootstrap process of an STM32 microcontroller.
Instructions at lines 51, 52 and 71 are related to debugging¹¹ and we will see them in depth in Chapter
5. Function timer_start(); initializes the SysTick timer so that it fires an interrupt every 1ms. This
is used to compute delays, and we will study how it works in Chapter 7. The function blink_led_init();
initializes the GPIO pin PA5 to be an output GPIO. Finally, the infinite loop turns ON and
OFF the LED LD2, keeping it ON for 2/3 of a second and OFF for 1/3 of a second.
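The timing just described can be modeled on the host PC with a few lines of C. This is a sketch under stated assumptions and not the generated src/main.c: the macro names follow the ones mentioned in this chapter, but their values are derived here from the described behavior (a 1ms tick, ON for 2/3 of a second), and the functions systick_reload() and led_is_on() are hypothetical helpers.

```c
#include <stdint.h>

/* Tick rate described in the text: one SysTick interrupt every 1 ms. */
#define TIMER_FREQUENCY_HZ 1000u

/* ON for 2/3 of a second, OFF for the remaining 1/3 (integer tick counts).
 * The values are assumptions derived from the described behavior. */
#define BLINK_ON_TICKS  (TIMER_FREQUENCY_HZ * 2u / 3u)
#define BLINK_OFF_TICKS (TIMER_FREQUENCY_HZ - BLINK_ON_TICKS)

/* SysTick reload value for a given core clock: the counter counts from
 * the reload value down to 0, so N cycles need a reload of N - 1. */
uint32_t systick_reload(uint32_t core_clock_hz)
{
    return core_clock_hz / TIMER_FREQUENCY_HZ - 1u;
}

/* Host-side model of the loop: LED state (1 = ON) at a given tick count. */
int led_is_on(uint32_t ticks_since_start)
{
    uint32_t phase = ticks_since_start % (BLINK_ON_TICKS + BLINK_OFF_TICKS);
    return phase < BLINK_ON_TICKS;
}
```

With integer arithmetic the LED stays ON for 666 ticks and OFF for 334 ticks. As an example, a core running at 84MHz would need a SysTick reload value of 83999 to obtain the 1ms tick.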
The only way to learn something in this field is to get your hands dirty writing code and
making a lot of mistakes. So, if you are new to the STM32 platform, it is a good idea to start
looking inside the code generated by the GNU ARM plugin, and trying to modify it.
For example, a good exercise is to modify the code so that the LED starts blinking when
the user button (the blue one) is pressed. A hint? The user button is connected to the PC13 pin.
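As a starting point for the exercise, the decision logic can be sketched independently from the hardware. The sketch below assumes that the button is active-low (the pin reads 0 while the button is pressed), which should be checked on the schematics of your board; both functions are hypothetical helpers, and reading the real input data register of GPIOC is left to the reader.

```c
#include <stdint.h>

#define BUTTON_PIN_NUMBER 13u   /* user button on PC13, as stated in the text */

/* Assumption: the button is active-low (the pin reads 0 while pressed).
 * idr is the value of the input data register of GPIOC. */
int button_is_pressed(uint32_t idr)
{
    return ((idr >> BUTTON_PIN_NUMBER) & 1u) == 0u;
}

/* Latch: once the button has been seen pressed, blinking stays enabled. */
int update_blink_enabled(int already_enabled, uint32_t idr)
{
    return already_enabled || button_is_pressed(idr);
}
```

The infinite loop would call update_blink_enabled() at every iteration and toggle the LED only when the returned value is non-zero.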
¹¹For the sake of completeness, they are tracing functions that use ARM semihosting, a feature allowing to execute code in the
host PC invoking it from the microcontroller - a sort of remote procedure call.
Eclipse intermezzo
Eclipse allows us to easily navigate inside the source code, without jumping between source files
manually looking for where a function is defined. For example, suppose that we want to see how
the function timer_start() is coded. To go to its definition, highlight the function call, click with
the right mouse button and select the Open declaration entry, as shown in the following image.
Sometimes, it happens that Eclipse makes a mess of its index files, and it is impossible to navigate
inside the source code. To address this issue, you can force Eclipse to rebuild its index going to
Project->C/C++ Index->Rebuild menu.
Another interesting Eclipse feature is the ability to expand complex macros. For example, click
with right mouse button on the BLINK_OFF_TICKS macro at line 67, and choose the entry Explore
macro expansion. The following contextual window will appear.
4. STM32CubeMX tool
STM32CubeMX¹ is the Swiss army knife of every STM32 developer, and it is a fundamental tool
especially if you are new to the STM32 platform. It is quite a complex piece of software distributed
freely by ST, and it is part of the STCube initiative², which aims to provide developers with a
complete set of tools and libraries to speed up the development process.
Although there is a well-established group of people that still develops embedded software in pure
assembly code³, time is the most expensive thing during project development nowadays, and it is
really important to receive as much help as possible for a quite complex hardware platform like the
STM32.
In this chapter we will see how this tool from ST works, and how to build an Eclipse project from
scratch using the code generated by it. This will make the GNU ARM plugin a less critical component for
project generation, allowing us to create better code that is ready to be integrated with the STM32Cube
HAL. However, this chapter isn’t a substitute for the official ST documentation for the CubeMX tool⁴,
a document made of more than 170 pages that explains in depth all its functionalities.
In addition to features related to the hardware, CubeMX is also able to deal with the following
software aspects:
¹The STM32CubeMX name will be shortened to CubeMX in the rest of the book.
²http://bit.ly/1YKvl85
³Probably, one day someone will explain to them that, except for really rare and specific cases, a modern compiler can generate
better assembly code from C than could be written directly in assembly by hand. However, we have to say that these habits are
limited to ultra low-cost 8-bit MCUs like PIC12 and similar.
⁴http://bit.ly/1O50wrp
• Management of the ST HAL for the chosen MCU family (CubeF0, CubeF1, and so on).
• Additional software library functionalities we need in our project (FATFS, RTOS, etc.).
• The development environment we will use to build the firmware (IAR, TrueSTUDIO, and so
on).
CubeMX aims to be a complete project management tool. However, it has some limitations that
restrict its usage to the early stages of board and firmware development (more about this later).
We have already installed CubeMX in Chapter 2. If you still have not done it, it is strongly suggested
to refer to that chapter especially if you are a Linux or a Mac user.
Once CubeMX is launched, a nice welcome screen is presented (see Figure 1). Clicking on New
project will bring up the MCU and board selector dialog, as shown in Figure 2.
The dialog is a tab-based window, with two main tabs: MCU Selector and Board Selector.
The first tab lets us choose a microcontroller from the whole STM32 portfolio. Using the Series
combo box, we can filter all the MCUs belonging to a given series. The Lines combo box further
restricts the list to the MCUs of a sub-family (Value line, etc.). The Packages combo box selects
all MCUs available in the desired package. Clicking on the More Filters button shows
additional fields to narrow the search.
The Board Selector tab lets us filter among all the official ST development boards (see Figure
3). There are three kinds of development boards to choose from: Nucleo, Discovery and EvalBoard,
which are the most complete (and expensive) development boards to experiment with an STM32
MCU. We are, obviously, interested in Nucleo boards. So, start by selecting the type of your Nucleo
board and click on the OK button.
In the Board Selector view there is a checkbox under the Vendor combo box. The label
says Initialize all IP with their default Mode. What does it mean? First of all, let us clarify
that IP does not mean Internet Protocol, but it is the acronym for Integrated Peripheral.
Checking that box causes CubeMX to automatically generate the C initialization
code for all the peripherals available on the board, and not only for those relevant to the
user application. For example, Nucleo boards have a USART (USART2) connected to the ST-
LINK interface, which maps it as a Virtual COM Port. Checking that box tells CubeMX
to generate all the code necessary to initialize the USART.
This could seem a good feature to enable, but for novices it is best to leave it
disabled and to enable each peripheral by hand only when needed. This simplifies the
learning process and avoids wasting a lot of time trying to understand all the
generated code at once.
Once we have selected the MCU (or the development board) to work with, the main CubeMX
window appears, as shown in Figure 4. A nice graphical representation of the STM32 MCU
dominates the view. Even in this case we have a tabbed view. Let us see each tab more in depth.
The Chip view makes it easy to navigate inside the MCU configuration, and it is a really convenient
way to configure the microcontroller.
Pins⁵ colored in bright green are enabled. This means that CubeMX will generate the code needed
to configure that pin according to its function. For example, for pin PA5 CubeMX will generate
the C code needed to set it up as a generic output pin⁶.
A pin is colored in orange when the corresponding peripheral is not enabled. For example, pins
PA2⁷ and PA3 are enabled and CubeMX will generate the corresponding C code to initialize them, but the
associated peripheral (USART2) is not enabled and no setup code will be automatically generated for it.
Yellow pins are power source pins, and their configuration cannot be changed.
BOOT and RESET pins are colored in khaki, and their configuration cannot be changed.
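As a rough illustration of what enabling a pin as a generic output means at register level, here is a minimal host-side sketch. It assumes the STM32F4-style GPIO layout, where the mode register holds two bits per pin (01 selects output mode). FakeGpio and the fake_gpio_* helpers are hypothetical names invented for this example; the code generated by CubeMX goes through the HAL instead of touching registers directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPIO port: only the two registers we need. */
typedef struct {
    uint32_t MODER; /* 2 mode bits per pin: 00 input, 01 output, 10 alternate, 11 analog */
    uint32_t ODR;   /* 1 output data bit per pin */
} FakeGpio;

enum { MODE_INPUT = 0, MODE_OUTPUT = 1, MODE_ALTERNATE = 2, MODE_ANALOG = 3 };

/* Select the mode of one pin (0..15) without touching the other pins. */
static void fake_gpio_set_mode(FakeGpio *port, unsigned pin, uint32_t mode)
{
    assert(pin < 16u && mode <= (uint32_t)MODE_ANALOG);
    port->MODER &= ~((uint32_t)3u << (pin * 2u)); /* clear the two mode bits */
    port->MODER |= mode << (pin * 2u);            /* write the new mode */
}

/* Flip the output level of one pin. */
static void fake_gpio_toggle(FakeGpio *port, unsigned pin)
{
    assert(pin < 16u);
    port->ODR ^= (uint32_t)1u << pin;
}
```

Calling fake_gpio_set_mode(&port, 5, MODE_OUTPUT) on a zeroed port sets bit 10 of MODER, the pattern corresponding to pin 5 in output mode; each call to fake_gpio_toggle(&port, 5) then flips bit 5 of ODR.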
A contextual tool-tip is shown when moving the mouse pointer over the MCU pins (see Figure 5). For
example, the tool-tip for pin PB3 tells us that the signal is mapped to the Serial Wire Debug
(SWD) interface and that it acts as the Serial Wire Output (SWO) pin. Moreover, the pin number (55) is also
shown.
⁵In this context, pin and signal can be used as synonyms.
⁶Except for Nucleo-F302, where the LD2 is connected to PB13 pin. More about this later in this chapter.
⁷The pin configurations shown in this section refer to the STM32F401RE MCU.
STM32 MCUs allow mapping a peripheral to different pins. For example, in an STM32F401xE MCU, the
SPI2 MISO signal can be mapped to pins PC2 or PB14. CubeMX makes it easy to see the allowed
alternatives with a Ctrl+click. If an alternate pin exists, it is shown in light blue (the alternative is
shown only if the pin is not in reset state, that is, it is enabled). For example, in Figure 6 we can
see that, if we do a Ctrl+click on the PC2 pin, the PB14 signal is highlighted in blue. This comes in really
handy during the layout of a board. If it is really hard to route a signal to that pin, or if that pin is
needed for some other functionality, an alternate pin may simplify the board.
In the same way, most MCU pins can have alternate functions. A contextual menu is shown
when clicking on a pin. This allows us to select the function we want to enable for that
signal.
Such flexibility leads to conflicts between signal functions. CubeMX tries to resolve
these conflicts automatically, assigning the signal to another pin. Pinned signals are those whose
function is locked to a specific pin, preventing CubeMX from choosing an alternate pin. When a
conflict prevents a peripheral from being used, the pin mode in Chip View is disabled, and the pin is
colored in orange.
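The idea of resolving a conflict by moving a signal to an alternate pin can be sketched in a few lines of C. This is only an illustration of the concept under simplified assumptions, not the algorithm CubeMX actually uses: the Signal structure, the pin numbering and assign_pin() are all hypothetical, and the two candidate pins are the ones from the SPI2 example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CANDIDATES 4

/* Hypothetical pin numbering, limited to the two pins of the example. */
enum { PIN_PC2 = 0, PIN_PB14 = 1, N_PINS = 2 };

/* Hypothetical description of one signal and the pins it may be routed to. */
typedef struct {
    const char *name;
    int candidates[MAX_CANDIDATES]; /* pin ids, in order of preference */
    size_t n_candidates;
    bool pinned;                    /* true: only candidates[0] is acceptable */
} Signal;

/* Return the first free candidate pin and mark it as used, or return -1
 * when the signal cannot be placed (an unresolved conflict). */
static int assign_pin(const Signal *sig, bool *used, size_t n_pins)
{
    assert(sig->n_candidates <= MAX_CANDIDATES);
    size_t limit = sig->pinned ? 1u : sig->n_candidates;
    for (size_t i = 0; i < limit && i < sig->n_candidates; ++i) {
        int pin = sig->candidates[i];
        assert(pin >= 0 && (size_t)pin < n_pins);
        if (!used[pin]) {
            used[pin] = true;
            return pin;
        }
    }
    return -1;
}
```

If PC2 is already taken, an unpinned signal falls back to PB14, while a pinned one cannot be placed and the function returns -1, which corresponds to the situation where CubeMX disables the pin mode.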
The IP tree pane provides a convenient way to enable/disable and to configure the desired peripherals
and software middleware. CubeMX shows the peripherals list in a smart way, using icons and
different colors, so that the user can quickly understand if the peripheral is available and what
configuration capabilities it has. Let us see them in depth.
• Case 1: indicates that the peripheral is available and currently disabled, and all its possible
modes can be used. For example, in case of I2C interface, all possible modes for this peripheral
are: I2C, SMBus-Alert-mode, SMBus-two-wire-interface (TWI).
• Case 2: shows that the peripheral is disabled due to a conflict with another peripheral. This
means that both the peripherals use the same GPIOs, and it is not possible to use them
simultaneously. Passing the mouse over it will show the other peripheral involved in conflict.
For example, for an STM32F401RE MCU it is impossible to use I2C2 and SWD debug pins at
the same time.
• Case 3: indicates that the peripheral is available and currently disabled, but at least one
of its modes is not available due to a conflict with other peripherals. For example, in an
STM32F401RE MCU the fourth channel of the TIM2 peripheral uses the PA3 GPIO, which is
also the USART2_RX signal of the USART2 peripheral. This means that you cannot use
TIM2 channel 4 as input capture while using the Nucleo VCP.
• Case 4: indicates that the peripheral is unavailable for the chosen package type (if you strongly
need that peripheral, you have to switch to another package type - usually one with more pins).
• Case 5: indicates that the peripheral is used and all its modes are available (refer to Case 7).
• Case 6: shows that the peripheral is used, but some of its modes or I/Os are not available (refer
to Case 3 and 8).
• Case 7: when all peripheral modes are available, all configuration options are shown in black.
• Case 8: when not all peripheral modes are available, unavailable configuration options are
shown with red background.
Clock view is the area where all configurations related to clock management take place. Here we
can set both the main core and the peripheral clocks. All clock sources and PLL configurations
are presented in a graphical way (see Figure 8). The first time users see this view, they may be
puzzled by the number of configuration options. However, with a little bit of practice, this is the
simplest way to deal with the STM32 clock configuration (which is quite complex compared to
8-bit MCUs).
If your board design needs an external source for the High Speed clock (HSE), the Low Speed clock
(LSE) or both, you have to first enable it in the Pinout view in the RCC section, as shown in Figure
9.
Once this is accomplished, you will be able to change the clock sources in the Clock view.
Clock tree configuration will be explored in Chapter 10. To avoid confusion in this phase, leave all
parameters as automatically configured by CubeMX.
Overclocking
A common hacking practice is to overclock the MCU core, changing the PLL configuration
so that it can run at a higher frequency. This author strongly discourages this practice,
which not only could seriously damage the microcontroller, but may also result in abnormal
behavior that is difficult to debug.
Do not change anything unless you are absolutely sure of what you are doing.
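To see why a small change in the PLL parameters is enough to push the core beyond its rated frequency, the sketch below computes the PLL output assuming the STM32F4-style main PLL (input clock divided by M, multiplied by N, then divided by P). The function names are hypothetical, and other STM32 series have a different PLL structure and different limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Output of an STM32F4-style main PLL: f_out = (f_in / M) * N / P.
 * The multiplication is done first, in 64 bits, to avoid losing precision. */
static uint32_t pll_output_hz(uint32_t f_in_hz, uint32_t m, uint32_t n, uint32_t p)
{
    assert(m != 0u && p != 0u);
    uint64_t vco_hz = (uint64_t)f_in_hz * n / m;
    return (uint32_t)(vco_hz / p);
}

/* True when the resulting core clock does not exceed the rated maximum. */
static bool within_rated_clock(uint32_t sysclk_hz, uint32_t max_hz)
{
    return sysclk_hz <= max_hz;
}
```

With a 16 MHz input, M = 16, N = 336 and P = 4 give 84 MHz, which is the rated maximum of an STM32F401RE; raising N to 400 yields 100 MHz, which is outside the specification.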
Configuration options defined in this view impact the automatically generated C source code.
Managing this CubeMX section well greatly simplifies the development work related
to peripheral optimization. We will analyze each configuration view when we deal with each
type of peripheral.
• Create a new “universal” Eclipse project, ready to accept CubeMX auto-generated C code.
• Import the CubeMX generated files inside the Eclipse project.
• Configure the project, if needed.
The final result of this chapter will be another blinking application, but this time we will create
it using mostly code coming from the latest STCube framework. This will also give us the
opportunity to start understanding the building blocks of the STCube Hardware Abstraction Layer
(HAL). Once we understand the steps explained here, we will be fully autonomous in setting up
any project for the STM32 platform.
Once CubeMX has created the new project, go to Project->Settings… menu. The Project Settings
dialog appears, as shown in Figure 11.
In the Project Name field write the name you like for the project. For the Project Location field,
it is best to create a folder inside the ∼/STM32Toolchain⁸ folder (C:\STM32Toolchain for Windows
users). A good folder name could be ∼/STM32Toolchain/cubemxout. In the Toolchain/IDE field
select the SW4STM32 entry. Leave all the other fields as default.
⁸Once again, you are completely free to choose the preferred path for your workspace. Here, to simplify the instructions, all
paths are assumed to be relative to ∼/STM32Toolchain.
Switch now to Code Generator tab, and select the options as shown in Figure 12. Click on the OK
button.
Now we are ready to generate the C initialization code for our Nucleo. Go to Project->Generate
Code menu. CubeMX may ask you to download the latest version of the STCube HAL framework
for your Nucleo (e.g., if you have a Nucleo-F401RE it will ask you to download STCube-F4 HAL). If
so, click on Yes button and wait for completion. After a while, you will find the C code inside the
∼/STM32Toolchain/cubemxout/<project-name> directory.
Before we continue with the Eclipse project creation, it is a good idea to take a look at the code
generated by CubeMX.
Figure 13: the generated code compared to the CMSIS architectural view
• HAL for Cortex-M processor registers, with standardized definitions for the SysTick, NVIC,
System Control Block registers, MPU registers, FPU registers, and core access functions.
• System exception names to interface to system exceptions without having compatibility
issues.
• Methods to organize header files that make it easy to learn new Cortex-M microcontroller
products and improve software portability. This includes naming conventions for device-
specific interrupts.
¹⁰You will also find the sub-folder SW4STM32. It contains the project file for the AC6 IDE, which we cannot import in our
tool-chain. So, simply ignore it.
• Methods for system initialization to be used by each MCU vendor. For example, the
standardized SystemInit() function is essential for configuring the clock system of the device.
• Intrinsic functions used to generate CPU instructions that are not supported by standard C
functions.
• A variable to determine the system clock frequency that simplifies the setup of the SysTick timer.
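The last point of the list can be made concrete with a little arithmetic. In CMSIS the clock frequency variable is SystemCoreClock, and a typical call is SysTick_Config(SystemCoreClock / 1000) to get one interrupt per millisecond. The host-side sketch below reproduces only the computation of the reload value, keeping in mind that SysTick is a 24-bit down-counter; systick_reload_for() is a hypothetical helper, not a CMSIS function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* SysTick is a 24-bit down-counter */

/* Compute the reload value giving ticks_per_second interrupts per second.
 * Returns false when the period does not fit in the 24-bit counter. */
static bool systick_reload_for(uint32_t core_clock_hz, uint32_t ticks_per_second,
                               uint32_t *reload)
{
    assert(ticks_per_second != 0u && reload != NULL);
    uint32_t cycles = core_clock_hz / ticks_per_second;
    if (cycles == 0u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = cycles - 1u; /* the counter runs from cycles-1 down to 0 */
    return true;
}
```

At 84 MHz and 1000 ticks per second the reload value is 83999. Asking for one tick per second at the same clock fails, because 83999999 does not fit in 24 bits.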
The CMSIS-CORE pack is subdivided in several files in the project generated with CubeMX, as
shown in Figure 13:
• Include folder contains several core_<cpu>.h files (where <cpu> is replaced by cm0, cm3,
etc). These files define the core peripherals and provide helper functions that access the core
registers (SysTick, NVIC, ITM, DWT etc.). These files are generic for all Cortex-M based
MCUs.
• Device folder contains device-specific information for all STM32F/L devices (e.g., STM32F4),
such as interrupt numbers (IRQn) for all exceptions and device interrupts, and definitions for the
Peripheral Access to all device peripherals (all data structures and the address mapping for
device-specific peripherals) - file system_<device>.h. It also contains additional helper functions
to simplify peripheral programming. Moreover, there are also several startup_<device>.s
assembly files: these contain startup code and system configuration code (the reset handler, which
is executed after CPU reset, the exception vectors of the Cortex-M processor, and the interrupt vectors
that are device specific).
Finally, Inc and Src folders in the project root contain headers and source files of the skeleton
application generated by CubeMX, and the STM32xxxx_HAL_Driver folder, inside the Drivers one,
is the whole ST HAL for that microcontroller series.
In the second step fill the fields Processor core, Clock, Flash size and RAM size according to your
Nucleo type (refer to Table 1 in Chapter 3 if you do not know them), and leave all other fields as
shown in Figure 14.
In the third step leave all fields as default, except for the last one: Vendor CMSIS name. That field
must have this pattern: <stm32family>xx. For example, for a Nucleo-F1 write stm32f1xx, or for a
Nucleo-L4 write stm32l4xx, as shown in Figure 15. Go ahead with the project wizard until it is
completed.
Once again, we have used the GNU ARM Eclipse plug-in to generate the project, but this time there
are some files we do not need, since we will use the ones generated by CubeMX tool. Figure 16
shows the Eclipse project and five highlighted files in the Project Explorer view. You can safely
delete them by pressing the Delete key on your keyboard.
We need to change one more thing in the files generated by the GNU ARM plugin. Opening the file
ldscripts/mem.ld we can see that the origin of the FLASH memory is wrong because, as we have seen
in Chapter 1, the flash memory is mapped from the address 0x0800 0000 for all STM32 devices. So,
ensure that the memory origin definitions¹¹ of your .ld file are equal to the following ones:
...
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
...
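To reason about what these ORIGIN and LENGTH values imply, it can help to express them in C. The following host-side sketch uses the same figures as the MEMORY block above (512K of FLASH, 96K of RAM) and checks whether an address range falls inside a region. MemRegion and region_contains() are hypothetical names used only for illustration, and the sizes must be adapted to your MCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t origin; /* first valid address */
    uint32_t length; /* size in bytes */
} MemRegion;

/* Same values as the linker script fragment: adapt them to your MCU. */
static const MemRegion FLASH_REGION = { 0x08000000u, 512u * 1024u };
static const MemRegion RAM_REGION   = { 0x20000000u,  96u * 1024u };

/* True when the range [addr, addr + size) lies entirely inside the region.
 * The comparison is written so that it cannot overflow. */
static bool region_contains(const MemRegion *r, uint32_t addr, uint32_t size)
{
    if (addr < r->origin) {
        return false;
    }
    uint32_t offset = addr - r->origin;
    return offset <= r->length && size <= r->length - offset;
}
```

For instance, a word at 0x08000000 is inside FLASH_REGION while a word at 0x00000000 is not, and the first address past the end of RAM_REGION is 0x20018000.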
Starting from this point, you will find several paths and filenames related to F4 MCUs.
For example, a path having this structure ‘Drivers/STM32F4xx_HAL_Driver/Inc’ or a file
name like this one ‘system_stm32f4xx.c’. Please, note that you have to substitute the F4
with the STM32 family of your MCU (F0, F1, F2, F3, F7, L0, L1, L4). Pay attention if the
path or the filename has a capital letter (F4) or not (f4).
Please, take also note that starting from this paragraph we will use the Courier font
to indicate filesystem paths (e.g. Drivers/STM32F4xx_HAL_Driver/Inc); instead, we will
indicate Eclipse folders in bold (e.g. system/include/stm32f4xx). This convention is valid
in the whole book.
Now, go inside the Drivers/STM32F4xx_HAL_Driver/Inc filesystem folder and import all its content
inside the system/include/stm32f4xx Eclipse folder. In the same way, go inside the
Drivers/STM32F4xx_HAL_Driver/Src filesystem folder and import all its content inside the
system/src/stm32f4xx Eclipse folder. We have successfully imported the ST HAL in our project. It is now the
turn of the CMSIS-CORE package.
¹¹Clearly, the memory length depends on the specific MCU. Always double check that the values correspond to the
hardware specifications of your MCU. If they do not match, strange faults may occur at startup (we will learn how to
deal with hardware faults in a subsequent chapter). Another quick solution is offered to us by CubeMX. Opening the
∼/STM32Toolchain/cubemxout/<project-name>/SW4STM32/<project-name> Configuration folder you will find a file ending
with .ld. It is the linker script containing the right memory origin definitions for your MCU. You can simply copy the MEMORY
section contained in that file and paste it into the ldscripts/mem.ld file.
First, we start importing the official CMSIS-CORE package. Go inside the Drivers/CMSIS/Include
filesystem folder and drag all its content inside the system/include/cmsis Eclipse folder. When
asked, answer Yes to replace existing files.
Now we have to import the specific device files for the CMSIS-CORE.
Please, take note that the GNU ARM Plugin already embeds the CMSIS-CORE inside the
generated project, but it is an old version (3.20). We are replacing it with the latest official
version shipped by ST (4.00 at the time of writing this chapter).
You will find several .s files inside this folder. Select the one corresponding to the MCU of
your Nucleo. For example, for a Nucleo-F401RE select the file startup_stm32f401xe.s.
Read carefully
.s files are assembly files that need to be processed directly by the GNU Assembler (AS).
However, Eclipse CDT is programmed to expect the assembly file ending with .S (capital
S). So rename the file from startup_stm32f4xxxx.s to startup_stm32f4xxxx.S. To do this,
right click on the file in Eclipse and choose the Rename entry.
1. First, we have created an Empty ARM C/C++ project, using the specs of our MCU.
2. Then we have deleted some files generated by GNU ARM plugin and imported those generated
by CubeMX; we have also updated the FLASH address origin in mem.ld file.
3. Then we have imported the ST HAL for our MCU and the latest CMSIS-CORE package.
4. Then we have imported the device adapter file (system_stm32f4xx{.h,.c} and stm32f4xx.h
files) for the CMSIS-CORE package.
5. Finally we have added the right startup assembly file startup_stm32f4xxxx.s for our MCU,
and renamed it to startup_stm32f4xxxx.S (ending with capital .S).
Table 2 summarizes the files and folders that have to be imported in the Eclipse project. Paths on
the left are filesystem paths (relative to the CubeMX project output directory); paths on the right
are the corresponding Eclipse folder.
Table 2: files and folders that have to be imported in the corresponding Eclipse folders
I am aware of the fact that this procedure seems cumbersome, but trust me: once you
get familiar with it, you will be able to create a project for every STM32
MCU, including the latest STM32F7 and future STM32 microcontrollers. However, I am
considering writing a tool to automate this task. Stay tuned.
If you try to compile the project, you will see a lot of errors and warnings. To complete its
configuration, we still need another two steps.
The ST HAL is designed to work with all MCUs of a given series (F0, F1, etc.). Several conditional
macros are used inside the HAL to discriminate the MCU type. So we have to specify the MCU
equipping our Nucleo.
Go to the Project->Properties menu, and then to the C/C++ Build->Settings section. Select the Cross
ARM C Compiler->Preprocessor section and then click on the Add… icon (the one circled in red
in Figure 17). Use the macro corresponding to your Nucleo (refer to Table 3). For example, for a
Nucleo-F401RE, use the macro STM32F401xE.
If you are using a custom board with a microcontroller not listed in Table 3, you can find
the macro for your MCU inside the file system/include/cmsis/stm32XXxx.h.
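The mechanism behind this macro is plain conditional compilation: the family header tests which device symbol is defined and pulls in the matching device header, refusing to compile if none is set. The sketch below mimics that #if/#elif chain on the host. DEVICE_HEADER and uses_device_header() are hypothetical names, the error text is not the verbatim ST message, and the symbol is defined in the source only to keep the example self-contained (in a real project it comes from the preprocessor settings described above).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Normally passed by the compiler as -DSTM32F401xE; defined here only so
 * that the sketch compiles on its own. */
#ifndef STM32F401xE
#define STM32F401xE
#endif

/* Hypothetical selector mirroring the chain found in the family header. */
#if defined(STM32F401xE)
#define DEVICE_HEADER "stm32f401xe.h"
#elif defined(STM32F411xE)
#define DEVICE_HEADER "stm32f411xe.h"
#else
#error "No target device selected: define the macro for your MCU"
#endif

/* True when the device header chosen at compile time matches name. */
static bool uses_device_header(const char *name)
{
    assert(name != NULL);
    return strcmp(DEVICE_HEADER, name) == 0;
}
```

Forgetting the macro therefore does not produce a subtly wrong binary: the build stops at the #error line, which is one of the errors you see when compiling the project before this step.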
Eclipse intermezzo
You might have noticed that every time you change something in the project settings, a lot of time
is required to compile the whole source tree. This happens because Eclipse recompiles all the HAL
source files contained in system/src/stm32f4xx/. This is really annoying, and you can speed up
the compile time by disabling all those files not needed by your application. For example, if your
board does not need to use I2C devices, you can safely disable the compilation of the
stm32f4xx_hal_i2c_ex.c and stm32f4xx_hal_i2c.c files by right clicking on them and then choosing Resource
configuration->Exclude from build, and selecting all the project configurations defined.
Another solution to the same problem is to configure CubeMX so that it adds to the project only
necessary library files. To do it, choose in CubeMX project settings the entry Copy only the
necessary library files, as shown below.
However, keep in mind that excluding the unused HAL files from compilation has no impact on the size of the
binary file: any modern linker is able to automatically exclude from the absolute file (the binary file we
will load on our board) all those relocatable files that contain unused code and data (more about the linking process
of an STM32 binary in a following chapter).
Note also that, if you let CubeMX copy only the necessary library files and you need an additional peripheral later,
you will have to import the corresponding HAL files manually.
Read Carefully
The tool automatically deletes all unneeded existing project files. This also includes the
main.c file and all other files contained in the src and include Eclipse folders. For this reason,
do not execute the CubeMXImporter on an existing project. Always execute it on a fresh
new Eclipse project generated with the GNU ARM Eclipse plugin.
This script works well only if you have generated a CubeMX project for the
SW4STM32 (aka AC6) tool-chain.
CubeMXImporter relies on Python 2.7.x and the lxml library. Here you can find the installation
instructions for Windows, Linux and Mac OSX.
Windows
In Windows we first have to install the latest Python 2.7 release. We can download
it directly from this link¹³. Once downloaded, launch the installer and ensure that all
installation options are enabled, as shown in Figure 18. When the installation is completed,
you can install a pre-compiled lxml package, downloading it from here¹⁴.
Linux and MacOS X
On these two Operating Systems, Python 2.7 is installed by default. So, we only
need to install the lxml library (if it is not already installed). We can simply install it using
the pip command:
¹²https://github.com/cnoviello/CubeMXImporter
¹³http://bit.ly/1MjXoGb
¹⁴http://bit.ly/1P4lxSO
Figure 18: all installation options have to be enabled when installing Python in Windows
Once we have installed Python and the lxml library, we can download the CubeMXImporter
script from GitHub and place it in a convenient place (I assume that it is downloaded inside the
∼/STM32Toolchain/CubeMXImporter folder).
Now, close the Eclipse project (do not skip this step) and execute the CubeMXImporter at terminal
console in the following way:
After a few seconds, the CubeMX project is correctly imported. Now, open the Eclipse project again
and perform a refresh of the source tree, clicking with the right mouse button on the project root
and selecting the Refresh entry.
You can now proceed with building the project.
• store the template project in a place separated from the Eclipse workspace;
• import it inside the workspace when you need to start a new project (Go to File->Import…
and choose the entry Import Existing Projects into Workspace);
• open the project and rename it as you want by clicking with the right mouse button on the
project root and choosing the entry Rename….
We are now going to customize its main.c to do something useful with our Nucleo. But, before
changing application files, let us have a look at them.
The first important file we are going to analyze is include/stm32XXxx_hal_conf.h. This is the
file where the HAL configurations are translated into C code, using several macro definitions. These
macros are used to “instruct” the HAL about processor capabilities. You will find a lot of commented
macros, as shown below:
Filename: include/stm32XXxx_hal_conf.h
//#define HAL_QSPI_MODULE_ENABLED
//#define HAL_CEC_MODULE_ENABLED
//#define HAL_FMPI2C_MODULE_ENABLED
//#define HAL_SPDIFRX_MODULE_ENABLED
//#define HAL_DFSDM_MODULE_ENABLED
//#define HAL_LPTIM_MODULE_ENABLED
#define HAL_GPIO_MODULE_ENABLED
#define HAL_DMA_MODULE_ENABLED
#define HAL_RCC_MODULE_ENABLED
#define HAL_FLASH_MODULE_ENABLED
#define HAL_PWR_MODULE_ENABLED
#define HAL_CORTEX_MODULE_ENABLED
These macros are used to selectively include HAL modules at compile time. When you need a
module, you can simply uncomment the corresponding macro. We will have the opportunity to
see all the other macros defined in this file throughout the rest of the book.
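The effect of these macros is easy to reproduce on a PC. In the HAL, each module header is included, and each driver source file is compiled, only inside an #ifdef on the corresponding HAL_xxx_MODULE_ENABLED symbol. The sketch below shows the same pattern with two stand-in helpers; gpio_module_built() and i2c_module_built() are hypothetical functions written for this example only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for two lines of the configuration file: GPIO is enabled,
 * I2C is left commented out. */
#define HAL_GPIO_MODULE_ENABLED
/* #define HAL_I2C_MODULE_ENABLED */

/* Each helper reports whether "its" module was compiled in. */
static bool gpio_module_built(void)
{
#ifdef HAL_GPIO_MODULE_ENABLED
    return true;
#else
    return false;
#endif
}

static bool i2c_module_built(void)
{
#ifdef HAL_I2C_MODULE_ENABLED
    return true;
#else
    return false;
#endif
}
```

With the configuration above, gpio_module_built() returns true and i2c_module_built() returns false; uncommenting the I2C line flips the second result without touching any other code.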
The file src/stm32f4xx_it.c is another fundamental source file. It is where all the Interrupt Service
Routines (ISR) generated by CubeMX are stored. Let us see its content.
Filename: src/stm32XXxx_it.c
  HAL_IncTick();
  HAL_SYSTICK_IRQHandler();
  /* USER CODE BEGIN SysTick_IRQn 1 */

  /* USER CODE END SysTick_IRQn 1 */
}
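The call to HAL_IncTick() in the fragment above is what keeps the HAL time base running: it increments a millisecond counter that HAL_GetTick() and HAL_Delay() rely on. The following host-side sketch imitates that mechanism. The fake_* functions and delay_elapsed() are hypothetical stand-ins, and the real HAL implementation differs in its details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plays the role of the HAL tick variable; volatile because on the real
 * target it is modified inside an interrupt handler. */
static volatile uint32_t tick_count;

/* What the SysTick ISR does once per millisecond. */
static void fake_inc_tick(void)
{
    tick_count++;
}

static uint32_t fake_get_tick(void)
{
    return tick_count;
}

/* True once at least delay_ms ticks have elapsed since start.
 * Unsigned subtraction keeps the test valid across counter wrap-around. */
static bool delay_elapsed(uint32_t start, uint32_t delay_ms)
{
    return (uint32_t)(fake_get_tick() - start) >= delay_ms;
}
```

A blocking delay is then just a loop that spins until delay_elapsed() becomes true. If the SysTick interrupt never fires, the counter never advances and such a loop never ends, which is why this handler matters even in a trivial application.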
Given the CubeMX configuration we have chosen, the file contains essentially only the definition of
the function void SysTick_Handler(void), which is declared inside the file
system/include/cortexm/ExceptionHandlers.h. SysTick_Handler() is the ISR of the SysTick timer, that is, the routine
that is invoked when the SysTick timer reaches 0. But where is this ISR invoked?
The answer to this question gives us the opportunity to start dealing with one of the most interesting
features of Cortex-M processors: the Nested Vectored Interrupt Controller (NVIC). Table 1 in Chapter
1 shows the Cortex-M exception types. If you remember, we have said that in Cortex-M CPUs
interrupts are a special type of exception. Cortex-M defines the SysTick_Handler to be the fifteenth
exception in the NVIC vector array. But where is this array defined? In the previous paragraph we
added a special file written in assembly, which we called the startup file. Opening this file we
can see the minimal vector table for a Cortex-M processor, at about line 140, as shown below:
Filename: system/src/cmsis/startup_stm32f401xe.S
142 g_pfnVectors:
143 .word _estack
144 .word Reset_Handler
145 .word NMI_Handler
146 .word HardFault_Handler
147 .word MemManage_Handler /* Not available in Cortex-M0/0+ */
148 .word BusFault_Handler /* Not available in Cortex-M0/0+ */
149 .word UsageFault_Handler /* Not available in Cortex-M0/0+ */
150 .word 0
151 .word 0
152 .word 0
153 .word 0
154 .word SVC_Handler
155 .word DebugMon_Handler /* Not available in Cortex-M0/0+ */
156 .word 0
157 .word PendSV_Handler
158 .word SysTick_Handler
Line 158 is where the SysTick_Handler() is defined as ISR for the SysTick timer.
Please, consider that startup files have minor modifications between the ST HALs. Line
numbers reported here could differ a little bit from the startup file for your MCU. Moreover,
the MemManage Fault, Bus Fault, Usage Fault and Debug Monitor exceptions are not
available (and hence the corresponding vector entry is RESERVED - see the Table 1 in
Chapter 1) in Cortex-M0/0+ based processors. However, the first fifteen exceptions in NVIC
are always the same for all Cortex-M0/0+ based processors and all Cortex-M3/4/7 based
MCUs.
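Seen from C, the vector table is nothing more than an array of function pointers indexed by exception number. The sketch below models the first sixteen entries on the host: entry 0 is left empty because on real hardware it holds the initial stack pointer, reserved entries are NULL, and the fault entries that do not exist on Cortex-M0/0+ are omitted. All names are hypothetical, and the dispatch function only mimics what the core does in hardware.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Handler)(void);

static int systick_calls; /* incremented by the stand-in SysTick handler */

static void fake_systick_handler(void) { systick_calls++; }
static void fake_default_handler(void) { }

enum { SYSTICK_EXCEPTION_NUMBER = 15, CORE_VECTOR_COUNT = 16 };

/* Entries not listed here are NULL, like the ".word 0" lines in the
 * startup file. */
static const Handler vector_table[CORE_VECTOR_COUNT] = {
    [1]  = fake_default_handler, /* Reset */
    [2]  = fake_default_handler, /* NMI */
    [3]  = fake_default_handler, /* HardFault */
    [11] = fake_default_handler, /* SVCall */
    [14] = fake_default_handler, /* PendSV */
    [15] = fake_systick_handler, /* SysTick */
};

/* Fetch the entry for the given exception number and jump to it.
 * Returns -1 for out-of-range or reserved entries. */
static int dispatch_exception(unsigned number)
{
    if (number >= CORE_VECTOR_COUNT || vector_table[number] == NULL) {
        return -1;
    }
    vector_table[number]();
    return 0;
}
```

Dispatching exception 15 runs the SysTick handler, while a reserved slot such as 7 is rejected. On the real MCU no software performs this lookup: the core reads the table entry and branches to it by itself.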
Another really important file to analyze is src/stm32XXxx_hal_msp.c. First of all, it is important
to clarify the meaning of “MSP”. It stands for MCU Support Package, and it defines all the
initialization functions used to configure the on-chip peripherals according to the user configuration
(pin allocation, enabling of clocks, use of DMA and interrupts). Let us explain this in depth with an
example. A peripheral is essentially composed of two things: the peripheral itself (for example, the
SPI2 interface) and the hardware pins associated with this peripheral.
Figure 19: the relation between MSP files and the HAL
The ST HAL is designed so that the SPI module of the HAL is generic and abstracts from the specific
I/O settings, which may differ due to the MCU package and the user-defined hardware configuration.
So, ST developers have left to the user the responsibility to “fill” this piece of the HAL with the
code necessary to configure the peripheral, using a sort of callback routine, and this code resides
inside the src/stm32XXxx_hal_msp.c file (see Figure 19).
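The split between a generic driver and a board-specific hook can be sketched as follows. The "library" part only knows that a hook exists and calls it; the "application" part is the only place that knows the pinout. Everything here (FakeSpiHandle, fake_spi_init(), fake_spi_msp_init()) is hypothetical and runs on a PC. In the real HAL the hooks, such as HAL_SPI_MspInit(), are declared as weak symbols, so an empty default is linked when the user does not provide one.

```c
#include <assert.h>
#include <stdbool.h>

/* ---- "library" side: knows nothing about pins ---- */
typedef struct {
    int  instance;        /* which SPI peripheral, e.g. 2 for SPI2 */
    bool pins_configured; /* set by the board-specific hook */
    bool ready;
} FakeSpiHandle;

/* Hook supplied by the application, like the functions in the MSP file. */
void fake_spi_msp_init(FakeSpiHandle *hspi);

static bool fake_spi_init(FakeSpiHandle *hspi)
{
    assert(hspi != 0);
    fake_spi_msp_init(hspi);             /* board-specific part first */
    hspi->ready = hspi->pins_configured; /* generic part depends on it */
    return hspi->ready;
}

/* ---- "application" side: the only place that knows the pinout ---- */
void fake_spi_msp_init(FakeSpiHandle *hspi)
{
    if (hspi->instance == 2) {
        /* the real code would enable clocks and configure the GPIOs here */
        hspi->pins_configured = true;
    }
}
```

Initializing a handle for instance 2 succeeds because the hook knows how to set up that peripheral; any other instance is left unconfigured and fake_spi_init() reports failure.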
Let us open the src/stm32XXxx_hal_msp.c file. Here we can find the function
void HAL_MspInit(void):
Filename: src/ch4-stm32XXxx_hal_msp.c
void HAL_MspInit(void)
{
  /* USER CODE BEGIN MspInit 0 */

  /* USER CODE END MspInit 0 */

  HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_0);

  /* System interrupt init*/
HAL_MspInit(void) is called inside the function HAL_Init(), which is in turn called in the main.c
file as we will see soon. The function simply defines the priority of SysTick_IRQn exception, the
one handled by the SysTick_Handler() ISR. The code assigns the highest user defined priority (the
lower the number, the higher is the priority).
The last file that remains to analyze is src/main.c. It essentially contains three routines:
SystemClock_Config(void), MX_GPIO_Init(void) and int main(void).
The first function is used to initialize core and peripheral clocks. Its explanation is outside the scope
of this chapter, but its code is not too complicated to understand. MX_GPIO_Init(void) is the
function that configures the GPIO. Chapter 6 will explain this matter in depth.
Finally, we have the main(void) function, as shown below.
Filename: src/main.c
int main(void)
{
  /* USER CODE BEGIN 1 */

  /* USER CODE END 1 */

  /* MCU Configuration----------------------------------------------------------*/
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();
  /* Configure the system clock */
  SystemClock_Config();
  /* Initialize all configured peripherals */
  MX_GPIO_Init();

  /* USER CODE BEGIN 2 */

  /* USER CODE END 2 */

  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
  while (1)
  {
  /* USER CODE END WHILE */

  /* USER CODE BEGIN 3 */
  }
  /* USER CODE END 3 */
}

/** System Clock Configuration
*/
void SystemClock_Config(void)
{
The code is really self-explanatory. First, the HAL is initialized by calling the function HAL_Init().
Don’t forget that this causes the function HAL_MspInit() to be automatically called by the HAL.
Then, clocks and GPIOs are initialized. Finally, the application enters an infinite loop: that is the
place where our code must be placed.
You will have noticed that the code generated by CubeMX is full of commented
regions, such as the /* USER CODE BEGIN 1 */ and /* USER CODE END 1 */ pair.
What are those comments for? CubeMX is designed so that if you change the hardware
configuration you can regenerate the project code without losing the pieces of code you
have added. Placing your code inside those “guarded regions” should guarantee that you
will not lose your work. However, I have to admit that CubeMX often makes a mess
of the generated files, and the user code gets lost. So, I suggest always generating a
separate project and copying the changed code into the application
files by hand. This also gives you full control over your code.
Filename: src/main.c
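while (1) {
  /* B1 reads low (GPIO_PIN_RESET) while the button is pressed */
  if (HAL_GPIO_ReadPin(B1_GPIO_Port, B1_Pin) == GPIO_PIN_RESET) {
    /* From now on, blink LD2 forever with a 500 ms half-period */
    while (1) {
      HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
      HAL_Delay(500);
    }
  }
}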
Why do we have to check when the PC13 goes low (that is HAL_GPIO_ReadPin() returns
GPIO_PIN_RESET state) to detect that the button was pressed?
The answer comes from the Nucleo schematics. Looking below, we can see that one side
of the button is connected to the ground, and resistor R30 pulls up the MCU pin when the
button is not pressed.
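The wiring just described is usually called an active-low input, and its logic fits in two tiny functions. The sketch below is a host-side model: PinState mirrors the two values GPIO_PIN_RESET and GPIO_PIN_SET, while pin_level() and is_pressed() are hypothetical helpers written only to make the inversion explicit.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the two logic levels returned when reading a GPIO pin. */
typedef enum { PIN_RESET = 0, PIN_SET = 1 } PinState;

/* Level seen by the MCU pin for a button wired between the pin and ground,
 * with a pull-up resistor: pressing the button shorts the pin to ground. */
static PinState pin_level(bool button_pressed)
{
    return button_pressed ? PIN_RESET : PIN_SET;
}

/* Active-low test: the button is pressed when the pin reads low. */
static bool is_pressed(PinState level)
{
    return level == PIN_RESET;
}
```

With the button released the pull-up keeps the pin high, so is_pressed() is false; pressing it drives the pin low and the test becomes true, which is exactly the comparison against GPIO_PIN_RESET used in the example.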
Now compile and try out the program on your Nucleo board!
Figure 20: The content of the GitHub repository containing all the book examples
The examples are divided by Nucleo model, as you can see in Figure 20. You can clone the
whole repository using the git command:
or you can download only the repository content by clicking on the Download ZIP button
(highlighted in red in Figure 20). Now you have to import the Eclipse project for your Nucleo into
the Eclipse workspace.
Open Eclipse and go to File->Import…. The Import dialog appears. Select the entry
General->Existing Project into Workspace and click on the Next button. Now browse to the folder
containing the example projects by clicking on the Browse button. Once the folder is selected, a list
of the contained projects appears. Select the project you are interested in, check the entry Copy
projects into workspace as shown in Figure 21, and click the Finish button.
Now you can see all imported projects inside the Project Explorer pane. Close the projects you
are not interested in. For example, if your Nucleo is based on an STM32F030 MCU, then close all
projects except the nucleo-F030R8 one¹⁶ (or you can simply import only the projects that fit your
Nucleo boards).
Each project contains all the examples shown in this book. This is done using different build
configurations for each type of Nucleo. Build configurations are a feature that all modern IDEs
support. They allow having several project configurations inside the same project. Every Eclipse project
has at least two build configurations: Debug and Release. The former is used to generate a binary
file suitable for debugging. The latter is used to generate optimized firmware for production.
¹⁶You can do this simply by right-clicking on the project you are interested in (in our example case, the
stm32nucleo-F0) and selecting the entry Close Unrelated Projects.
To select the configuration for your Nucleo go to the Project->Build Configurations->Set Active menu
and choose the corresponding configuration, or click the down arrow close to the build icon, as shown
in Figure 22. Now you can compile the whole project. At the end, you will find the binary file of
your firmware inside the ∼/STM32Toolchain/projects/nucleo-XX/CHx-EXx folder.
5. Introduction to debugging
Coding is all about debugging, said a friend of mine one day. And this is dramatically true. We can
do our best to write really great code, but sooner or later we have to deal with software bugs
(hardware bugs are another terrible beast to fight). And being good at debugging embedded software
is what makes a happy embedded developer.
In this chapter we will start analyzing an important debugging tool: OpenOCD. It has become a
sort of standard in the embedded development world, and thanks to the fact that many companies
(including ST) are officially supporting its development, OpenOCD is growing rapidly. Every
new release includes support for dozens of microcontrollers and development boards. Moreover,
being portable among the three major Operating Systems (Windows, Linux and Mac OS), it allows
us to use one consistent tool to debug the examples in this book.
This chapter also covers another important debugging mechanism: ARM semi-hosting. It is a way to
communicate input/output requests from application code to a host computer running a debugger
and it is extremely useful to execute functions that would be too complicated (or impossible due to
the lack of some hardware features) to execute on the target microcontroller.
This chapter is a preliminary view of the debugging process, which would require a separate book
even for simpler architectures like the STM32. A subsequent chapter will take a closer look at other
debugging tools, and it will focus on the Cortex-M exception mechanism, which is a distinctive feature
of this platform.
OpenOCD is designed to be a generic tool able to work with dozens of hardware debuggers, using several
transport protocols. This requires a way to configure how to interface with the specific debugger, and this
is done through the use of script files. OpenOCD uses an extended definition of Jim-TCL, which in
turn is a subset of the TCL programming language.
Figure 1 shows a typical debugging environment for the Nucleo board. Here we have the hardware
part, composed of a Nucleo with its integrated ST-LINK interface, and OpenOCD interacting
with the ST-LINK debugger using libusb, or any API-compatible library that allows user-space
applications to interface with USB devices. OpenOCD also provides the drivers needed to interact with the
internal STM32 flash memory³ and the ST-LINK protocol. It is instructed about the specific
hardware under debugging (and the debugger in use) through configuration files.
Once OpenOCD has established the connection with the board to debug, it provides two ways to
communicate with the developer. The first one is through a local telnet connection on port
4444. OpenOCD provides a convenient shell that is used to send commands to it and to receive
information about the board under debugging. The second option is to use it as a remote
server for GDB. OpenOCD also implements the GDB remote protocol and it is used as a “mediator”
component between GDB and the hardware. This allows us to debug the firmware using GDB and,
more importantly, using Eclipse as a graphical debugging environment.
The instructions to start OpenOCD are different between Windows and UNIX-like systems. So, jump
to the paragraph that fits your OS.
Open the Windows Command Line tool⁴ and go inside the C:\STM32Toolchain\openocd\scripts
folder and execute the following command:
$ cd C:\STM32Toolchain\openocd\scripts
$ ..\bin\openocd.exe -f board\<nucleo_conf_file.cfg>
where <nucleo_conf_file.cfg> must be substituted with the config file that fits your Nucleo board,
according to Table 1⁵. For example, if your Nucleo is the Nucleo-F401RE, then the proper config file
to pass to OpenOCD is st_nucleo_f4.cfg.
If everything went the right way, you should see messages similar to those appearing in Figure 2.
⁴It is strongly suggested to use a decent terminal emulator like ConEmu (https://conemu.github.io/) or similar.
⁵OpenOCD 0.9.0 still does not provide full support to all types of Nucleo boards, but the community is working hard on this
and in the next main release the support will be completed. However, you can use alternative configuration files to work with your
Nucleo at the time of writing this chapter.
Figure 2: what appears on the command line prompt when OpenOCD starts correctly
At the same time, the LED LD1 on the Nucleo board should start blinking GREEN and RED
alternately. Now we can jump to the next paragraph.
Linux and MacOS X users share the same instructions. Go inside the ∼/STM32Toolchain/openocd/tcl
folder and execute the following command:
$ cd ~/STM32Toolchain/openocd/tcl
$ ../src/openocd -f board/<nucleo_conf_file.cfg>
where <nucleo_conf_file.cfg> must be substituted with the config file that fits your Nucleo board,
according to Table 1. For example, if your Nucleo is the Nucleo-F401RE, then the proper config file
to pass to OpenOCD is st_nucleo_f4.cfg.
If everything went the right way, you should see messages similar to those appearing in Figure
2. At the same time, the LED LD1 on the Nucleo board should start blinking GREEN and RED
alternately. Now we can jump to the next paragraph.
This happens because a wrong version of libusb is used to interface with the ST-LINK Debug Interface.
To solve this, download the Zadig utility for your Windows version. Launch the Zadig tool,
ensuring that your Nucleo board is plugged into the USB port, and go to the Options->List All
Devices menu. After a while the ST-LINK Debug (Interface 0) entry should appear inside the
device list combo box. If the installed driver is not the WinUSB one, then select it and click on
the Reinstall Driver button, as shown below.
http://zadig.akeo.ie/
To access the list of supported commands, we can type help. The list is huge, and its
content is outside the scope of this book (the official OpenOCD documentation is a good place to
start understanding what those commands are used for). Here, we will simply see how to flash the
firmware.
Before we can upload a firmware to the target MCU of our Nucleo, we have to halt the MCU. This
is done by issuing a reset init command:
OpenOCD tells us that the micro is now halted and we can proceed to upload the firmware using
the flash write_image command:
⁶Daemon is the UNIX name for programs that work like a service. For example, an HTTP server or an FTP server
is called a daemon in UNIX. In the Windows world this kind of program is called a service.
⁷Starting from Windows 7, telnet is an optional component to install. However, it is strongly suggested to use a more advanced
telnet client like putty (http://bit.ly/1jsQjnt).
⁸The default port can be changed issuing a telnet_port command inside the board configuration file. This can be useful if we
are debugging two different boards using two OpenOCD sessions, as we will see next.
where <path to the .elf file> is the full path to the binary file (it is usually stored inside the
Debug subdirectory in the Eclipse project folder).
To start running our firmware we can simply type the reset command to the OpenOCD command
line.
There are a few other OpenOCD commands that may be useful during firmware debugging, especially
when dealing with hardware faults. The reg command shows the current status of all Cortex-M
core registers when the target MCU is halted:
Another group of useful commands are md[whb], which read a word, half-word and byte respectively.
For example, the command mdw 0x08000000
reads 32 bits (a word) from the address 0x0800 0000. The commands mw[whb] are the equivalent
commands to store data in a given memory location.
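To relate these commands to what the firmware itself does when it touches memory, here is a minimal sketch of the same word-sized read and write expressed in C through volatile pointers. The function names read_word() and write_word() are hypothetical; they simply mirror what the word variants of the md and mw commands do from the debugger side.

```c
#include <stdint.h>

/* Read one 32-bit word from the given location (what mdw does from the debugger). */
uint32_t read_word(const volatile uint32_t *addr)
{
  return *addr;
}

/* Write one 32-bit word to the given location (what mww does from the debugger). */
void write_word(volatile uint32_t *addr, uint32_t value)
{
  *addr = value;
}
```

On the target, read_word((const volatile uint32_t *)0x08000000) would return the first word of the flash memory. The volatile qualifier tells the compiler that every access must really be performed, which matters for memory-mapped locations.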
Now you can close the OpenOCD daemon sending the shutdown command to the telnet console.
This will also close the telnet session.
Eclipse is a generic and highly configurable IDE, and it allows us to create configurations that easily
integrate external tools like OpenOCD in its development life-cycle. The process we are going to
accomplish here is essentially to create a debug configuration. There are at least three ways to
integrate OpenOCD in Eclipse, but one of them is probably the most convenient when we deal
with the ST-LINK debugger.
We will configure OpenOCD as an external debugging tool that we execute only once and leave as a
daemon process, like we have done in the previous paragraph by executing it from the command line
prompt. The next step is to create a GDB debug configuration that instructs GDB to connect to
OpenOCD port 3333 and use it as GDB server.
First, ensure that you have a project opened in Eclipse. Then, go to Run->External Tools->External
Tools Configurations… menu. The External Tools Configurations dialog appears. Highlight the
Program entry in the list view on the left and click on the New icon (the one circled in red in
Figure 3). Now, fill the following fields in this way:
• Name: write the name you like for this configuration; it is suggested to use OpenOCD FX,
where FX is the STM32 family of your Nucleo board (F0, F1, and so on).
• Location: choose the location of the OpenOCD executable
(C:\STM32Toolchain\openocd\bin\openocd.exe for Windows users,
∼/STM32Toolchain/openocd/src/openocd for Linux and Mac OS users).
• Working directory: choose the location of the OpenOCD scripts directory
(C:\STM32Toolchain\openocd\scripts for Windows users, ∼/STM32Toolchain/openocd/tcl
for Linux and Mac OS users).
• Arguments: write the command line arguments for OpenOCD, that is
“-f board\<nucleo_conf_file.cfg>” for Windows users and “-f board/<nucleo_conf_file.cfg>”
for Linux and Mac OS users. <nucleo_conf_file.cfg> must be substituted with the config file that
fits your Nucleo board, according to Table 1.
When completed, click on the Apply button and then on the Close one. To avoid mistakes that could
cause confusion, Figure 4 shows how to fill the fields on Windows and Figure 5 on a UNIX-like
system (arrange the home directory accordingly).
Figure 5: how to fill the External Tools Configurations fields on UNIX systems
To launch OpenOCD you can now simply go to the Run->External Tools menu and choose the
configuration you have created. If everything went the right way, you should see the classical
OpenOCD messages inside the Eclipse Console, as shown in Figure 6. At the same time, the LED
LD1 on the Nucleo board should start blinking GREEN and RED alternately.
Now we are ready to create a Debug Configuration to use GDB in conjunction with OpenOCD. This
operation must be repeated every time we create a new project.
Eclipse fills automatically all the needed fields in the Main tab. However, if you are using a project
with several build configurations, you need to click on the Search Project button and choose the
ELF file for the active build configuration.
Next, go in the Debugger tab and uncheck the entry Start OpenOCD locally, since we have created
the specific OpenOCD external tool configuration. Ensure that all other fields are equal to the ones
shown in Figure 8.
Now, go to the Startup section and leave all options as default, but do not forget to add the GDB
command set remotetimeout 20 as shown in Figure 9.
Finally, go in the Common section and check the option Shared file¹⁰ in Save as frame box and
check the entry Debug in Display in favorites menu frame box, as shown in Figure 10.
Click on the Apply button and then on the Close one. Now we are ready to start debugging.
To start a new debug session using the debug configuration made earlier, you can click on the arrow
near the Debug icon on the Eclipse toolbar and choose the debug configuration, as shown in Figure
11. Eclipse will ask you if you want to switch to the Debug Perspective. Click on the Yes button (it
is strongly suggested to flag the Remember my decision checkbox). Eclipse switches to the Debug
Perspective, as shown in Figure 12.
Let us see what each view is used for. The top-left view is called Debug and it shows all the
running debug activities. This is a tree-view, and the first entry represents the OpenOCD process
launched using the external debug configuration. If needed, we can stop the execution of OpenOCD
by highlighting the executable program and clicking on the Terminate icon on the Eclipse toolbar, as
shown in Figure 13.
The second activity shown in the Debug view represents the GDB process. This activity is really
useful, because when the program is halted the complete call stack is shown here and it offers a
quick way to navigate inside the call stack.
The top-right view contains several sub-panes. The Variables one offers the ability to inspect the
content of variables defined in the current stack frame (that is, the selected procedure in the call
stack). By right-clicking on an inspected variable, we can further customize
the way the variable is shown. For example, we can change its numeric representation, from decimal
(the default one) to hexadecimal or binary form. We can also cast it to a different datatype (this is
really useful when we are dealing with raw data that we know to be of a given type - for
example, a bunch of bytes coming from a stream file). We can also go to the memory address where
the variable is stored by clicking on the View Memory… entry in the contextual menu.
The Breakpoint pane lists all the breakpoints used in the application. A breakpoint is a hardware
primitive that allows us to stop the execution of the firmware when the Program Counter (PC) reaches
a given instruction. When this happens, the debugger is warned and Eclipse will show the context
of the halted instruction. Every Cortex-M based MCU has a limited number of hardware breakpoints.
Table 2 summarizes the maximum number of breakpoints and watchpoints¹¹ for a given Cortex-M family.
Eclipse makes it easy to set up breakpoints inside the code from the editor view in the center of the Debug
perspective. To place a breakpoint, simply double-click on the blue stripe on the left of the editor,
near the instruction where we want to halt the MCU execution. A blue bullet will appear, as
shown in Figure 15.
When the program counter reaches the first assembly instruction that makes up that line of code,
the execution is halted and Eclipse shows the corresponding line of code as shown in Figure 12.
Once we have inspected the code, we have several options to resume the execution.
¹¹A watchpoint is a more advanced debugging primitive that allows us to define conditional breakpoints, that is the MCU
halts its execution only if a variable satisfies an expression (e.g. var == 10). We will analyze watchpoints in a subsequent chapter.
Figure 16 shows the Eclipse debug toolbar. The highlighted icons allow us to control the debug process.
Let us see each of them in depth.
• Skip all breakpoints: this toggle icon allows us to temporarily ignore all the breakpoints in use.
This lets the firmware run without interruption. We can re-enable breakpoints by
deactivating the icon.
• Resume execution: this icon restarts the execution of the firmware from the current PC. The
adjacent icon, the pause, will stop the execution on request.
• Stop debug: this icon causes the end of the debug session. GDB is terminated and the target
board is halted.
• Step into routine: this icon is the first one of two icons used to do step-by-step debugging.
When we execute the firmware line-by-line, it could be important to enter inside a called
routine. This icon allows us to do this; otherwise, the next icon is what is needed to execute the
next instruction inside the current stack frame.
• Step over: the next icon of the debug toolbar has a counterintuitive name. It is called step
over, and its name might suggest “skip the next instruction” (that is, go over). But this icon
is the one used to execute the next instruction. Its name comes from the fact that, unlike the
previous icon, it executes a called routine without entering inside it.
• Reset MCU: this icon is used to do a soft reset of the MCU, without stopping the debug session and
relaunching it.
Finally, another interesting pane of that view is the Registers one. It displays the content of all
Cortex-M registers and it is the equivalent of the reg OpenOCD command we have seen before. It
can be really useful to understand the current state of the Cortex-M core. In a subsequent chapter
about debugging we will see how to deal with Cortex-M exceptions and we will learn how to
interpret the content of some important Cortex-M registers.
Go to the File->New->C Project menu. Select the Hello World ARM Cortex-M C/C++ project type and
choose the project name you like. In the next step, fill in the Cortex-M core related fields according
to your target board. Choose “Freestanding (no POSIX system calls)” for the field Use system calls and
“Semihosting DEBUG channel” for the field Trace output. Continue with the project wizard until
it finishes. Next, import the ST HAL and project skeleton from CubeMX as described in Chapter 4.
Now we have a project ready to use semihosting. The tracing routines are available inside the
system/src/diag/Trace.c file. They are:
Filename: src/main-ex1.c
34 #include "stm32f4xx_hal.h"
35 #include "diag/Trace.h"
36
37 void SystemClock_Config(void);
38 static void MX_GPIO_Init(void);
39
40 int main(void)
41 {
42   char msg[] = "Hello STM32 lovers!\n";
43
44   HAL_Init();
45   SystemClock_Config();
46   MX_GPIO_Init();
47
48   trace_printf(msg);
49
50   while(1) {
51     HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
52     HAL_Delay(500);
53   }
54 }
First of all, to use the tracing routines we have to include the Trace.h header file, as done at
line 35. Next, at line 48, we call the trace_printf() function passing a string to print. The rest of
the main() simply blinks the Nucleo LD2 forever.
Read carefully
The semihosting implementation in OpenOCD is designed so that every string must be
terminated with the newline character (\n) before the string appears on the OpenOCD console.
Forgetting it is a really common error, and it leads to a lot of frustration the first few times programmers
use semihosting. Never forget to terminate every string passed to trace_printf() or the C
printf() routine with the newline character (\n).
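One way to guard against this mistake is to normalize a message before printing it. The following is only a sketch: ensure_newline() is a hypothetical helper (it is not part of the tracing routines) that appends the newline when it is missing and there is room for it in the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Appends '\n' to buf if it is missing.
   cap is the total size of buf in bytes.
   Returns 0 on success, -1 if there is no room for the extra character. */
int ensure_newline(char *buf, size_t cap)
{
  size_t len = strlen(buf);

  if (len > 0 && buf[len - 1] == '\n')
    return 0;                 /* already terminated correctly */
  if (len + 2 > cap)
    return -1;                /* need room for '\n' and the final NUL */

  buf[len] = '\n';
  buf[len + 1] = '\0';
  return 0;
}
```

The buffer can then be handed to trace_printf() or printf() knowing that the string will show up on the console.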
To use semihosting we need one more important step: we have to instruct OpenOCD to enable
it. Create a new Debug configuration as shown in the previous paragraphs, but ensure that in the
Startup section the entry Enable ARM semihosting is checked, as shown in Figure 9 (this is the
default behavior, but it is better to check). Now we are ready to launch our firmware. The
“Hello STM32 lovers” string will appear on the OpenOCD console, as shown in Figure 18.
Figure 18: the output string coming from the Nucleo routed on the OpenOCD console
Sometimes, it happens that the OpenOCD console isn’t shown automatically when
a message is printed, but the GDB console remains active. You can switch to
OpenOCD console clicking on the console icon (circled in red in the image below).
C run-time library provides several functions used to do I/O manipulation, like the printf()/scanf()
routines for terminal output/input management and the file manipulation functions (fopen(),
fseek() and so on). These functions are built around low-level services provided by the underlying
operating system, also called system calls.
STM32 applications developed with GCC are automatically linked with newlib-nano, a lightweight
version of the standard C/C++ library explicitly designed to work with microcontrollers.
newlib-nano does not provide an implementation of low-level system calls. It is our responsibility to provide
an implementation for those functions if we need to use them. Since the target board lacks terminal
management capabilities (no screen and no input devices), we can use semihosting to route those
low-level functions to the host debugger, that is OpenOCD.
Figure 19 clearly shows the whole process. For example, let us consider the printf() function.
When we invoke it in our firmware, newlib transfers the control to the _write() routine. So, we
have to provide our implementation of this function that sends the string to OpenOCD, which in
turn displays it on the Host PC console.
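To make the idea more concrete, here is a minimal sketch of what such a routine looks like. It has the same parameter list as the _write() hook that newlib expects, but it is named retarget_write() so that it does not clash with the C library when compiled on a PC. The console buffer and the console_putc() function are hypothetical placeholders: on a real target, that is the point where the bytes would be handed over to the debugger.

```c
#include <stddef.h>

/* Hypothetical capture buffer standing in for the debugger console. */
char console[64];
size_t console_len = 0;

/* Hypothetical per-byte sink; on a real target this is where the
   semihosting request (or a UART write) would go. */
void console_putc(char c)
{
  if (console_len < sizeof console - 1) {
    console[console_len++] = c;
    console[console_len] = '\0';
  }
}

/* Same parameter list as newlib's _write() hook: file descriptor,
   pointer to the data, number of bytes. Returns the bytes consumed. */
int retarget_write(int file, char *ptr, int len)
{
  (void)file;                      /* stdout and stderr are treated alike */
  for (int i = 0; i < len; i++)
    console_putc(ptr[i]);
  return len;
}
```

The C library only cares that the hook reports how many bytes it has consumed; where the bytes end up is entirely our choice.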
Liviu Ionescu has already packed in his plugin the most used low-level system calls. We only need
to enable their compilation, and we can start using the classical C run-time I/O manipulation
environment. To enable it, go to the Project->Properties menu. Next go inside the C/C++ Build->Settings
section. In the Optimization section, uncheck the entry Assume freestanding environment
(-ffreestanding). Click on the OK button, go to Project->Clean… and rebuild the whole project.
Filename: src/main-ex2.c
34 #include "stm32f4xx_hal.h"
35 #include <stdio.h>
36 #include <string.h>
37 void SystemClock_Config(void);
38 static void MX_GPIO_Init(void);
39
40 int main(void)
41 {
42   char msg[32], name[20];   /* msg must hold "Hello " + name + "!\r\n" */
43
44   HAL_Init();
45   SystemClock_Config();
46   MX_GPIO_Init();
47
48   printf("What's your name?: \r\n");
49   scanf("%19s", name);      /* limit the input to the size of name */
50   sprintf(msg, "Hello %s!\r\n", name);
51   printf("%s", msg);
52
53   FILE *fd = fopen("/tmp/test.out", "w+");
54   fwrite(msg, sizeof(char), strlen(msg), fd);
55   fclose(fd);
The code is really self-explanatory. It uses standard C functions like printf() and scanf() to print
a string on the OpenOCD console and to retrieve one from it (lines 48-51). Next it opens the test.out file in
the /tmp folder on the host PC¹² and it writes the same string inside it (lines 53-55).
This feature is extremely useful in many situations. For example, it can be used to log firmware
activities inside a file on the PC for debugging purposes. Another example is a web server running
on a target board whose HTML files reside on the host PC: you are free to change them to test
how they render without the need to re-flash the target every time you change one of them.
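As a sketch of the logging use case, the following function appends a line to a file using only standard C I/O and checks every step for errors. The name log_line() and the paths used with it are hypothetical; with semihosting enabled, the same calls are routed to the host PC, so a failure to open the file is something the firmware should be prepared to handle.

```c
#include <stdio.h>
#include <string.h>

/* Appends text to the file at path.
   Returns 0 on success, -1 if the file cannot be opened or written. */
int log_line(const char *path, const char *text)
{
  FILE *fd = fopen(path, "a");
  if (fd == NULL)
    return -1;                     /* e.g. the path is not reachable */

  size_t len = strlen(text);
  size_t written = fwrite(text, sizeof(char), len, fd);
  int close_err = fclose(fd);

  return (written == len && close_err == 0) ? 0 : -1;
}
```

Keep in mind that every call stalls the MCU while the host serves the request, so this kind of logging is suited to debugging sessions and not to time-critical code.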
• If you want to use only the trace_printf() functions from Liviu Ionescu, then add the macros
TRACE and OS_USE_TRACE_SEMIHOSTING_DEBUG.
• If you want to use the C standard library I/O manipulation functions, then add the macro
OS_USE_SEMIHOSTING and uncheck the flag Assume freestanding environment (-ffreestanding).
The second case is the more complex. You have an existing project imported in Eclipse that has not
been generated using the GNU ARM Eclipse plugin. If it is sufficient to use the trace_printf()
function, then you can import inside your project these files taken from a project generated with the
GNU ARM plugin:
• src/diag/trace_impl.c
• src/diag/Trace.c
• include/diag/Trace.h
Next, you have to define the macros TRACE and OS_USE_TRACE_SEMIHOSTING_DEBUG at project level
and to call the routine initialise_monitor_handles() in your main() routine.
In case you want to use all standard C library I/O routines, you need to:
SWP are implemented by adding a special bkpt instruction immediately before the code we want
to inspect. When the core executes the breakpoint instruction, it will be forced into debug state. The
debugger sees that the MCU is halted and starts debugging the MCU. The bkpt instruction accepts
an immediate 8-bit opcode, which can be used to specify a particular break condition. If you want to
halt the execution of your firmware and transfer the control to the debugger, then the instruction:
asm("bkpt #0")
is what you need. This technique is used to implement software conditional breakpoint. For example,
suppose you have a strange fault condition that you need to inspect. You could have a piece of code
like the following one:
if(cond == 0) {
  ...
} else if(cond > 0) {
  ...
} else { /* Abnormal state, let us debug it */
  asm("bkpt #0");
}
In the above code, if cond variable assumes a negative value, the MCU is halted and the control is
transferred to GDB, which allows us to inspect the call stack and the current stack frame.
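The pattern above can be wrapped in a small function so that the same source also builds on a PC. This is a sketch under stated assumptions: debug_break(), classify() and the debug_break_hits counter are hypothetical names, and the __ARM_ARCH_PROFILE macro is used to emit the bkpt instruction only when compiling for a Cortex-M core. On any other machine the function just counts how many times it was reached.

```c
/* Counts the calls to debug_break() on builds that are not for Cortex-M. */
unsigned debug_break_hits = 0;

/* Halts the core in the debugger on Cortex-M, counts the hit elsewhere. */
void debug_break(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
  __asm volatile ("bkpt #0");
#else
  debug_break_hits++;
#endif
}

/* Hypothetical routine following the structure of the fragment above. */
int classify(int cond)
{
  if (cond == 0) {
    return 0;
  } else if (cond > 0) {
    return 1;
  } else {            /* abnormal state, let us debug it */
    debug_break();
    return -1;
  }
}
```

Remember that on the target a bkpt executed without a debugger attached leaves the MCU stuck, so this kind of code should not end up in production firmware.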
Semihosting is implemented using the special immediate opcode 0xAB. That is, the instruction:
asm("bkpt #0xAB")
causes the MCU to stop, but this time OpenOCD sees the special opcode and interprets it as a
semihosting operation. By convention, the r0 register contains the type of operation (_write(),
_read(), etc.) and the r1 register contains the pointer to the region of memory containing the function
parameters. For example, if we want to write a null-terminated string on the host PC console, then
we can write the following assembly instructions:
asm (
  "mov r0, #0x4 \n"          /* OPCODE for WRITE0 */
  "ldr r1, =0x20001400 \n"   /* address of string to transfer to OpenOCD
                                (too wide for a mov immediate, so ldr is used) */
  "bkpt #0xAB"
);
Table 3 summarizes the supported semihosting operations. Please, take note that OpenOCD
currently does not support all of them.
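When the operation needs more than one argument, r1 points to a block of consecutive words. As an example, the sketch below builds the three-word block used by the WRITE operation (0x05 in the ARM semihosting specification): file handle, pointer to the data and number of bytes. The names semihost_write_args_t, make_write_args() and semihost_call() are hypothetical, and the handle passed in is just a placeholder. The bkpt is emitted only when compiling for a Cortex-M core; on a PC the call simply returns -1.

```c
#include <stdint.h>

enum {
  SEMIHOST_SYS_WRITE0 = 0x04,   /* r1 -> null-terminated string */
  SEMIHOST_SYS_WRITE  = 0x05    /* r1 -> { handle, data pointer, length } */
};

/* Parameter block for the WRITE operation: three consecutive machine words. */
typedef struct {
  uintptr_t handle;
  uintptr_t data;
  uintptr_t length;
} semihost_write_args_t;

semihost_write_args_t make_write_args(int handle, const char *data, unsigned length)
{
  semihost_write_args_t args = {
    (uintptr_t)handle, (uintptr_t)data, (uintptr_t)length
  };
  return args;
}

/* Issues a semihosting request: operation in r0, argument pointer in r1. */
int semihost_call(int op, const void *arg)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
  register int r0 __asm("r0") = op;
  register const void *r1 __asm("r1") = arg;
  __asm volatile ("bkpt #0xAB" : "+r" (r0) : "r" (r1) : "memory");
  return r0;                     /* result left in r0 by the debugger */
#else
  (void)op;
  (void)arg;
  return -1;                     /* no debugger to talk to on a host build */
#endif
}
```

A call would then look like semihost_call(SEMIHOST_SYS_WRITE, &args), with args filled in by make_write_args().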
The following complete C code shows how to implement the trace_printf() function.
Filename: src/main-ex3.c
#include "stm32f4xx_hal.h"

void SystemClock_Config(void);
static void MX_GPIO_Init(void);

int main(void)
{
  char msg[] = "Hello STM32 lovers!\n";

  HAL_Init();
  SystemClock_Config();
  MX_GPIO_Init();

  /* WRITE0 (0x4): r1 holds the address of a null-terminated string */
  asm volatile (
    " mov r0, 0x4 \n"
    " mov r1, %[msg] \n"
    " bkpt #0xAB"
    :
    : [msg] "r" (msg)
    : "r0", "r1", "memory" /* "memory": msg must be written to RAM before the bkpt */
  );

  while(1);
}
Here we use the capabilities of the GCC asm() statement to pass the pointer to the msg buffer, containing
the string “Hello STM32 lovers!\n”.
Now you can understand why semihosting causes the MCU to become stuck if the debugger is not
active. The bkpt instruction halts the MCU execution, and there is no way to restore it without
using an external debugger (or doing a hardware reset). Moreover, every time the bkpt instruction
is issued, the internal MCU activities are suspended until the control passes to the debugger. During
this time, important asynchronous events (like interrupts generated by peripherals) could be lost.
This time interval is totally unpredictable, and it depends on many factors (speed of the hardware
interface, current Host PC load, speed of the ST-LINK firmware, etc.).
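A common way to reduce the risk of a stuck MCU is to check whether a debugger has enabled halting debug before issuing a semihosting request. The sketch below assumes a Cortex-M3/M4/M7 core, where the Debug Halting Control and Status Register (DHCSR) can be read by the firmware at address 0xE000EDF0 and its bit 0 (C_DEBUGEN) is set while a debugger is attached; as far as I know, on Cortex-M0/M0+ cores this register is not accessible from code. The function name debugger_enabled() is hypothetical, and it takes the register value as a parameter so that the logic can be tried on a PC too.

```c
#include <stdint.h>
#include <stdbool.h>

#define DHCSR_ADDR       0xE000EDF0u   /* Debug Halting Control and Status Register */
#define DHCSR_C_DEBUGEN  (1u << 0)     /* set while halting debug is enabled */

/* Returns true if the given DHCSR value says that a debugger is attached. */
bool debugger_enabled(uint32_t dhcsr_value)
{
  return (dhcsr_value & DHCSR_C_DEBUGEN) != 0;
}
```

On the target, the check would read debugger_enabled(*(volatile uint32_t *)DHCSR_ADDR), and the semihosting call would simply be skipped when it returns false.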
6. GPIO management
With the advent of the STCube initiative, ST has decided to completely revamp the Hardware
Abstraction Layer (HAL) for its STM32 microcontrollers. Prior to the release of the STCube HAL,
the official library to develop STM32 applications was for a long time the Standard Peripheral
Library. Despite the fact that it is still widespread among STM32 developers, and you can find a
lot of examples on the web using this library, the STCube HAL is a great improvement over
the old Standard Peripheral Library. In fact, being the first library developed by ST, not all of its
parts were consistent between different STM32 families and a lot of bugs were present in the early
versions of the library. This caused the emergence of different alternatives to the Standard Peripheral
Library, and the official software from ST is still considered poor by many people.
So, ST has completely redesigned the HAL and, even if it still needs a little bit of tuning, it is what
ST will officially support in the future. Moreover, the new HAL greatly simplifies the porting of code
between the STM32 sub-families (F0, F1, etc.), reducing the effort needed to adapt your application to
a different MCU (without a good abstraction layer, the pin-to-pin compatibility is just an advantage
from the marketing point of view). For this and several other reasons, this book is based exclusively
on the STCube HAL.
This chapter starts our journey inside the HAL by looking at one of its simplest modules: HAL_GPIO. We
have already used many functions from this module in the early examples in this book, but now it is
the right time to understand all the possibilities offered by such a simple and commonly used peripheral.
However, before we can start describing HAL features, it is best to take a quick look at how the
STM32 peripherals are mapped to logical addresses and how they are represented inside the HAL
library.
• The System bus connects the system bus of the Cortex-M core to a BusMatrix, which manages
the arbitration between the core and the DMA. Both the core and the DMA act as masters.
• The DMA bus connects the Advanced High-performance Bus (AHB) master interface of the
DMA to the BusMatrix, which manages the access of CPU and DMA to SRAM, Flash memory
and peripherals.
• The BusMatrix manages the access arbitration between the core system bus and the DMA
master bus. The arbitration uses a Round Robin algorithm. The BusMatrix is composed
of two masters (CPU, DMA) and four slaves (Flash interface, SRAM, AHB1 with AHB to
Advanced Peripheral Bus (APB) bridge and AHB2). AHB peripherals are connected to the system
bus through a BusMatrix to allow DMA access.
• The AHB to APB bridge provides full synchronous connections between the AHB and the APB
bus, where most of the peripherals are connected.
As we will see in a later chapter, each of these buses is connected to different clock sources, which
determine the maximum speed for the peripheral connected to that bus².
In Chapter 1 we have learned that peripherals are mapped to a specific region of the 4GB address
space, starting from 0x4000 0000 and lasting up to 0x5FFF FFFF. This region is further divided in
several sub-regions, each one mapped to a specific peripheral, as shown in Figure 2.
²For some of you the above description may be unclear and too complex to understand. Don’t worry and keep reading the rest
of this chapter. These concepts will become clear once you reach the chapter dedicated to the DMA.
The way this space is organized, and hence how peripherals are mapped, is specific to a given STM32
microcontroller. For example, in an STM32F030 microcontroller the AHB2 bus is mapped to the
region ranging from 0x4800 0000 to 0x4800 17FF. This means that the region is 0x1800 = 6144 bytes wide.
This region is further divided into several sub-regions, each one corresponding to a specific peripheral.
Following the previous example, the GPIOA peripheral (which manages all pins connected to
PORT-A) is mapped from 0x4800 0000 to 0x4800 03FF, which means that it occupies 1KB of
peripheral memory. How this memory-mapped space is in turn organized depends on the specific
peripheral. Table 1³ shows the memory layout of a GPIO peripheral.
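The sizes quoted above follow directly from the first and last address of each region. The following sketch, which can be compiled on a PC, just repeats that arithmetic. The function name region_size() and the macro names are hypothetical and are not part of the HAL; the addresses are the ones given in the text for an STM32F030.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the text for an STM32F030. */
#define AHB2_FIRST  0x48000000u
#define AHB2_LAST   0x480017FFu
#define GPIOA_FIRST 0x48000000u
#define GPIOA_LAST  0x480003FFu

/* Size in bytes of an address range whose first and last addresses
   are both included in the range. */
uint32_t region_size(uint32_t first, uint32_t last) {
    assert(last >= first);
    return last - first + 1u;
}
```

Calling region_size(AHB2_FIRST, AHB2_LAST) gives 6144, and region_size(GPIOA_FIRST, GPIOA_LAST) gives 1024, that is 1KB.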
³Both Table 1 and Figure 1 are taken from the ST STM32F030 Reference Manual (http://bit.ly/1GfS3iC).
A peripheral is controlled by reading and modifying the registers of these mapped regions. For example,
continuing the example of the GPIOA peripheral, to enable the PA5 pin as an output pin we have to configure
the MODER register so that bits[11:10] are set to 01 (which corresponds to General purpose
output mode), as shown in Figure 3. Next, to pull the pin high, we have to set the corresponding
bit[5] inside the Output Data Register (ODR), which according to Table 1 is mapped to the GPIOA +
0x14 memory location, that is 0x4800 0000 + 0x14.
The following minimal example shows how to use pointers to access the GPIOA peripheral
mapped memory in an STM32F030 MCU.
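int main(void) {
    volatile uint32_t *GPIOA_MODER = (volatile uint32_t *)0x48000000;          // GPIOA + 0x00
    volatile uint32_t *GPIOA_ODR   = (volatile uint32_t *)(0x48000000 + 0x14); // GPIOA + 0x14
    __HAL_RCC_GPIOA_CLK_ENABLE(); // Ensures that the peripheral is enabled and connected to the AHB2 bus
    *GPIOA_MODER |= 0x400; // MODER[11:10] = 01: PA5 becomes a general purpose output
    *GPIOA_ODR |= 0x20;    // ODR[5] = 1: PA5 is pulled high
    while (1);
}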
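A plain OR with 0x400 gives the 01 pattern only if bit 11 of MODER is already 0. The following sketch shows a more defensive read-modify-write, which first clears the 2-bit field and then writes the new value. It works on ordinary integers so that it can be tried on a PC; on the real MCU the result would be written back through the register pointer. The function names moder_set_mode() and odr_set_pin() are hypothetical and are not part of the HAL.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 'moder' with the 2-bit mode field of 'pin' replaced by 'mode'. */
uint32_t moder_set_mode(uint32_t moder, unsigned pin, uint32_t mode) {
    assert(pin < 16u && mode < 4u);
    uint32_t shift = pin * 2u;
    moder &= ~(0x3u << shift);   /* clear both bits of the field first */
    moder |= mode << shift;      /* then write the new 2-bit value */
    return moder;
}

/* Returns 'odr' with the output bit of 'pin' set. */
uint32_t odr_set_pin(uint32_t odr, unsigned pin) {
    assert(pin < 16u);
    return odr | (1u << pin);
}
```

For PA5, moder_set_mode(0, 5, 1) yields 0x400 and odr_set_pin(0, 5) yields 0x20, the same constants used in the text. Starting from a MODER value of 0xC00 (both bits of the PA5 field set), the masked version still yields 0x400, while a plain OR would leave the field at 11.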
It is important to clarify once again that every STM32 family (F0, F1, etc.) and every member of
the given family (STM32F030, STM32F1, etc.) provides its own subset of peripherals, which are mapped
to specific addresses. Moreover, the way peripherals are implemented differs between STM32
series.
One of the HAL roles is to abstract from the specific peripheral mapping. This is done by defining
several handlers for each peripheral. A handler is nothing more than a C struct, whose references
are used to point to the real peripheral address. Let us see one of them.
In the previous chapters, we have configured the PA5 pin with the following code:
Here, the GPIOA variable is a pointer of type GPIO_TypeDef defined in this way:
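typedef struct {
    volatile uint32_t MODER;    /* offset 0x00: mode register */
    volatile uint32_t OTYPER;   /* offset 0x04: output type register */
    volatile uint32_t OSPEEDR;  /* offset 0x08: output speed register */
    volatile uint32_t PUPDR;    /* offset 0x0C: pull-up/pull-down register */
    volatile uint32_t IDR;      /* offset 0x10: input data register */
    volatile uint32_t ODR;      /* offset 0x14: output data register */
    volatile uint32_t BSRR;     /* offset 0x18: bit set/reset register */
    volatile uint32_t LCKR;     /* offset 0x1C: configuration lock register */
    volatile uint32_t AFR[2];   /* offset 0x20-0x24: alternate function registers */
    volatile uint32_t BRR;      /* offset 0x28: bit reset register */
} GPIO_TypeDef;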
The GPIOA pointer is defined so that it points⁴ to the address 0x4800 0000, so we can access the registers of the peripheral by name:
GPIOA->MODER |= 0x400;
GPIOA->ODR |= 0x20;
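The reason why GPIOA->ODR ends up at 0x4800 0014 is that the C compiler lays out the struct members in order, four bytes each. The following sketch, which can be compiled on a PC, checks this with the standard offsetof macro. GPIO_Layout is a local copy of the struct shown above, renamed to avoid clashing with the HAL type, and reg_address() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the register layout shown in the text. */
typedef struct {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
    volatile uint32_t LCKR;
    volatile uint32_t AFR[2];
    volatile uint32_t BRR;
} GPIO_Layout;

/* Address of a register, given the peripheral base address and the
   offset of the corresponding struct member. */
uint32_t reg_address(uint32_t base, size_t offset) {
    assert(offset < sizeof(GPIO_Layout));
    return base + (uint32_t)offset;
}
```

Here offsetof(GPIO_Layout, ODR) is 0x14, so reg_address(0x48000000, offsetof(GPIO_Layout, ODR)) gives 0x48000014, the same location computed by hand in the pointer-based example.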
GPIOs are the way an MCU communicates with the external world. Every board uses a variable
number of I/Os to drive external peripherals (e.g. an LED) or to exchange data through several types
of communication peripherals (UART, USB, SPI, etc.). Every time we need to configure a peripheral
that uses MCU pins, we need to configure its corresponding GPIOs using the HAL_GPIO module.
As seen before, the HAL is designed so that it abstracts from the specific peripheral memory
mapping. It also provides a general and more user-friendly way to configure the peripheral,
without forcing programmers to know how to configure its registers in detail.
To configure a GPIO we use the HAL_GPIO_Init(GPIO_TypeDef *GPIOx, GPIO_InitTypeDef *GPIO_Init)
function. GPIO_InitTypeDef is the C struct used to configure the GPIO, and it is defined in
the following way:
⁴This is not exactly true, since the HAL, to save RAM space, defines GPIOA as a macro (#define GPIOA ((GPIO_TypeDef *)
GPIOA_BASE)).
typedef struct {
    uint32_t Pin;
    uint32_t Mode;
    uint32_t Pull;
    uint32_t Speed;
    uint32_t Alternate;
} GPIO_InitTypeDef;
• Pin: it identifies the pins, numbered starting from 0, that we are going to configure. For example, for the
PA5 pin it assumes the value GPIO_PIN_5⁵. We can use the same GPIO_InitTypeDef instance to
configure several pins at once, doing a bitwise OR (e.g., GPIO_PIN_1 | GPIO_PIN_5 | GPIO_PIN_6).
• Mode: it is the operating mode of the pin, and it can assume one of the values in Table 2. More
about this field soon.
• Pull: specifies the Pull-up or Pull-down activation for the selected pins, according to Table 3.
• Speed: defines the pin speed. More about this later.
• Alternate: specifies which peripheral to associate to the pin. More about this later.
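Since the Pin field is a bit mask, selecting several pins is just a matter of OR-ing single-bit values. The following sketch, which can be compiled on a PC, builds the masks in the same way described for the GPIO_PIN_x constants. The names pin_mask() and mask_has_pin() are hypothetical and are not part of the HAL.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask for pin number 'n' (0..15): the n-th bit of a 16-bit value. */
uint16_t pin_mask(unsigned n) {
    assert(n < 16u);
    return (uint16_t)(1u << n);
}

/* Returns 1 if pin number 'n' is selected in 'mask', 0 otherwise. */
int mask_has_pin(uint16_t mask, unsigned n) {
    return (mask & pin_mask(n)) != 0;
}
```

For example, pin_mask(5) is 0x0020, and pin_mask(1) | pin_mask(5) | pin_mask(6) is 0x0062, which selects pins 1, 5 and 6 in a single value.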
⁵Take note that the GPIO_PIN_x is a bit mask, where the i-th pin corresponds to the i-th bit of a uint16_t datatype. For example,
the GPIO_PIN_5 has a value of 0x0020, which is 32 in base 10.
⁶During and just after reset, the alternate functions are not active and all the I/O ports are configured in Input Floating mode.
Depending on the GPIO_InitTypeDef.Mode field, the MCU changes the way the hardware of
an I/O works. Let us have a look at the main modes.
When the I/O is configured as GPIO_MODE_INPUT:
The GPIO modes GPIO_MODE_EVT_* are related to sleep modes. When an I/O is configured to work
in one of these modes, the CPU will be woken up (when placed in sleep mode with a WFE instruction)
if the corresponding I/O is triggered, without generating the corresponding interrupt (more about
this topic in a following chapter). The GPIO_MODE_IT_* modes are related to interrupts
management, and they will be analyzed in the next chapter.
However, keep in mind that this implementation scheme can vary between the STM32 families,
especially for the low-power series. Always refer to the reference manual of your MCU, which
exactly describes the I/O modes and their impact on the MCU operation and power consumption.
It is also important to remark that this flexibility represents an advantage for the hardware design
too. For example, there is no need to use external pull-up resistors to drive I2C devices, since the
corresponding GPIOs can be configured by setting GPIO_InitTypeDef.Mode = GPIO_MODE_AF_OD
and GPIO_InitTypeDef.Pull = GPIO_PULLUP. This saves space on the PCB and simplifies the BOM.
The I/O mode can also be configured using the CubeMX tool, as shown in Figure 5. The Pin
Configuration dialog can be reached inside the Configuration view, clicking on the GPIO button.
To discover which peripherals can be bound to an I/O, you can refer to the MCU datasheet or simply
use the CubeMX tool. Clicking on a pin in the Pin View causes a pop-up menu to appear. In this
menu we can set the desired alternate function. For example, in Figure 6 you can see that PA3 can
be used as USART2_RX (that is, it can be used as the RX pin for the USART/UART2 peripheral, and this is
possible for every STM32 MCU with LQFP48 package). CubeMX will automatically generate the
right initialization code for us, as shown below:
Those of you working on an STM32F1 MCU will notice that the GPIO_InitTypeDef.Alternate field
is missing in the CubeF1 HAL. This happens because STM32F1 MCUs have a less flexible way to
define the alternate functions of a pin. While other STM32 microcontrollers define the possible alternate
functions at GPIO level (by configuring the dedicated registers GPIOx_AFRL and GPIOx_AFRH), allowing
up to sixteen different alternate functions to be associated with a pin (this only happens in packages
with high pin count), GPIOs of an STM32F1 MCU have really limited remapping capabilities. For
example, in an STM32F103RB MCU only the USART3 can have two pairs of pins that can be used as
peripheral I/O alternatively. Usually, two dedicated peripheral registers, AFIO_MAPR and AFIO_MAPR2,
“remap” the signal I/Os of those peripherals, allowing this operation.
This is essentially the reason why that field is not available in the CubeF1 HAL.
CubeF0/1/3/L0/L1      CubeF4/L4
GPIO_SPEED_LOW        GPIO_SPEED_FREQ_LOW
GPIO_SPEED_MEDIUM     GPIO_SPEED_FREQ_MEDIUM
GPIO_SPEED_FAST       GPIO_SPEED_FREQ_HIGH
GPIO_SPEED_HIGH⁸      GPIO_SPEED_FREQ_VERY_HIGH
Speed. Such a sweet word for anybody who loves performance. But what exactly does it mean when it
refers to a GPIO? Here the GPIO speed is not related to the switching frequency, that is how many times a
pin goes from ON to OFF in a unit of time. The GPIO_InitTypeDef.Speed parameter, instead, defines
the slew rate of a GPIO, that is how fast it goes from the 0V level to the VDD one, and vice versa.
Figure 7: slew rate effect on a square wave - red=desired output, green=actual output
Figure 7 clearly shows this phenomenon. The red wave is the one that we would get if the response
speed were maximum, and therefore there were no response delay. In practice, what we get is the one
shown by the green wave.
But how much does this parameter impact the slew rate of an STM32 I/O? First of all, we have to
say that every STM32 family has its own I/O driving characteristics. So you need to check the datasheet
of your MCU inside the Absolute Maximum Ratings section. Next, we can use this simple test to
measure the slew rate (the test is conducted on a Nucleo-F446RE board).
⁸These modes are available only in some high performance versions of STM32 MCUs. Check the reference manual for your
MCU.
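int main(void) {
    GPIO_InitTypeDef GPIO_InitStruct;
    HAL_Init();
    __HAL_RCC_GPIOC_CLK_ENABLE();
    GPIO_InitStruct.Pin = GPIO_PIN_4;              /* PC4: lowest speed setting */
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
    GPIO_InitStruct.Pin = GPIO_PIN_8;              /* PC8: highest speed setting */
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
    HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
    while(1) {
        GPIOC->ODR = 0x110;  /* drive PC4 and PC8 high */
        GPIOC->ODR = 0;      /* drive them low again */
    }
}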
The code is really self-explanatory. We are configuring two pins as output I/Os. One of them, PC4, is
configured with a GPIO_SPEED_FREQ_LOW speed. The other one, PC8, with GPIO_SPEED_FREQ_VERY_HIGH.
Figure 8 shows the difference between the two pins. As we can see, the PC4 speed is about
25MHz, while the speed of the PC8 pin is about 50MHz⁹.
⁹Unfortunately, my oscilloscope probes have a load capacitance too high to conduct a precise measurement. According to the
STM32F446RE datasheet, its maximum switching frequency is 90MHz, when CL = 30 pF, VDD ≥ 2.7 V and the compensation cell
is activated. But I was not able to obtain those results, due to the poor oscilloscope and probably to the length of the traces
connecting the Nucleo morpho header and the MCU pins.
Figure 8: the top figure shows the slew rate of PC4 pin and the one below the slew rate of PC8 pin
However, keep in mind that driving a pin “too hard” increases the overall EMI of your
board. Professional design is nowadays all about minimizing EMI. Unless otherwise required, it is
strongly suggested that you leave the GPIO speed parameter at its minimum level.
What about the effective switching frequency? ST claims in its datasheets that the fastest toggle
speed of an output pin is every two clock cycles. The AHB1 bus, where the GPIO peripheral is
connected, runs at 42MHz for an STM32F446 MCU. So a pin should toggle at about 20MHz. However,
we have to add an additional overhead related to the memory transfer between the GPIO->ODR
register and the value we are going to store inside it (0x110), which costs another CPU cycle. So the
expected GPIO maximum switching speed is ∼14MHz. The oscilloscope confirms this, as shown in
Figure 9¹⁰.
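The two figures above come from a simple division of the bus clock by the number of cycles spent for each toggle. The following sketch, which can be compiled on a PC, just repeats that arithmetic; rate_hz() is a hypothetical helper, and the 42MHz value is the one given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Rate, in Hz, of an operation that takes 'cycles' clock cycles
   when the bus runs at 'bus_hz'. */
uint32_t rate_hz(uint32_t bus_hz, uint32_t cycles) {
    assert(cycles > 0u);
    return bus_hz / cycles;
}
```

With a 42MHz bus, rate_hz(42000000, 2) gives 21MHz (the "about 20MHz" of the text), while rate_hz(42000000, 3) gives exactly 14MHz once the extra cycle is taken into account.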
Curiously, driving an I/O through the bit-banding region, using the same number of assembly
instructions, dramatically reduces the switching frequency down to 4MHz, as shown in Figure 10.
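For readers not familiar with bit-banding: in Cortex-M3/4 cores implementing this feature, each bit of the first 1MB of the peripheral region has its own word in an alias region, so that a single bit can be written with an ordinary store. The following sketch, which can be compiled on a PC, computes the alias address of a bit. It is not the code used for the test above. The function name bitband_alias() and the macro names are hypothetical, and the GPIOC ODR address used in the usage note (0x4002 0814) is an assumption to be checked against the reference manual of your MCU.

```c
#include <assert.h>
#include <stdint.h>

#define PERIPH_BASE_ADDR    0x40000000u  /* start of the peripheral bit-band region */
#define PERIPH_BB_BASE_ADDR 0x42000000u  /* start of the corresponding alias region */

/* Alias address of bit 'bit' of the peripheral register at 'reg_addr'.
   Every byte of the bit-band region maps to 32 bytes of alias space,
   and every bit maps to one 4-byte word. */
uint32_t bitband_alias(uint32_t reg_addr, unsigned bit) {
    assert(reg_addr >= PERIPH_BASE_ADDR && reg_addr < PERIPH_BASE_ADDR + 0x100000u);
    assert(bit < 32u);
    return PERIPH_BB_BASE_ADDR + (reg_addr - PERIPH_BASE_ADDR) * 32u + bit * 4u;
}
```

Under the assumption above, bitband_alias(0x40020814, 4) gives 0x42410290 for the PC4 output bit and bitband_alias(0x40020814, 8) gives 0x424102A0 for PC8.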
¹⁰Tests were conducted with the maximum GCC optimization level (-O3), prefetch enabled and all internal caches enabled.
This explains why the detected speed is a little bit higher than 14MHz.
Figure 10: switching frequency when toggling an I/O through bit-banding region
The code used to drive the test is the following (non relevant code was omitted):
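To read the status of an I/O, we can use the function:
GPIO_PinState HAL_GPIO_ReadPin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin);
which accepts the GPIO descriptor and the pin number. It returns GPIO_PIN_RESET when the I/O is
low or GPIO_PIN_SET when high. Conversely, to change the I/O state, we have the function:
void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState);
which accepts the GPIO descriptor, the pin number and the desired state. If we want to simply invert
the I/O state, then we can use this convenient routine:
void HAL_GPIO_TogglePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin);
Finally, one feature of the GPIO peripheral is that we can lock the configuration of an I/O.
Any subsequent attempt to change its configuration will fail, until a reset occurs. To lock a pin
configuration we can use this routine:
HAL_StatusTypeDef HAL_GPIO_LockPin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin);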
Eclipse intermezzo
It is possible to heavily customize the Eclipse interface by installing custom themes. A theme
essentially allows you to change the appearance of the Eclipse user interface. This may seem a
non-essential feature, but nowadays a lot of programmers prefer to customize the colors, font types and
sizes and so on of their favorite development environment. That is one of the reasons for the success of
some minimal yet highly customizable source code editors, like TextMate and Sublime Text.
There are several theme packs available for Eclipse, but it is strongly suggested to install a plug-in
which automatically installs several other plug-ins useful for this purpose: its name is Color IDE Pack
and it is available through the Eclipse Marketplace. The most relevant plug-ins installed are:
This author prefers a mixed approach between a full-dark theme and a full-light one: he prefers a
dark theme for the source editor, and a white background for the other parts of the IDE, as shown below.
http://marketplace.eclipse.org/content/color-ide-pack
7. Interrupts management
Hardware management is all about dealing with asynchronous events. Most of these come
from hardware peripherals. For example, a timer reaching a configured period value, or a UART
that warns about the arrival of data. Others originate from the “world outside” our board. For
example, the user presses that damned switch that causes your board to hang, and you will spend a
whole day understanding what’s wrong.
All microcontrollers provide a feature called interrupts. An interrupt is an asynchronous event
that stops the execution of the current code on a priority basis (the more important the
interrupt is, the higher its priority; a higher-priority interrupt causes a lower-priority one to be suspended). The
code that services the interrupt is called Interrupt Service Routine (ISR).
Interrupts are a source of multiprogramming: the hardware knows about them and it is responsible
for saving the current execution context (that is, the stack frame, the current Program Counter and
a few other things) before switching to the ISR. They are exploited by Real Time Operating Systems to
introduce the notion of tasks. Without help from the hardware it is impossible to have a true preemptive
system, which allows switching between several execution contexts without irreparably losing the
current execution flow.
Interrupts can originate both from the hardware and from the software itself. The ARM architecture
distinguishes between the two types: interrupts originate from the hardware, exceptions from the software
(e.g., an access to an invalid memory location). In ARM terminology, an interrupt is a type of exception.
Cortex-M processors provide a unit dedicated to exceptions management. This is called Nested
Vectored Interrupt Controller (NVIC) and this chapter is about programming this really fundamental
hardware component. However, here we deal only with interrupts management. Exceptions handling
will be treated in a subsequent chapter about advanced debugging.
Figure 1: the relation between the NVIC controller, the Cortex-M core and the STM32 peripherals
As stated before, ARM distinguishes between system exceptions, which originate inside the CPU
core, and hardware exceptions coming from external peripherals, also called Interrupt Requests
(IRQ). Programmers manage exceptions through the use of specific ISRs, which are coded at a higher
level (most often using the C language). The processor knows where to locate these routines thanks to
an indirect table containing the addresses in memory of the Interrupt Service Routines. This table is
commonly called vector table, and every STM32 microcontroller defines its own. Let us analyze
this in depth.
• Reset: this exception is raised just after the CPU resets. Its handler is the real entry point of
the running firmware. In an STM32 application all starts from this exception. The handler
contains some assembly-coded functions designed to initialize the execution environment,
such as the main stack, the .bss area, etc. A subsequent chapter dedicated to the booting
process will explain this in depth.
• NMI: this is a special exception, which has the highest priority after the Reset one. Like the
Reset exception, it cannot be masked (that is disabled), and it can be associated to critical and
non-deferrable activities. In STM32 microcontrollers it is linked to the Clock Security System
(CSS). CSS is a self-diagnostic peripheral that detects the failure of the HSE. If this happens,
HSE is automatically disabled (this means that the internal HSI is automatically enabled) and
an NMI interrupt is raised to inform the software that something is wrong with the HSE. More
about this feature in Chapter 10.
• Hard Fault: this is the generic fault exception, and hence related to software interrupts. When
the other fault exceptions are disabled, it acts as a collector for all types of exceptions (e.g., a
memory access to an invalid location raises the Hard Fault exception if the Bus Fault one is
not enabled).
• Memory Management Fault¹: it occurs when executing code attempts to access an illegal
location or violates a rule of the Memory Protection Unit (MPU).
• Bus Fault¹: it occurs when AHB interface receives an error response from a bus slave (also
called prefetch abort if it is an instruction fetch, or data abort if it is a data access). Can also
be caused by other illegal accesses (e.g. an access to a non existent SRAM memory location).
• Usage Fault¹: it occurs when there is a program error such as an illegal instruction, alignment
problem, or attempt to access a non-existent co-processor.
• SVCall: this is not a fault condition, and it is raised when the Supervisor Call (SVC)
instruction is executed. This is used by Real Time Operating Systems to execute instructions in
privileged state (a task needing to execute privileged operations executes the SVC instruction,
and the OS performs the requested operations - this is the same behavior as a system call in
other operating systems).
• Debug Monitor¹: this exception is raised when a software debug event occurs while the
processor core is in Monitor Debug-Mode. It is also used as exception for debug events like
breakpoints and watchpoints when software based debug solution is used.
• PendSV: this is another exception related to RTOS. Unlike the SVCall exception, which is
raised immediately after an SVC instruction is executed, the PendSV can be delayed. This
allows the RTOS to complete tasks with higher priorities.
• SysTick: this exception is also usually related to RTOS activities. Every RTOS needs a timer to
periodically interrupt the execution of current code and to switch to another task. All STM32
microcontrollers provide a SysTick timer, internal to the Cortex-M core. Even if every other
timer may be used to schedule system activities, the presence of a dedicated timer ensures
portability among all STM32 families (due to optimization reasons related to the internal die
of the MCU, not all timers could be available as external peripheral). Moreover, even if we
aren’t using an RTOS in our firmware, it is important to keep in mind that the ST CubeHAL
uses the SysTick timer to perform internal time-related activities (and it also assumes that
the SysTick timer is configured to generate an interrupt every 1ms).
The remaining exceptions that can be defined for a given MCU are related to IRQ handling.
Cortex-M0/0+ cores allow up to 32 external interrupts, while Cortex-M3/4/7 cores allow silicon
manufacturers to define up to 240 interrupts.
Where can we find the list of usable interrupts for a given STM32 microcontroller? The datasheet
of that MCU is certainly the main source about available interrupts. However, we can simply refer
to the vector table provided by ST in its HAL. This table is defined inside the startup file for our
MCU, the assembly file ending with .S we have learned to import in our Eclipse project in Chapter
4 (for example, for an STM32F030R8 MCU the file name is startup_stm32f030x8.S). Opening that
file we can find the whole vector table for that MCU, starting at about line 140.
¹This exception is not available in Cortex-M0/0+ based microcontrollers.
Even if the vector table contains the addresses of the handler routines, the Cortex-M core needs a
way to find the vector table inside memory. By convention, the vector table starts at the hardware
address 0x0000 0000 in all Cortex-M based processors, with the first exception handler stored at
0x0000 0004. If the vector table resides in the internal FLASH memory (this is what usually happens),
and since the FLASH in all STM32 MCUs is mapped from the 0x0800 0000 address, the table is placed
starting from 0x0800 0000 and its first handler at 0x0800 0004, which are aliased to 0x0000 0000
and 0x0000 0004.
Figure 2 shows how the vector table is organized in memory. Entry zero of this array is the address
of the Main Stack Pointer (MSP) inside the SRAM. Usually, this address corresponds to the end of
the SRAM, that is its base address + its size (more about memory layout of an STM32 application
in a following chapter). Starting from the second entry of this table, we can find all exception and
interrupt handlers. This means that the vector table has a length of 48 entries (16 + 32) for Cortex-M0/0+ based
microcontrollers and of 256 entries (16 + 240) for Cortex-M3/4/7.
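The position of a handler inside the table can be computed from its interrupt number. The following sketch, which can be compiled on a PC, assumes the CMSIS numbering convention, where system exceptions have negative numbers (for example, SysTick is -1) and peripheral IRQs start from 0. The function names are hypothetical, and the value 40 used in the usage note for EXTI15_10_IRQn is an assumption valid for STM32F4 devices: check the IRQn_Type enum of your MCU.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_ENTRIES 16  /* initial MSP value plus the 15 system exception slots */

/* Index inside the vector table of a CMSIS-style interrupt number. */
uint32_t vector_index(int irqn) {
    assert(irqn >= -15);
    return (uint32_t)(irqn + CORE_ENTRIES);
}

/* Byte offset of that entry from the start of the table (4 bytes per entry). */
uint32_t vector_offset(int irqn) {
    return vector_index(irqn) * 4u;
}

/* Total number of entries for a core allowing 'max_irqs' external interrupts. */
uint32_t vector_table_length(uint32_t max_irqs) {
    return CORE_ENTRIES + max_irqs;
}
```

With these definitions, vector_table_length(32) is 48 and vector_table_length(240) is 256, as stated in the text. The SysTick handler (number -1) sits at index 15, that is at offset 0x3C, while an IRQ number of 40 corresponds to offset 0xE0.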
Figure 2: the minimal layout of the vector table in an STM32 MCU based on a Cortex-M3/4/7 core
1. The names of the exception handlers are just a convention, and you are totally free to rename
them if you prefer different ones. They are just symbols (as are variables and functions inside a
program). However, keep in mind that the CubeMX software is designed to generate ISRs with
those names, which are an ST convention. So, if you rename an entry, you have to rename the corresponding ISR too.
2. As said before, the vector table must be placed at the beginning of the flash memory, where
the processor expects to find it. This is a Link Editor job that places the vector table at
the beginning of the flash data during the generation of the absolute file, which is the
binary file we upload to the flash. In a following chapter we will study the content of
ldscripts/sections.ld file, which contains the directives to instruct GNU LD about this.
3. The vector table can be relocated to other addresses. We will study how this is performed in a
following chapter.
To enable an IRQ, the HAL provides the function:
void HAL_NVIC_EnableIRQ(IRQn_Type IRQn);
where IRQn_Type is an enumeration of all exceptions and interrupts defined for that specific
MCU. The IRQn_Type enum is part of the ST Device HAL, and it is defined inside a header file
specific for the given STM32 MCU in the Eclipse folder system/include/cmsis/. These files are
named stm32fxxxx.h. For example, for an STM32F030R8 MCU the right filename is stm32f030x8.h
(the naming pattern of these files is the same as that of the start-up files).
The corresponding function to disable an IRQ is:
void HAL_NVIC_DisableIRQ(IRQn_Type IRQn);
• Only one PxY pin can be a source of interrupt. For example, we cannot define both PA0 and
PB0 as input interrupt pins.
• For EXTI lines sharing the same IRQ inside the NVIC controller, we have to code the
corresponding ISR so that it is able to discriminate which lines generated the interrupt.
²Sometimes, it also happens that different peripherals share the same request line, even in Cortex-M3/4/7 based MCUs where
up to 240 configurable request lines are available. For example, in an STM32F446RE MCU, timer TIM6 shares its global IRQ with
DAC1 and DAC2 under-run error interrupts.
Figure 3: the relation between GPIO, EXTI lines and corresponding ISR in an STM32F4 MCU
The following example³ shows how to use interrupts to toggle the LD2 LED every time we press
the user-programmable button, which is connected to the PC13 pin. First, we configure the
PC13 pin to fire an interrupt every time it goes from the low level to the high one (lines 49:52). This
is accomplished by setting GPIO_InitTypeDef.Mode to GPIO_MODE_IT_RISING (for the complete list of
available interrupt related modes, refer to Table 2 in Chapter 6). Next, we enable the interrupt of the
EXTI line associated with the Px13 pins, that is EXTI15_10_IRQn.
³The example is designed to work with a Nucleo-F401RE board. Please, refer to other book examples if you have a different
Nucleo board.
Filename: src/main-ex1.c
39 int main(void) {
40 GPIO_InitTypeDef GPIO_InitStruct;
41
42 HAL_Init();
43
44 /* GPIO Ports Clock Enable */
45 __HAL_RCC_GPIOC_CLK_ENABLE();
46 __HAL_RCC_GPIOA_CLK_ENABLE();
47
48 /*Configure GPIO pin : PC13 - USER BUTTON */
49 GPIO_InitStruct.Pin = GPIO_PIN_13;
50 GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
51 GPIO_InitStruct.Pull = GPIO_PULLDOWN;
52 HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
53
54 /*Configure GPIO pin : PA5 - LD2 LED */
55 GPIO_InitStruct.Pin = GPIO_PIN_5;
56 GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
57 GPIO_InitStruct.Pull = GPIO_NOPULL;
58 GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
59 HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
60
61 HAL_NVIC_EnableIRQ(EXTI15_10_IRQn);
62
63 while(1);
64 }
65
66 void EXTI15_10_IRQHandler(void) {
67 __HAL_GPIO_EXTI_CLEAR_IT(GPIO_PIN_13);
68 HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
69 }
Finally, we need to define the function void EXTI15_10_IRQHandler()⁴, which is the ISR
associated with the IRQ for the EXTI15_10 line inside the vector table (lines 66:69). The content of
the ISR is really simple. We toggle the PA5 I/O every time the ISR fires. We also need to clear the
pending bit associated with the EXTI line (more about this next).
⁴Another feature of the ARM architecture is the ability to use conventional C functions as ISRs. When an interrupt fires, the
CPU switches from Thread mode (that is, the main execution flow) to Handler mode. During this switching process, the
current execution context is saved thanks to a procedure named stacking. The CPU itself is responsible for restoring the previously saved
context when the ISR terminates its execution (unstacking). The explanation of this procedure is outside the scope of this
book. For more information about these aspects, refer to the Joseph Yiu book.
Fortunately, the ST HAL provides an abstraction mechanism that spares us from dealing with these details,
unless we really need to take care of them. The previous example can be rewritten in the following
way:
Filename: src/main-ex2.c
This time we have configured both the PC13 and PC12 pins as interrupt sources. When the
EXTI15_10_IRQHandler() ISR is called, we transfer the control to the HAL_GPIO_EXTI_IRQHandler() function
inside the HAL. This will perform all the interrupt related activities for us, and it will call the
HAL_GPIO_EXTI_Callback() routine passing the GPIO that has generated the IRQ. Figure 4 clearly shows
the call sequence that originates from the IRQ⁵.
⁵Don’t consider those time intervals related to the CPU cycles, they are just used to indicate “subsequent” events.
This mechanism is used by almost all IRQ routines inside the HAL.
Please note that, since the EXTI12 and EXTI13 lines are connected to the same IRQ, we need to
discriminate in our code which of the two pins generated the interrupt. This work is done for us by
the HAL, which passes the GPIO_Pin parameter when the callback function is called.
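The call sequence described above can be modeled with a few lines of plain C that run on a PC. This is only a sketch of the mechanism, not the HAL source code: the variable sim_exti_pr stands in for the EXTI pending register, and all the sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

uint32_t sim_exti_pr;        /* stand-in for the EXTI pending register */
uint16_t sim_last_pin;       /* last pin mask delivered to the callback */
unsigned sim_callback_count; /* number of times the callback ran */

/* Stand-in for the user callback: it just records which pin fired. */
void sim_exti_callback(uint16_t pin) {
    sim_last_pin = pin;
    sim_callback_count++;
}

/* Mirrors the described flow: if the line is pending, clear the
   pending bit and then call the callback with the pin mask. */
void sim_exti_irq_handler(uint16_t pin) {
    if (sim_exti_pr & pin) {
        sim_exti_pr &= ~(uint32_t)pin;
        sim_exti_callback(pin);
    }
}

/* Stand-in for the ISR shared by lines 10..15: it offers both
   configured pins to the handler, which serves only the pending ones. */
void sim_exti15_10_isr(void) {
    sim_exti_irq_handler(1u << 12);
    sim_exti_irq_handler(1u << 13);
}

/* Marks one line as pending, as the hardware would do on an edge. */
void sim_raise(unsigned line) {
    assert(line < 16u);
    sim_exti_pr |= 1u << line;
}
```

Calling sim_raise(13) and then sim_exti15_10_isr() invokes the callback once, with the mask of pin 13, and leaves no pending bit set. The callback receives the pin mask, so the application code can tell the two lines apart even if they share the same ISR.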
Once we have enabled an IRQ, we need to instruct CubeMX to generate the corresponding ISR. This
configuration is done through the Configuration view, clicking on the NVIC button. A list of ISRs
that can be enabled appears, as shown in Figure 6.
Figure 6: the NVIC configuration view allows to enable the corresponding ISR
CubeMX will automatically add the enabled ISRs inside the src/stm32fxxx_it.c file, and it will take
care of enabling the IRQs. Moreover, it adds for us the corresponding HAL handler routine to call,
as shown below:
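/**
 * @brief This function handles EXTI line[15:10] interrupts.
 */
void EXTI15_10_IRQHandler(void) {
  /* USER CODE BEGIN EXTI15_10_IRQn 0 */
  /* USER CODE END EXTI15_10_IRQn 0 */
  HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_13);
  /* USER CODE BEGIN EXTI15_10_IRQn 1 */
  /* USER CODE END EXTI15_10_IRQn 1 */
}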
We only need to add the corresponding callback function (for example the HAL_GPIO_EXTI_Call-
back() routine) inside our application code.
We have already seen the first case in the previous paragraph. Now it is important to study what
happens when an interrupt occurs.
When an interrupt fires, it is marked as pending until the processor can serve it. If no other interrupts
are currently being processed, its pending state is automatically cleared by the processor, which
almost immediately starts serving it.
Figure 7: the relation between the pending bit and the interrupt active status
Figure 7 shows how this works. Interrupt A fires at the time t0 and, since the CPU isn’t servicing
another interrupt, its pending bit is cleared and its execution starts immediately⁷ (the interrupt
becomes active). At the time t1 the B interrupt fires, but here we suppose that it has a lower priority
than A. So it is left in pending state until the A ISR concludes its operations. When this happens,
the pending bit is automatically cleared and the B ISR becomes active.
⁶http://amzn.to/1P5sZwq
⁷Here, it is important to understand that with the word “immediately” we are not saying that the interrupt execution starts
without delay. If no other interrupts are running, Cortex-M3/4/7 cores serve an interrupt in 12 CPU cycles, while Cortex-M0 does
it in 15 cycles and Cortex-M0+ in 16 cycles.
Figure 8: the relation between the active status and interrupts priority
Figure 8 shows another important case. Here we have that the A interrupt fires, and the CPU can
immediately serve it. The interrupt B fires while A is serviced, so it remains in pending state until
A finishes. When this happens, the pending bit of B interrupt is cleared, and it becomes active.
However, after a while, A interrupt fires again, and since it has a higher priority, B interrupt is
suspended (becomes inactive) and the execution of A starts immediately. When this finishes, the B
interrupt becomes active again, and it completes its job.
Figure 9: an interrupt can be forced to fire again setting its pending bit
NVIC provides a high degree of flexibility for programmers. An interrupt can be forced to fire again
during its execution, simply setting its pending bit again, as shown in Figure 9⁸. In the same way,
the execution of an interrupt can be canceled clearing its pending bit while it is in pending state, as
shown in Figure 10.
⁸For the sake of completeness, it is important to specify that the Cortex-M architecture is designed so that if an interrupt fires
while the processor is already servicing another interrupt, this will be serviced without restoring the previous application doing
the unstacking (refer to note 4 in this chapter for the definition of stacking/unstacking). This technique is called tail chaining and
it allows speeding up interrupt management and reducing power consumption.
Figure 10: IRQ servicing can be canceled clearing its pending bit before it is executed
Here it is important to clarify an aspect related to how peripherals warn the NVIC
controller about the interrupt request. When an interrupt takes place, most STM32 peripherals
assert a specific signal connected to the NVIC, which is mapped in the peripheral memory through
a dedicated bit. This peripheral Interrupt Request bit will be held high until it is manually cleared
by the application code. For example, in Example 1 we had to expressly clear the EXTI line IRQ
pending bit using the macro __HAL_GPIO_EXTI_CLEAR_IT(). If we do not de-assert that bit, the
interrupt will keep firing until the bit is cleared.
Figure 11: the relation between the peripheral IRQ and the corresponding interrupt
Figure 11 clearly shows the relation between the peripheral IRQ pending state and the ISR
pending state. Signal I/O is the external peripheral driving the I/O (e.g. a tactile switch connected to
a pin). When the signal level changes, the EXTI line connected to that I/O generates an IRQ and the
corresponding pending bit is asserted. As a consequence, the NVIC generates the interrupt. When the
processor starts servicing the ISR, the ISR pending bit is cleared automatically, but the peripheral
IRQ pending bit will be held high until it is cleared by the application code.
Figure 12: when an interrupt is forced setting its pending bit, the corresponding peripheral IRQ remains unset
Figure 12 shows another case. Here we force the execution of the ISR by setting its pending bit.
Since this time the external peripheral is not involved, there is no need to clear the corresponding
IRQ pending bit.
Since the presence of the IRQ pending bit is peripheral dependent, it is always advisable to
use the ST HAL functions to manage interrupts, leaving all the underlying details to the HAL
implementation (unless we want to have full control, but this is not the case in this book). However,
keep in mind that, to avoid losing important interrupts, it is good design practice to clear the
peripheral IRQ pending status bit as soon as its ISR starts to be serviced. The processor core does not
keep track of multiple interrupts (it does not queue interrupts), so if we clear the peripheral pending
bit at the end of an ISR, we may lose important IRQs that fire in the middle.
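The effect of clearing the pending bit early or late can be sketched with a small host-side model in plain C, with no HAL involved. The periph_pending flag, the function names and the timing of the second event are hypothetical and chosen only for illustration; on the target the flag would be cleared with a HAL macro such as __HAL_GPIO_EXTI_CLEAR_IT().

```c
#include <stdbool.h>

/* Hypothetical model: a single pending bit and no queue, as in the text. */
static bool periph_pending;

/* One ISR invocation. When second_event_during_isr is true, a new IRQ
   arrives while the ISR body is still running. */
static void run_isr(bool clear_at_start, bool second_event_during_isr)
{
    if (clear_at_start)
        periph_pending = false;      /* acknowledge before doing the work */

    if (second_event_during_isr)
        periph_pending = true;       /* new IRQ fires in the middle */

    if (!clear_at_start)
        periph_pending = false;      /* late acknowledge also wipes the new IRQ */
}

/* Returns how many times the ISR runs when two events occur and the
   second one arrives while the first is still being serviced. */
unsigned simulate_two_events(bool clear_at_start)
{
    unsigned runs = 0;
    bool second = true;

    periph_pending = true;           /* first event */
    while (periph_pending) {
        run_isr(clear_at_start, second);
        second = false;              /* the second event happens only once */
        runs++;
    }
    return runs;
}
```

In this model, clearing at the start lets the ISR run once per event, while clearing at the end services only the first event.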
To see if an interrupt is pending (that is, fired but not running), we can use the HAL function
HAL_NVIC_GetPendingIRQ(), which takes the IRQ number as its parameter.
To programmatically set the pending bit of an IRQ, we can instead use the HAL_NVIC_SetPendingIRQ()
function. This will cause the interrupt to fire, as if it had been generated by the hardware. A
distinctive feature of Cortex-M processors is that it is possible to programmatically fire an interrupt
inside the ISR of another interrupt.
Instead, to programmatically clear the pending bit of an IRQ, we can use the function
HAL_NVIC_ClearPendingIRQ().
Once again, it is also possible to cancel the execution of a pending interrupt inside the ISR servicing
another IRQ.
To check if an ISR is active (IRQ being serviced), we can use the function HAL_NVIC_GetActive().
The NVIC priority mechanism is substantially different between Cortex-M0/0+ and Cortex-M3/4/7
cores. For this reason we are going to explain them in two separate subparagraphs.
7.4.1 Cortex-M0/0+
Cortex-M0/0+ based microcontrollers have a simpler interrupt priority mechanism. This means that
STM32F0 and STM32L0 MCUs behave differently from the rest of the STM32 microcontrollers, and you
have to pay special attention if you are porting your code between STM32 series.
In Cortex-M0/0+ cores the priority of each interrupt is defined through an 8-bit register, called IPR.
In the ARMv6-M core architecture only 4 bits of this register are used, allowing up to 16 different
priority levels. However, in practice, STM32 MCUs implementing these cores use only the two upper
bits of this register, with all other bits read as zero.
Figure 13: the content of IPR register on an STM32 MCU based on Cortex-M0
Figure 13 shows how the content of IPR is interpreted. This means that we have at most four
priority levels: 0x00, 0x40, 0x80, 0xC0. The lower this number is, the higher the priority is. That is,
an IRQ having a priority equal to 0x40 has a higher priority than an IRQ with a priority level equal
to 0xC0. If two interrupts fire at the same time, the one with the higher priority will be served
first. If the processor is already servicing an interrupt and a higher priority interrupt fires, then the
current interrupt is suspended and control passes to the higher priority interrupt. When this
is completed, execution goes back to the previous interrupt, if no other interrupt with higher
priority occurs in the meantime. This mechanism is called interrupt preemption.
Figure 14 shows an example of interrupt preemption. A is an IRQ with lower priority that fires at
time t0. Its ISR starts executing, but the IRQ B, which has a higher priority (lower priority number),
fires at time t1 and the execution of the A ISR is suspended. When B finishes its job, the execution of
the A ISR is resumed until it finishes. This “nested” mechanism induced by interrupt priorities leads to
the name of the NVIC controller, which is Nested Vectored Interrupt Controller.
Cortex-M0/0+ has an important difference compared to Cortex-M3/4/7 cores. The interrupt priority is
static. This means that once an interrupt is enabled its priority can no longer be changed, until we
disable the IRQ again.
The CubeHAL provides the HAL_NVIC_SetPriority() function to assign a priority to an IRQ.
The function accepts the IRQ we are going to configure and the PreemptPriority, which is the
preemption priority we are going to assign to the IRQ. The CMSIS API, and hence the CubeHAL
library, is designed so that PreemptPriority is specified with a priority level number ranging from
0 to 3. The value is internally shifted to the most significant bits automatically. This simplifies the
porting of code to other MCUs with a different number of priority bits (this is the reason why only
the left part of the IPR register is used by silicon vendors).
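The shift described above can be sketched in a few lines of host-side C. This is only an illustration of the arithmetic, not the CMSIS or HAL implementation; M0_PRIO_BITS, m0_encode_priority() and m0_preempts() are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define M0_PRIO_BITS 2u   /* assumption: two implemented priority bits */

/* Shift a priority level (0..3) into the upper bits of the 8-bit field. */
uint8_t m0_encode_priority(uint32_t level)
{
    return (uint8_t)((level << (8u - M0_PRIO_BITS)) & 0xFFu);
}

/* A newly fired IRQ preempts the running ISR only if its encoded value is
   strictly lower (lower number means higher priority). */
bool m0_preempts(uint8_t new_irq, uint8_t running_irq)
{
    return new_irq < running_irq;
}
```

With these definitions, levels 0 to 3 map to 0x00, 0x40, 0x80 and 0xC0, and an IRQ encoded as 0x40 preempts one encoded as 0xC0 but not one with the same value.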
The HAL_NVIC_SetPriority() function also accepts the additional parameter SubPriority,
which is simply ignored in the CubeF0 and CubeL0 HALs, since the underlying Cortex-M
processor does not support interrupt sub-priority. Here ST engineers have decided to use
the same API available in the other HALs for Cortex-M3/4/7 based processors, probably
to simplify porting code between the different STM32 MCUs.
Curiously, they have decided to define the corresponding function to retrieve the priority
of an IRQ, HAL_NVIC_GetPriority(), with a signature that is completely different from the
one defined in the HALs for Cortex-M3/4/7 based processors⁹.
The following example¹⁰ shows how the interrupt priority mechanism works.
Filename: src/main-ex3.c
39 volatile uint8_t blink = 0; /* shared with the ISRs, hence volatile */
40
41 int main(void) {
42   GPIO_InitTypeDef GPIO_InitStruct;
43
44   HAL_Init();
45
46   /* GPIO Ports Clock Enable */
47   __HAL_RCC_GPIOC_CLK_ENABLE();
48   __HAL_RCC_GPIOB_CLK_ENABLE();
49   __HAL_RCC_GPIOA_CLK_ENABLE();
50
51   /* Configure GPIO pin : PC13 - USER BUTTON */
52   GPIO_InitStruct.Pin = GPIO_PIN_13;
53   GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
54   GPIO_InitStruct.Pull = GPIO_PULLDOWN;
55   HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
56
57   /* Configure GPIO pin : PB2 */
58   GPIO_InitStruct.Pin = GPIO_PIN_2;
59   GPIO_InitStruct.Mode = GPIO_MODE_IT_FALLING;
60   GPIO_InitStruct.Pull = GPIO_PULLUP;
61   HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);
62
⁹I have opened a dedicated thread on the official ST Forum, but there is still no answer from ST at the time of writing this
chapter.
¹⁰The example is designed to work with a Nucleo-F030R8 board. Please, refer to other book examples if you have a different
Nucleo board.
The code should be easy to understand if the previous explanation is clear to you. Here
we have two IRQs associated with EXTI lines 2 and 13. The corresponding ISRs call the HAL
HAL_GPIO_EXTI_IRQHandler() routine, which in turn calls the HAL_GPIO_EXTI_Callback() callback
passing the GPIO involved in the interrupt. When the user button connected to the PC13 pin is
pushed, the ISR enters a loop that runs as long as the blink global variable is >0. This loop makes
the LD2 LED blink quickly. When the PB2 pin is pulled low (use the pinout diagram for your Nucleo
from Appendix C to identify the PB2 pin position), the EXTI2_3_IRQHandler()¹¹ fires and this causes
the HAL_GPIO_EXTI_IRQHandler() to set the blink variable to 0. The EXTI4_15_IRQHandler() can now
end. The priority of each interrupt is configured at lines 70 and 73: as you can see, since the interrupt
priority is static in Cortex-M0/0+ based MCUs, we have to set it before we enable the corresponding
interrupt.
Please note that this is a really bad way to deal with interrupts. Locking the MCU
inside an interrupt is a poor programming style, and it is the root of all evil in embedded
programming. Unfortunately, this is the only example that came to the author’s mind,
considering that at this point the book has covered only a few topics. Every ISR must be
designed to last as little as possible, otherwise other fundamental ISRs could be masked for
a long time, losing important information coming from other peripherals.
As an exercise, try to play with interrupt priorities, and see what happens if both interrupts
have the same priority.
You may notice that the interrupt often fires when you simply touch the wire, even if it isn’t
tied to ground. Why does this happen? There are essentially two reasons that cause
the interrupt to “accidentally” trigger. First of all, modern microcontrollers try to minimize
the power leakage connected with the usage of internal pull-up/down resistors. So, the
value of these resistors is chosen to be really high (something around 50kΩ). If you play with
the voltage divider equation, you can figure out that it is really easy to pull an I/O low or high
when a pull-up/down resistor has a high resistance value. Secondly, here we are not doing
adequate debouncing of the input pin. Debouncing is the process of minimizing the effect
of bounces produced by “unstable” sources (e.g. a mechanical switch). Usually debouncing
is performed in hardware¹² or in software, by counting how much time has elapsed from
the first variation of the input state: in our case, if the input remains low for more than
a given period (usually something between 100ms and 200ms is sufficient), then we can
say that the input has been effectively tied to ground. As we will see in Chapter
11, we can also use one channel of a timer configured to work in input capture mode to
detect when a GPIO changes state. This gives us the ability to automatically count how
much time has elapsed from the first event. Moreover, timer channels support integrated
and programmable hardware filters, which allow us to reduce the number of external
components needed to debounce the I/Os.
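The software approach described above can be sketched as a small state machine. This is a minimal host-side illustration, not code from the book examples: debounce_t, debounce_update() and the DEBOUNCE_MS value are hypothetical, and on the target the timestamp would typically come from a millisecond tick such as HAL_GetTick().

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_MS 100u   /* assumption: lower end of the 100-200 ms range */

typedef struct {
    bool     was_low;     /* the previous sample was low */
    uint32_t low_since;   /* timestamp (ms) of the first low sample */
    bool     pressed;     /* debounced state */
} debounce_t;

/* Feed one sample of the input. Returns true once the input has been
   continuously low for at least DEBOUNCE_MS milliseconds. */
bool debounce_update(debounce_t *d, bool level_is_low, uint32_t now_ms)
{
    if (!level_is_low) {
        /* Any high sample (a bounce or a release) restarts the count. */
        d->was_low = false;
        d->pressed = false;
    } else if (!d->was_low) {
        /* First low sample: remember when the input went low. */
        d->was_low = true;
        d->low_since = now_ms;
    } else if ((uint32_t)(now_ms - d->low_since) >= DEBOUNCE_MS) {
        /* Unsigned subtraction stays correct across tick wrap-around. */
        d->pressed = true;
    }
    return d->pressed;
}
```

The function would be called periodically (for example from the main loop) with the current pin level, so the EXTI ISR itself can stay short.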
¹¹Please, take note that for STM32F302 MCUs the default name of the IRQ associated to EXTI line 2 is EXTI2_TSC_IRQHandler.
Refer to book examples if you are working with this MCU.
¹²Usually, a capacitor and a resistor in parallel with the switch contacts are sufficient in most cases. For example, you can take
a look at the schematics of the Nucleo board to see how ST engineers have debounced the USER button connected to the PC13 GPIO.
7.4.2 Cortex-M3/4/7
The interrupt priority mechanism in Cortex-M3/4/7 is more advanced than the one available in
Cortex-M0/0+ based microcontrollers. Developers have a higher degree of flexibility, and this is often
a source of headaches for novices. Moreover, the way interrupt priority is presented both in the ARM
and ST documentation is a little bit counterintuitive.
In Cortex-M3/4/7 cores the priority of each interrupt is defined through the IPR register. This is an
8-bit register in the ARMv7-M core architecture that allows up to 256 different priority levels.
However, in practice, STM32 MCUs implementing these cores use only the four upper bits of this
register, with all other bits read as zero.
Figure 15: the content of IPR register on an STM32 MCU based on Cortex-M3/4/7 core
Figure 15 shows how the content of IPR is interpreted. This means that we have at most
sixteen priority levels: 0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, 0x90, 0xA0,
0xB0, 0xC0, 0xD0, 0xE0, 0xF0. The lower this number is, the higher the priority is. That is, an IRQ
having a priority equal to 0x10 has a higher priority than an IRQ with a priority level equal to
0xA0. If two interrupts fire at the same time, the one with the higher priority will be served first. If
the processor is already servicing an interrupt and a higher priority interrupt fires, then the current
interrupt is suspended and control passes to the higher priority interrupt. When this is completed,
execution goes back to the previous interrupt, if no other interrupt with higher priority occurs
in the meantime.
So far, the mechanism is substantially the same as in Cortex-M0/0+. The complication arises from
the fact that the IPR register can be logically subdivided in two parts: a series of bits defining the
preemption priority¹³ and a series of bits defining the sub-priority. The first priority level rules the
preemption priorities between ISRs. If an ISR has a priority higher than another one, it will preempt
the execution of the lower priority ISR in case it fires. The sub-priority determines which ISR will be
executed first when multiple ISRs are pending, but it has no effect on ISR preemption.
¹³What complicates the understanding of interrupt priorities is the fact that in the official documentation sometimes the
preemption priority is also called group priority. This leads to a lot of confusion, since novices tend to imagine that these bits
define a sort of Access Control List (ACL) privileges. Here, to simplify the understanding of this matter, we will only speak about
preemption priority level.
Figure 16 shows an example of interrupt preemption. A is an IRQ with the lowest priority that fires
at time t0 . The ISR starts the execution but the IRQ B, which has a higher priority (lower priority
level), fires at time t1 and the execution of A ISR is stopped. After a while, C IRQ fires at time t2
and the B ISR is stopped and the C ISR starts execution. When this finishes, the execution of B ISR
is resumed until it finishes. When this happens, the execution of A ISR is resumed. This “nested”
mechanism induced by interrupt priorities leads to the name of the NVIC controller, which is Nested
Vectored Interrupt Controller.
Figure 17: if two interrupts with the same priority are pending, the one with the higher sub-priority is executed
first
Figure 17 shows how the sub-priority affects the execution of multiple pending ISRs. Here we have
three interrupts, all with the same maximum priority. At time t0 the IRQ A fires and it is serviced
immediately. At time t1 the B IRQ fires, but since it has the same priority level as the other IRQs, it is
left in the pending state. At time t2 the C IRQ also fires, but for the same reason as before it is left
in the pending state by the processor. When the A ISR finishes, the C IRQ is served first, since it has a
higher sub-priority than B. Only when the C ISR finishes can the B IRQ be served.
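The selection rule illustrated by Figure 17 can be sketched as a function that picks the next pending IRQ. This is a host-side model, not NVIC code: irq_t and pick_next() are hypothetical names, and the sub-priority values used in the usage note (0 for C, 1 for B) are assumptions, since the text does not give them.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t     preempt;   /* preemption priority, lower = more urgent */
    uint8_t     sub;       /* sub-priority, lower = more urgent */
    int         pending;   /* non-zero if the IRQ is waiting to be served */
} irq_t;

/* Returns the pending IRQ to serve next, or NULL if none is pending.
   Entries are assumed to be ordered by IRQ number, so that on a complete
   tie the lowest-numbered one is chosen. */
const irq_t *pick_next(const irq_t *irqs, size_t n)
{
    const irq_t *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!irqs[i].pending)
            continue;
        if (best == NULL ||
            irqs[i].preempt < best->preempt ||
            (irqs[i].preempt == best->preempt && irqs[i].sub < best->sub))
            best = &irqs[i];
    }
    return best;
}
```

With A no longer pending and B and C both pending at the same preemption priority, the function returns C first and B only after C has been served.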
The way IPR bits are logically subdivided is defined by the SCB->AIRCR register (a sub-group
of bits of the System Control Block (SCB) register), and it is important to stress right from the start
that this way of interpreting the content of the IPR register is global to all ISRs. Once we have defined
a priority scheme (also called priority grouping in the HAL), it is common to all interrupts used
in the system.
Figure 18: the subdivision of IPR bits between preemption priority and sub-priority
Figure 18 shows all five possible subdivisions of IPR register, while Table 2 shows the maximum
number of preemption priority levels and sub-priority levels that each subdivision scheme allows.
Table 2: the number of preemption priority level available based on the current priority grouping schema
NVIC Priority Group      Number of preemption priority levels    Number of sub-priority levels
NVIC_PRIORITYGROUP_0     0                                       16
NVIC_PRIORITYGROUP_1     2                                       8
NVIC_PRIORITYGROUP_2     4                                       4
NVIC_PRIORITYGROUP_3     8                                       2
NVIC_PRIORITYGROUP_4     16                                      0
The HAL library is designed so that the PreemptPriority and SubPriority can be configured with a
priority level number ranging from 0 to 15, depending on the priority grouping in use. The value is
internally shifted to the most significant bits automatically. This simplifies the porting of code to
other MCUs with a different number of priority bits (this is the reason why only the left part of the
IPR register is used by silicon vendors).
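How a preemption priority and a sub-priority end up in the upper bits of IPR can be sketched as follows. This is a simplified host-side model of the arithmetic, not the CMSIS implementation; encode_ipr() and PRIO_BITS are hypothetical names, and the first parameter is simply the number of bits assigned to the preemption priority (the digit in NVIC_PRIORITYGROUP_x), assumed to be between 0 and 4.

```c
#include <stdint.h>

#define PRIO_BITS 4u   /* implemented priority bits, as described in the text */

/* preempt_bits: bits assigned to the preemption priority (0..4).
   Returns the resulting 8-bit IPR value. */
uint8_t encode_ipr(uint32_t preempt_bits, uint32_t preempt, uint32_t sub)
{
    uint32_t sub_bits     = PRIO_BITS - preempt_bits;
    uint32_t preempt_mask = (1u << preempt_bits) - 1u;
    uint32_t sub_mask     = (1u << sub_bits) - 1u;

    /* Preemption priority goes in the upper part, sub-priority below it. */
    uint32_t level = ((preempt & preempt_mask) << sub_bits) | (sub & sub_mask);

    /* Shift the 4-bit level to the most significant bits of the register. */
    return (uint8_t)(level << (8u - PRIO_BITS));
}
```

For instance, with four preemption bits a preemption priority of 1 gives 0x10, while with two preemption bits a preemption priority of 1 and a sub-priority of 3 give 0x70.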
Instead, to define the priority grouping, that is how to subdivide the IPR register between the
preemption priority and sub-priority, the HAL_NVIC_SetPriorityGrouping() function can be used,
where the PriorityGroup parameter is one of the macros from the column NVIC Priority Group
in Table 2.
The following example¹⁴ shows how the interrupt priority mechanism works.
Filename: src/main-ex3.c
volatile uint8_t blink = 0; /* shared with the ISRs, hence volatile */

int main(void) {
  GPIO_InitTypeDef GPIO_InitStruct;

  HAL_Init();

  /* GPIO Ports Clock Enable */
  __HAL_RCC_GPIOC_CLK_ENABLE();
  __HAL_RCC_GPIOB_CLK_ENABLE();
  __HAL_RCC_GPIOA_CLK_ENABLE();

  /* Configure GPIO pin : PC13 (interrupt on rising edge) */
  GPIO_InitStruct.Pin = GPIO_PIN_13;
  GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
  GPIO_InitStruct.Pull = GPIO_PULLDOWN;
  HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);

  /* Configure GPIO pin : PB2 (interrupt on falling edge) */
  GPIO_InitStruct.Pin = GPIO_PIN_2;
  GPIO_InitStruct.Mode = GPIO_MODE_IT_FALLING;
  GPIO_InitStruct.Pull = GPIO_PULLUP;
  HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);

  /* Configure GPIO pin : PA5 (push-pull output) */
  GPIO_InitStruct.Pin = GPIO_PIN_5;
  GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
  GPIO_InitStruct.Pull = GPIO_NOPULL;
  GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
  HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);

  HAL_NVIC_SetPriority(EXTI15_10_IRQn, 0x1, 0);
  HAL_NVIC_EnableIRQ(EXTI15_10_IRQn);

¹⁴The example is designed to work with a Nucleo-F401RE board. Please, refer to other book examples if you have a different
Nucleo board.
The code should be easy to understand if the previous explanation is clear to you. Here
we have two IRQs associated with EXTI lines 2 and 13. The corresponding ISRs call the HAL
HAL_GPIO_EXTI_IRQHandler() routine, which in turn calls the HAL_GPIO_EXTI_Callback() callback
passing the GPIO involved in the interrupt. When the user button connected to the PC13 pin is
pushed, the ISR enters a loop that runs as long as the blink global variable is >0. This loop makes
the LD2 LED blink quickly. When the PB2 pin is pulled low (use the pinout diagram for your Nucleo
from Appendix C to identify its position), the EXTI2_IRQHandler() fires and this causes the
HAL_GPIO_EXTI_IRQHandler() to set the blink variable to 0. The EXTI15_10_IRQHandler() can now end.
Please note that this is a really bad way to deal with interrupts. Locking the MCU
inside an interrupt is a poor programming style, and it is the root of all evil in embedded
programming. Unfortunately, this is the only example that came to the author’s mind,
considering that at this point the book has covered only a few topics. As we will see soon,
every ISR must be designed to last as little as possible, otherwise other fundamental ISRs
could be masked for a long time, losing important information coming from other peripherals.
As an exercise, try to play with interrupt priorities, and see what happens if both interrupts
have the same priority.
You may notice that the interrupt often fires when you simply touch the wire, even if it isn’t
tied to ground. Why does this happen? There are essentially two reasons that cause
the interrupt to “accidentally” trigger. First of all, modern microcontrollers try to minimize
the power leakage connected with the usage of internal pull-up/down resistors. So, the
value of these resistors is chosen to be really high (something around 50kΩ). If you play with
the voltage divider equation, you can figure out that it is really easy to pull an I/O low or high
when a pull-up/down resistor has a high resistance value. Secondly, here we are not doing
adequate debouncing of the input pin. Debouncing is the process of minimizing the effect
of bounces produced by “unstable” sources (e.g. a mechanical switch). Usually debouncing
is performed in hardware¹⁵ or in software, by counting how much time has elapsed from
the first variation of the input state: in our case, if the input remains low for more than
a given period (usually something between 100ms and 200ms is sufficient), then we can
say that the input has been effectively tied to ground. As we will see in Chapter
11, we can also use one channel of a timer configured to work in input capture mode to
detect when a GPIO changes state. This gives us the ability to automatically count how
much time has elapsed from the first event. Moreover, timer channels support integrated
and programmable hardware filters, which allow us to reduce the number of external
components needed to debounce the I/Os.
It is important to remark on some fundamental things. First of all, unlike Cortex-M0/0+ based
microcontrollers, Cortex-M3/4/7 cores allow the priority of an interrupt to be changed dynamically,
even if the interrupt is already enabled. Secondly, care must be taken when the priority grouping is
lowered dynamically. Let us consider the following example. Suppose that we have three ISRs
with three decreasing priorities (the priority is specified inside the parentheses): A(0x0), B(0x10),
C(0x20). Suppose that we have defined these priorities when the priority grouping was equal to
NVIC_PRIORITYGROUP_4. If we lower it to the NVIC_PRIORITYGROUP_1 level, the current preemption
levels will be interpreted as sub-priorities. As a result, the interrupt service routines A, B and
C will have the same preemption level (that is, 0x0), and they will no longer be able to preempt each
other. For example, looking at Figure 20 we can see what happens to the priority of the ISR C when the
priority grouping is lowered from 4 to 1. When the priority grouping is set to 4, the priority of the C ISR
is just two levels under the maximum priority level, which is 0 (the next highest level is 0x10, which
is B’s priority). This means that C can be preempted both by A and B. However, if we lower
the priority grouping to 1, then the priority of C becomes 0x0 (only bit 7 acts as priority) and the
remaining bits are interpreted by the NVIC controller as sub-priority. This can lead to the following
scenario:
2. if C interrupt is triggered, and the CPU isn’t servicing another interrupt, C is serviced
immediately;
3. if the CPU is servicing the C ISR and then after a short while A and B are triggered, the CPU will
service A and then B after it finishes servicing C;
4. if the CPU is servicing another ISR, and C triggers and then after a short while A and B are
triggered, A will be serviced first, followed by B and then C.
Figure 20: what happens to the C ISR priority when the priority grouping is lowered from 4 to 1
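The re-interpretation of the same IPR values under a different grouping can be checked with a small decoding function. This is a host-side sketch under the same assumptions made in the text (four implemented priority bits); prio_t, decode_ipr() and PRIO_BITS are hypothetical names, and preempt_bits is the number of bits assigned to the preemption priority (4 for NVIC_PRIORITYGROUP_4, 1 for NVIC_PRIORITYGROUP_1).

```c
#include <stdint.h>

#define PRIO_BITS 4u   /* implemented priority bits, as described in the text */

typedef struct {
    uint32_t preempt;  /* preemption priority seen by the NVIC */
    uint32_t sub;      /* sub-priority seen by the NVIC */
} prio_t;

/* Split an 8-bit IPR value according to the number of preemption bits. */
prio_t decode_ipr(uint8_t ipr, uint32_t preempt_bits)
{
    uint32_t level    = (uint32_t)ipr >> (8u - PRIO_BITS);
    uint32_t sub_bits = PRIO_BITS - preempt_bits;
    prio_t   p;

    p.preempt = level >> sub_bits;
    p.sub     = level & ((1u << sub_bits) - 1u);
    return p;
}
```

Decoding C(0x20) with four preemption bits gives preemption priority 2 and sub-priority 0; with one preemption bit it gives preemption priority 0 and sub-priority 2. A(0x0) and B(0x10) decode to preemption priority 0 as well, with sub-priorities 0 and 1, so none of the three can preempt the others.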
Before the interrupt priority mechanism becomes clear, you will have to do several
experiments by yourself. So, try to modify Example 3 so that changing the priority
grouping causes the preemption priority to be the same for both IRQs.
To obtain the priority of an interrupt, the HAL defines the HAL_NVIC_GetPriority() function.
I have to admit that the signature of this function is a little bit puzzling, since it differs from that of
HAL_NVIC_SetPriority(): here we also have to specify the PriorityGroup, while the
HAL_NVIC_SetPriority() function computes it internally. I do not know why ST has decided to use
this signature, and I cannot see the reason to make it different from the HAL_NVIC_SetPriority() one.
The current priority grouping can be obtained using the following function:
uint32_t HAL_NVIC_GetPriorityGrouping(void);
Figure 20: the NVIC configuration view allows us to set the ISR priority
Using the Priority Group combo box we can set the priority grouping schema, and then assign
the individual priority and sub-priority to each interrupt. CubeMX will automatically generate the
corresponding C code to set up the IRQ priority inside the MX_GPIO_Init() function. Instead, the
global priority grouping schema is configured inside the HAL_MspInit() function.
¹⁶Joseph Yiu shows a way to bypass this limitation in his books. However, I strongly discourage using these tricky techniques
unless you really need interrupt re-entrancy in your application.
¹⁷The example is designed to work with a Nucleo-F401RE board. Please, refer to other book examples if you have a different
Nucleo board.
Filename: src/main-ex4.c
  /* Configure GPIO pin : PC12 & PC13 (interrupt on rising edge) */
  GPIO_InitStruct.Pin = GPIO_PIN_12 | GPIO_PIN_13;
  GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
  GPIO_InitStruct.Pull = GPIO_PULLDOWN;
  HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);

  /* Configure GPIO pin : PA5 (push-pull output) */
  GPIO_InitStruct.Pin = GPIO_PIN_5;
  GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
  GPIO_InitStruct.Pull = GPIO_NOPULL;
  GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
  HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);

  HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_1);
  HAL_NVIC_EnableIRQ(EXTI15_10_IRQn);
  HAL_NVIC_SetPriority(EXTI15_10_IRQn, 0x0, 0);

  while(1) {
    if(blink) {
      HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
      /* Crude busy-wait delay; volatile keeps the loop from being optimized away */
      for(volatile int i = 0; i < 100000; i++);
    }
  }
}

void EXTI15_10_IRQHandler(void) {
  /* EXTI lines 12 and 13 share this vector: let the HAL check both */
  HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_12);
  HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_13);
}

void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin) {
  if(GPIO_Pin == GPIO_PIN_13)
    blink = 1;
  else
    blink = 0;
}
disabling one by one. Two special registers, named PRIMASK and FAULTMASK, allow us to disable all
interrupts and exceptions respectively.
Even if these registers are 32 bits wide, just the first bit is used to enable/disable interrupts and
exceptions. The ARM assembly instruction CPSID i disables all interrupts by setting the PRIMASK bit
to 1, while the CPSIE i instruction enables them by setting PRIMASK to zero. Instead, the instruction
CPSID f disables all exceptions (except for the NMI one) by setting the FAULTMASK bit to 1, while the
CPSIE f instruction enables them.
The CMSIS-Core package provides several macros that we can use to perform these operations:
__disable_irq() and __enable_irq() set and clear the PRIMASK respectively. Any critical task can
be placed between these two macros, as shown below:
...
__disable_irq();
/* All exceptions with configurable priority are temporarily disabled.
You can place critical code here */
...
__enable_irq();
However, keep in mind that, as a general rule, interrupts must be masked only for a really short time,
otherwise you could lose important interrupts. Remember that interrupts are not queued.
Another macro we can use is __set_PRIMASK(x), where x is the content of the PRIMASK
register (0 or 1). The macro __get_PRIMASK() returns the content of the PRIMASK register. Instead, the
macros __set_FAULTMASK(x) and __get_FAULTMASK() allow us to manipulate the FAULTMASK register.
It is important to remark that, once the PRIMASK register is set to zero again, all pending interrupts
are serviced according to their priority: while PRIMASK is set, the interrupt pending bit is still set
but the ISR is not serviced. This is the reason why we say that interrupts are masked and not disabled.
Interrupts start to be serviced as soon as the PRIMASK is cleared.
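One common use of the get/set pair is to save the PRIMASK value on entry to a critical section and restore it on exit, so that nested critical sections do not re-enable interrupts too early. The following is a minimal host-side sketch of that pattern: fake_primask, get_primask() and set_primask() are stand-ins for the real register and for the __get_PRIMASK()/__set_PRIMASK() macros, and critical_enter()/critical_exit() are hypothetical helper names.

```c
#include <stdint.h>

/* Host-side stand-ins for the PRIMASK register and its CMSIS accessors. */
static uint32_t fake_primask;
static uint32_t get_primask(void) { return fake_primask; }
static void set_primask(uint32_t value) { fake_primask = value & 1u; }

/* Mask interrupts and return the previous PRIMASK value. */
uint32_t critical_enter(void)
{
    uint32_t previous = get_primask();
    set_primask(1u);
    return previous;
}

/* Restore the PRIMASK value saved by the matching critical_enter(). */
void critical_exit(uint32_t previous)
{
    set_primask(previous);
}

/* Helper to observe the model: returns 1 while interrupts are masked. */
uint32_t interrupts_masked(void)
{
    return get_primask();
}
```

With this pattern, leaving an inner critical section keeps interrupts masked, and they are unmasked only when the outermost section is left.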
Cortex-M3/4/7 cores allow us to selectively mask interrupts on a priority basis. The BASEPRI register
masks exceptions or interrupts based on their priority level. The width of the BASEPRI register is the
same as that of the IPR one, of which only the upper 4 bits are used in STM32 MCUs based on these
cores. When BASEPRI is set to 0, it is disabled. When it is set to a non-zero value, it blocks exceptions
(including interrupts) that have the same or lower priority level, while still allowing exceptions with a
higher priority level to be accepted by the processor. For example, if the BASEPRI register is set to
0x60, then all interrupts with a priority between 0x60 and 0xFF are masked. Remember that in Cortex-M
cores the higher the priority number, the lower the interrupt priority level. The __set_BASEPRI(x)
macro allows us to set the content of the BASEPRI register: remember, again, that the HAL
automatically shifts the priority levels to the most significant bits, while this macro does not. So, if
we want to mask all interrupts with a priority number equal to or higher than 2, then we have to pass
to the __set_BASEPRI() macro the value 0x20. Alternatively, we can use the following
code:
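A minimal sketch of this computation and of the masking rule, written as a host-side model rather than target code. PRIO_BITS stands in for the CMSIS device macro __NVIC_PRIO_BITS and is assumed to be 4; basepri_from_level() and is_masked_by_basepri() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u   /* stands in for __NVIC_PRIO_BITS, assumed to be 4 */

/* Shift a priority level number into the upper bits, as BASEPRI expects. */
uint8_t basepri_from_level(uint32_t level)
{
    return (uint8_t)((level << (8u - PRIO_BITS)) & 0xFFu);
}

/* Returns true if an IRQ with the given IPR value is blocked by BASEPRI. */
bool is_masked_by_basepri(uint8_t basepri, uint8_t irq_priority)
{
    if (basepri == 0u)
        return false;                /* BASEPRI equal to 0 disables masking */
    return irq_priority >= basepri;  /* same or lower priority is blocked */
}
```

On the target, the value returned by the first function is what would be passed to __set_BASEPRI(); for level 2 it is 0x20.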
Universal asynchronous serial communications 222
Figure 1: a serial communication between two devices using a shared clock source
In Figure 1 we have a typical timing diagram¹ showing Device A sending one byte (0b01101001)
serially to Device B using a common reference clock. The common clock is also used to agree on
when to start sampling the sequence of bits: when the master device starts clocking the dedicated
line, it means that it is going to send a sequence of bits.
In a synchronous transmission the transmission speed and duration are defined by the clock: its
frequency determines how fast we can transmit a single byte on the communication channel². But
if both devices involved in data transmission agree on how long it takes to transmit a single bit and
when to start and finish sampling the transmitted bits, then we can avoid using a dedicated clock line.
In this case we have an asynchronous transmission.
Figure 2: the timing diagram of a serial communication without a dedicated clock line
Figure 2 shows the timing diagram of an asynchronous transmission. The idle state (that is, no
transmission occurring) is represented by the high signal. Transmission begins with a START bit,
which is represented by the low level. The negative edge is detected by the receiver and 1.5 bit periods
after this (indicated in Figure 2 as T1.5bit), the sampling of bits begins. Eight data bits are sampled. The
least significant bit (LSB) is typically transmitted first. An optional parity bit is then transmitted (for
error checking of the data bits). Often this bit is omitted if the transmission channel is assumed to be
noise free or if there is error checking higher up in the protocol layers. The transmission is ended
by a STOP bit, which lasts 1.5 bits.
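The frame layout just described can be sketched as a function that lists the line levels of one frame in transmission order. This is a host-side illustration, not UART driver code: uart_build_frame() is a hypothetical name, even parity is an assumption (the text does not say which parity is used), and the STOP condition is represented by a single high entry without modeling its duration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fills out[] with line levels (0 = low, 1 = high) in transmission order
   and returns the number of entries written. out must hold 11 entries. */
size_t uart_build_frame(uint8_t data, bool with_parity, uint8_t out[11])
{
    size_t  n = 0;
    uint8_t ones = 0;

    out[n++] = 0;                          /* START bit: line goes low */
    for (int i = 0; i < 8; i++) {          /* eight data bits, LSB first */
        uint8_t bit = (uint8_t)((data >> i) & 1u);
        ones = (uint8_t)(ones + bit);
        out[n++] = bit;
    }
    if (with_parity)
        out[n++] = (uint8_t)(ones & 1u);   /* even parity (assumption) */
    out[n++] = 1;                          /* STOP: line returns to idle (high) */
    return n;
}
```

For the byte 0b01101001 with parity enabled, the data bits go out as 1, 0, 0, 1, 0, 1, 1, 0, followed by a parity bit of 0, since the byte already contains an even number of ones.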
¹A Timing Diagram is a representation of a set of signals in the time domain.
²However, keep in mind that the maximum transmission speed is determined by a lot of other things, like the characteristics of
the electrical channel, the ability of each device involved in transmission to sample fast signals, and so on.
Figure 4: a typical circuit based on FT232RL used to convert a 3.3V TTL UART interface to USB
STM32 microcontrollers provide a variable number of USARTs, which can be configured to work
both in synchronous and asynchronous mode. Some STM32 MCUs also provide interfaces only able
to act as UART. Table 1 lists the UART/USARTs provided by the STM32 MCUs equipping all Nucleo
boards. Most USARTs are also able to automatically implement Hardware Flow Control, both
for the RS232 and the RS485 standards.
All Nucleo boards are designed so that the USART2 of the target MCU is linked to the ST-LINK
interface. When we install the ST-LINK drivers, an additional driver for the Virtual COM Port (VCP)
is also installed: this allows us to access the target MCU USART2 using the USB interface, without
using a dedicated TTL/USB converter. Using a terminal emulation program we can exchange
messages and data with our Nucleo.
The CubeHAL separates the API for the management of UART and USART interfaces. All functions
and C handle types used for the handling of USARTs start with the HAL_USART prefix and are
contained inside the files stm32xxx_hal_usart.{c,h}, while those related to UART management
start with the HAL_UART prefix and are contained inside the files stm32xxx_hal_uart.{c,h}. Since
both modules are conceptually identical, and since the UART is the most common form of serial
interconnection between different modules, this book will only cover the features of the HAL_UART
module.
Table 1: the list of available USARTs and UARTs on all Nucleo boards
However, all the HAL functions related to UART management are designed so that they accept as
first parameter an instance of the C struct UART_HandleTypeDef, which is defined in the following
way:
typedef struct {
  USART_TypeDef               *Instance;     /* UART registers base address */
  UART_InitTypeDef            Init;          /* UART communication parameters */
  UART_AdvFeatureInitTypeDef  AdvancedInit;  /* UART Advanced Features initialization parameters */
  uint8_t                     *pTxBuffPtr;   /* Pointer to UART Tx transfer Buffer */
  uint16_t                    TxXferSize;    /* UART Tx Transfer size */
  uint16_t                    TxXferCount;   /* UART Tx Transfer Counter */
  uint8_t                     *pRxBuffPtr;   /* Pointer to UART Rx transfer Buffer */
  uint16_t                    RxXferSize;    /* UART Rx Transfer size */
  uint16_t                    RxXferCount;   /* UART Rx Transfer Counter */
  DMA_HandleTypeDef           *hdmatx;       /* UART Tx DMA Handle parameters */
  DMA_HandleTypeDef           *hdmarx;       /* UART Rx DMA Handle parameters */
  HAL_LockTypeDef             Lock;          /* Locking object */
  __IO HAL_UART_StateTypeDef  State;         /* UART communication state */
  __IO HAL_UART_ErrorTypeDef  ErrorCode;     /* UART Error code */
} UART_HandleTypeDef;
Let us see more in depth the most important fields of this struct.
• Instance: is the pointer to the USART descriptor we are going to use. For example, USART2 is
the descriptor of the UART associated to the ST-LINK interface of every Nucleo board.
• Init: is an instance of the C struct UART_InitTypeDef, which is used to configure the UART
interface. We will study it more in depth in a while.
• AdvancedInit: this field is used to configure more advanced UART features like the automatic
BaudRate detection and the TX/RX pin swapping. Some HALs do not provide this additional
field. This happens because USART interfaces are not equal for all STM32 MCUs. This is an
important aspect to keep in mind while choosing the right MCU for your application. The
analysis of this field is outside the scope of this book.
• pTxBuffPtr and pRxBuffPtr: these fields point to the transmit and receive buffer respectively.
They are used as the source of the TxXferSize bytes to transmit over the UART and as the
destination of the RxXferSize bytes to receive when the UART is configured in Full Duplex
Mode. The TxXferCount and RxXferCount fields are used internally by the HAL to keep count of
transmitted and received bytes.
• Lock: this field is used internally by the HAL to lock concurrent accesses to UART interfaces.
As said above, the Lock field is used to regulate concurrent accesses in almost all HAL routines.
If you take a look at the HAL code, you can see several uses of the __HAL_LOCK() macro,
which is expanded in this way:
#define __HAL_LOCK(__HANDLE__) \
do{ \
if((__HANDLE__)->Lock == HAL_LOCKED) \
{ \
return HAL_BUSY; \
} \
else \
{ \
(__HANDLE__)->Lock = HAL_LOCKED; \
} \
}while (0)
It is not clear why ST engineers decided to take care of concurrent accesses to the HAL
routines. Probably they decided to have a thread safe approach, freeing the application
developer from the responsibility of managing multiple accesses to the same hardware
interface in case of multiple threads running in the same application.
However, this has an annoying side effect for all HAL users: even if my application does
not perform concurrent accesses to the same peripheral, my code is burdened with a lot
of checks on the state of the Lock field. Moreover, this way of locking is intrinsically
thread unsafe: the test and the update of the Lock field are two separate steps, and no
critical section prevents race conditions in case a more privileged ISR preempts the
running code between them. Finally, if my application uses an RTOS, it is much better to
use native OS locking primitives (like semaphores and mutexes, which are not only atomic
but also cooperate with the task scheduler, avoiding busy waiting) to handle concurrent
accesses, without the need to check for a particular return value (HAL_BUSY) of the HAL
functions.
A lot of developers have disapproved of this way of locking peripherals since the first release
of the HAL. ST engineers have recently announced that they are actively working on a better
solution.
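The weakness described above can be replayed on a PC with a small sketch of the same test-then-set pattern. All the names used here (demo_handle_t, demo_try_lock(), demo_unlock(), demo_race_both_acquire()) are hypothetical and are not part of the HAL; the code only mirrors the logic of the macro shown before, and the "interrupt" is simulated by hand.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the HAL lock and status types */
typedef enum { DEMO_UNLOCKED = 0, DEMO_LOCKED = 1 } demo_lock_t;
typedef enum { DEMO_OK = 0, DEMO_BUSY = 2 } demo_status_t;

typedef struct {
    volatile demo_lock_t Lock;
} demo_handle_t;

/* Same two-step logic as __HAL_LOCK(): first test the flag, then set it */
demo_status_t demo_try_lock(demo_handle_t *h) {
    if (h->Lock == DEMO_LOCKED) {
        return DEMO_BUSY;
    }
    /* An interrupt arriving here still sees the handle as unlocked */
    h->Lock = DEMO_LOCKED;
    return DEMO_OK;
}

void demo_unlock(demo_handle_t *h) {
    h->Lock = DEMO_UNLOCKED;
}

/* Replays the unlucky interleaving by hand: the "ISR" runs between the
 * test and the set performed by the main flow. Returns true when both
 * contexts end up believing they own the peripheral. */
bool demo_race_both_acquire(void) {
    demo_handle_t h = { DEMO_UNLOCKED };

    bool main_saw_free = (h.Lock == DEMO_UNLOCKED);      /* main flow: test */
    bool isr_acquired = (demo_try_lock(&h) == DEMO_OK);  /* ISR preempts    */
    if (main_saw_free) {
        h.Lock = DEMO_LOCKED;                            /* main flow: set  */
    }
    return main_saw_free && isr_acquired;
}
```

In the normal case a second attempt to lock simply reports a busy status that the caller has to check; in the interleaved case neither context is told that something went wrong, which is why a critical section or an RTOS primitive is a safer choice.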
All the UART configuration activities are performed by using an instance of the C struct
UART_InitTypeDef, which is defined in the following way:
typedef struct {
uint32_t BaudRate;
uint32_t WordLength;
uint32_t StopBits;
uint32_t Parity;
uint32_t Mode;
uint32_t HwFlowCtl;
uint32_t OverSampling;
} UART_InitTypeDef;
• BaudRate: this parameter refers to the connection speed, expressed in bits per second. Even
if the parameter can assume an arbitrary value, usually the BaudRate comes from a list
of well-known and standard values. This is because it is a function of the peripheral clock
associated to the USART (which, in some STM32 MCUs, is derived from the main HSI or HSE
clock by a complex chain of PLLs and multipliers), and not all BaudRates can be easily achieved
without introducing sampling errors, and hence communication errors. Table 2 shows the list
of common BaudRates, and the related error calculation, for an STM32F030 MCU. Always
consult the reference manual for your MCU to see which peripheral clock frequency best fits
the needed BaudRate on the given STM32 microcontroller.
Table 2: Error calculation for programmed baud rates at 48 MHz in both cases of oversampling by 16 or by 8
• WordLength: it specifies the number of data bits transmitted or received in a frame. This field
can assume the value UART_WORDLENGTH_8B or UART_WORDLENGTH_9B, which means that we can
transmit over a UART packets containing 8 or 9 data bits. This number does not include the
overhead bits transmitted, such as the start and stop bits.
• StopBits: this field specifies the number of stop bits transmitted. It can assume the value
UART_STOPBITS_1 or UART_STOPBITS_2, which means that we can use one or two stop bits to
signal the end of the frame.
• Parity: it indicates the parity mode. This field can assume the values from Table 3. Take
note that, when parity is enabled, the computed parity is inserted at the MSB position of the
transmitted data (9th bit when the word length is set to 9 data bits; 8th bit when the word
length is set to 8 data bits). Parity is a very simple form of error checking. It comes in two
flavors: odd or even. To produce the parity bit, all data bits are added up, and the evenness of
the sum decides whether the bit is set or not. For example, if parity is set to even and the data
byte is 0b01011101, which has an odd number of 1s (5), the parity bit is set to 1 so that the
total number of 1s becomes even. Conversely, if the parity mode is set to odd, the parity bit is 0.
Parity is optional, and not very widely used. It can be helpful for transmitting across noisy
media, but it also slows down data transfer a bit and requires both sender and receiver
to implement error handling (usually, received data that fails the check must be re-sent). When
a parity error occurs, all STM32 MCUs generate a specific interrupt, as we will see next.
• Mode: it specifies whether the RX or TX mode is enabled or disabled. This field can assume
one of the values from Table 4.
• HwFlowCtl: it specifies whether the RS232⁵ Hardware Flow Control mode is enabled or
disabled. This parameter can assume one of the values from Table 5.
⁵this field is only used to enable the RS232 flow control. To enable the RS485 flow control, the HAL provides a specific function,
HAL_RS485Ex_Init(), defined inside the stm32xxxx_hal_uart_ex.c file.
• OverSampling: when the UART receives a frame from the remote peer, it samples the signal
to recover the 1s and 0s that make up the message. Oversampling is the technique of
sampling a signal with a sampling frequency significantly higher than the Nyquist
rate. The receiver implements different user-configurable oversampling techniques (except
in synchronous mode) for data recovery by discriminating between valid incoming data and
noise. This allows a trade-off between the maximum communication speed and noise/clock
inaccuracy immunity. The OverSampling field can assume the value UART_OVERSAMPLING_16
to perform 16 samples for each frame bit or UART_OVERSAMPLING_8 to perform 8 samples. Table
2 shows the error calculation for programmed baud rates at 48 MHz in an STM32F030 MCU
in both cases of oversampling by 16 or by 8.
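Two of the fields above involve small calculations that are easy to check on a PC. The sketch below computes the parity bit for an 8-bit word and the baud rate error for a given peripheral clock. The function names are hypothetical, and the divider formula (clock divided by baud rate for oversampling by 16, twice the clock divided by baud rate for oversampling by 8, rounded to the nearest integer) is an assumption based on the scheme used by the STM32F0 family: other families encode the divider differently, so the reference manual remains the authority.

```c
#include <stdint.h>

/* Bit to append to 8 data bits so that the total number of 1s is even
 * (odd == 0) or odd (odd != 0). */
uint8_t parity_bit(uint8_t data, int odd) {
    uint8_t ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (uint8_t)((data >> i) & 1u);
    }
    uint8_t even_bit = ones & 1u; /* 1 when the data holds an odd number of 1s */
    return odd ? (uint8_t)(even_bit ^ 1u) : even_bit;
}

/* Integer divider closest to the ideal one for the requested baud rate */
uint32_t usart_divider(uint32_t fck_hz, uint32_t baud, int over8) {
    uint32_t num = over8 ? 2u * fck_hz : fck_hz;
    return (num + baud / 2u) / baud; /* round to nearest */
}

/* Baud rate really obtained once the divider has been rounded */
double actual_baud(uint32_t fck_hz, uint32_t baud, int over8) {
    uint32_t div = usart_divider(fck_hz, baud, over8);
    return (over8 ? 2.0 : 1.0) * (double)fck_hz / (double)div;
}

/* Relative error, in percent, between the obtained and the requested rate */
double baud_error_percent(uint32_t fck_hz, uint32_t baud, int over8) {
    return 100.0 * (actual_baud(fck_hz, baud, over8) - (double)baud) / (double)baud;
}
```

With a 48 MHz clock, 38400 divides the clock exactly and gives no error, while 115200 cannot be obtained exactly and leaves an error of less than 0.1% in both oversampling modes.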
Now it is a good time to start writing down a bit of code. Let us see how to configure the USART2
of the MCU equipping our Nucleo to exchange messages through the ST-LINK interface.
int main(void) {
UART_HandleTypeDef huart2;
The first step is to configure the USART2 peripheral. Here we are using this configuration: 38400, N,
1. That is, a BaudRate equal to 38400 bps, no parity check and just one stop bit. Next, we disable any
form of Hardware Flow Control and we choose the highest oversampling rate, that is 16 samples
for each bit. The call to the HAL_UART_Init() function ensures that the HAL initializes
the USART2 according to the given options.
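The configuration just described can be sketched as follows. This is a minimal fragment, not the complete listing of the example: the header name depends on the MCU family (stm32f4xx_hal.h is an assumption here), USART2_Config() is a hypothetical helper name, and the 8-bit word length and the TX/RX mode are choices implied by, but not spelled out in, the "38400, N, 1" description.

```c
#include "stm32f4xx_hal.h" /* assumption: use the HAL header of your MCU family */

/* Declared at file scope so that the fields we do not set explicitly
 * (for example AdvancedInit, where present) start from zero. */
UART_HandleTypeDef huart2;

/* Hypothetical helper: configures USART2 as 38400, N, 1 */
void USART2_Config(void) {
    huart2.Instance = USART2;
    huart2.Init.BaudRate = 38400;
    huart2.Init.WordLength = UART_WORDLENGTH_8B;
    huart2.Init.StopBits = UART_STOPBITS_1;
    huart2.Init.Parity = UART_PARITY_NONE;
    huart2.Init.Mode = UART_MODE_TX_RX;
    huart2.Init.HwFlowCtl = UART_HWCONTROL_NONE;
    huart2.Init.OverSampling = UART_OVERSAMPLING_16;

    if (HAL_UART_Init(&huart2) != HAL_OK) {
        /* Initialization failed: stop here so the problem is visible in the debugger */
        while (1) {
        }
    }
}
```

Checking the value returned by HAL_UART_Init() costs little and makes a wrong parameter immediately evident during debugging.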
However, the above code is still not sufficient to exchange messages through the Nucleo Virtual
COM Port. Don’t forget that every peripheral designed to exchange data with the outside world
must be properly bound to the corresponding GPIOs, that is, we have to configure the USART2 TX and
RX pins. Looking at the Nucleo schematics, we can see that USART2 TX and RX pins are PA2 and
PA3 respectively. Moreover, we have already seen in Chapter 4 that the HAL is designed so that the
HAL_UART_Init() function automatically calls the HAL_UART_MspInit() (see Figure 19 in Chapter
4) to properly initialize the I/Os: it is our responsibility to write this function in our application code,
which will be automatically called by the HAL.
The function attribute __weak is a GCC way to declare a symbol (here, a function name)
with a weak scope visibility, which will be overridden if another symbol with the same
name and a global scope (that is, without the __weak attribute) is defined elsewhere in
the application (that is, in another relocatable file). If we implement the function
HAL_UART_MspInit() in our application code, the linker will automatically use it in place of
the one defined inside the HAL.
The code below shows how to correctly code the HAL_UART_MspInit() function.
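void HAL_UART_MspInit(UART_HandleTypeDef* huart) {
GPIO_InitTypeDef GPIO_InitStruct;

if(huart->Instance == USART2) {
/* Enable the peripheral clock before using the USART2 */
__HAL_RCC_USART2_CLK_ENABLE();

/* PA2 is USART2_TX, PA3 is USART2_RX */
GPIO_InitStruct.Pin = USART_TX_Pin|USART_RX_Pin;
GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
GPIO_InitStruct.Pull = GPIO_NOPULL;
GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
GPIO_InitStruct.Alternate = GPIO_AF1_USART2; /* WARNING: this depends on
the specific STM32 MCU */
HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
}
}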
As you can see, the function is designed so that it is common to every USART used inside the
application. The if statement restricts the initialization code to the given USART (in our case,
USART2). The rest of the code configures the PA2 and PA3 pins. Please note that the
alternate function may change for the MCU equipping your Nucleo. Consult the book examples
to see the right initialization code for your Nucleo.
Once we have configured the USART2 interface, we can start exchanging messages with our PC.
Please note that the code presented before may not be sufficient to correctly initialize
the USART peripheral for some STM32 MCUs. Some STM32 microcontrollers, like the
STM32F334R8, allow the developer to choose the clock source for a given peripheral (for
example, the USART2 in an STM32F334R8 MCU can be optionally clocked from SYSCLK,
HSI, LSE or PCLK1). It is strongly suggested to use CubeMX the first time you configure the
peripherals for your MCU and to check the generated code carefully, looking for this kind
of exception. Otherwise, the datasheet is the only source for this information.
Once we have configured the USART interface, we can generate the C code. You will notice that
CubeMX places all the USART2 initialization code inside the MX_USART2_UART_Init() (which is
contained in the main.c file). Instead, all the code related to GPIO configuration is placed into the
HAL_UART_MspInit() function, which is contained inside the stm32xxxx_hal_msp.c file.
• In polling mode, also called blocking mode, the main application, or one of its threads,
synchronously waits for the data transmission and reception. This is the simplest form
of data communication using this peripheral, and it can be used when the transmit rate is not
too low and when the UART is not used as a critical peripheral in our application (the
classical example is the usage of the UART as output console for debug activities).
• In interrupt mode, also called non-blocking mode, the main application is freed from waiting
for the completion of data transmission and reception. The data transfer routines return
as soon as they finish configuring the peripheral. When the data transmission ends, a
subsequent interrupt will signal the main code about this. This mode is more suitable when
communication speed is low (below 38400 bps) or when it happens “rarely”, compared to other
activities performed by the MCU, and we do not want to stall the MCU waiting for data
transmission.
• DMA mode offers the best data transmission throughput, thanks to the direct access of the
UART peripheral to MCU internal RAM. This mode is best for high-speed communications
and when we totally want to free the MCU from the overhead of data transmission. Without
the DMA mode, it is almost impossible to reach the fastest transfer rates that the USART
peripheral is capable of handling. In this chapter we will not see this USART communication
mode, leaving it to the next chapter dedicated to DMA management.
To transmit a sequence of bytes over the USART in polling mode the HAL provides the function:
HAL_StatusTypeDef HAL_UART_Transmit(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size, uint32_t Timeout);
where:
• huart: is the pointer to an instance of the struct UART_HandleTypeDef seen before, which
identifies and configures the UART peripheral;
• pData: is the pointer to an array, with a length equal to the Size parameter, containing the
sequence of bytes we are going to transmit;
• Timeout: is the maximum time, expressed in milliseconds, we are going to wait for the transmit
completion. If the transmission does not complete in the specified timeout time, the function
aborts and returns the HAL_TIMEOUT value; otherwise it returns the HAL_OK value if no other
errors occur. Moreover, we can pass a timeout equal to HAL_MAX_DELAY (0xFFFFFFFF) to wait
indefinitely for the transmit completion.
Conversely, to receive a sequence of bytes over the USART in polling mode the HAL provides the
function:
HAL_StatusTypeDef HAL_UART_Receive(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size, uint32_t Timeout);
where:
• huart: is the pointer to an instance of the struct UART_HandleTypeDef seen before, which
identifies and configures the UART peripheral;
• pData: is the pointer to an array, with a length at least equal to the Size parameter, which
will hold the sequence of bytes we are going to receive. The function will block until all bytes
specified by the Size parameter are received.
• Timeout: is the maximum time, expressed in milliseconds, we are going to wait for the receive
completion. If the reception does not complete in the specified timeout time, the function
aborts and returns the HAL_TIMEOUT value; otherwise it returns the HAL_OK value if no other
errors occur. Moreover, we can pass a timeout equal to HAL_MAX_DELAY (0xFFFFFFFF) to wait
indefinitely for the receive completion.
Read carefully
It is important to remark that the timeout mechanism offered by the two functions works
only if the HAL_IncTick() routine is called every 1ms, as done by the code generated by
CubeMX (the function that increments the HAL tick counter is called inside the SysTick
timer ISR).
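The dependency on the tick counter can be illustrated with a hosted sketch of tick-based timeout logic. This is not the code of the HAL: the names demo_tick, demo_inc_tick(), demo_get_tick() and demo_timed_out() are hypothetical stand-ins, and the sketch only assumes that the timeout is evaluated as the difference between the current tick and the tick sampled when the wait started.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the HAL tick counter (one tick per millisecond) */
volatile uint32_t demo_tick;

/* On the target this would be called by the SysTick ISR every 1ms */
void demo_inc_tick(void) {
    demo_tick++;
}

uint32_t demo_get_tick(void) {
    return demo_tick;
}

/* True once more than timeout_ms ticks have elapsed since 'start'.
 * The unsigned subtraction keeps the result correct even when the
 * 32-bit counter wraps around between the two readings. */
bool demo_timed_out(uint32_t start, uint32_t timeout_ms) {
    if (timeout_ms == 0xFFFFFFFFu) {
        return false; /* same convention as HAL_MAX_DELAY: wait forever */
    }
    return (uint32_t)(demo_get_tick() - start) > timeout_ms;
}
```

If nothing increments the counter, the elapsed time stays at zero and a wait built on this check never expires, which is the situation the note above warns about.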
Filename: src/main-ex1.c
21 int main(void) {
22 uint8_t opt = 0;
23
24 /* Reset of all peripherals, Initializes the Flash interface and the SysTick. */
25 HAL_Init();
26
27 /* Configure the system clock */
28 SystemClock_Config();
29
30 /* Initialize all configured peripherals */
31 MX_GPIO_Init();
32 MX_USART2_UART_Init();
33
34 printMessage:
35
36 printWelcomeMessage();
37
38 while (1) {
39 opt = readUserInput();
40 processUserInput(opt);
41 if(opt == 3)
42 goto printMessage;
43 }
44 }
45
46 void printWelcomeMessage(void) {
47 HAL_UART_Transmit(&huart2, (uint8_t*)"\033[0;0H", strlen("\033[0;0H"), HAL_MAX_DELAY);
48 HAL_UART_Transmit(&huart2, (uint8_t*)"\033[2J", strlen("\033[2J"), HAL_MAX_DELAY);
49 HAL_UART_Transmit(&huart2, (uint8_t*)WELCOME_MSG, strlen(WELCOME_MSG), HAL_MAX_DELAY);
50 HAL_UART_Transmit(&huart2, (uint8_t*)MAIN_MENU, strlen(MAIN_MENU), HAL_MAX_DELAY);
51 }
52
53 uint8_t readUserInput(void) {
54 char readBuf[2] = {0}; /* one char plus the terminating '\0' required by atoi() */
55
56 HAL_UART_Transmit(&huart2, (uint8_t*)PROMPT, strlen(PROMPT), HAL_MAX_DELAY);
57 HAL_UART_Receive(&huart2, (uint8_t*)readBuf, 1, HAL_MAX_DELAY);
58 return atoi(readBuf);
59 }
60
61 uint8_t processUserInput(uint8_t opt) {
62 char msg[32]; /* the longest message is 30 chars, plus the terminating '\0' */
63
64 if(!opt || opt > 3)
65 return 0;
66
67 sprintf(msg, "%d", opt);
68 HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
69
70 switch(opt) {
71 case 1:
72 HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
73 break;
74 case 2:
75 sprintf(msg, "\r\nUSER BUTTON status: %s",
76 HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_RESET ? "PRESSED" : "RELEASED");
78 HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
79 break;
80 case 3:
81 return 2;
82 };
83
84 return 1;
85 }
The example is a sort of bare-bones management console. The application starts by printing a welcome
message (line 36) and then enters a loop waiting for the user's choice. The first option
toggles the LD2 LED, while the second reads the status of the USER button. Finally, option 3
causes the welcome screen to be printed again.
The two strings "\033[0;0H" and "\033[2J" are escape sequences. They are standard
sequences of chars used to manipulate the terminal console. The first one places the cursor
in the top-left part of the available console screen, and the second one clears the screen.
To interact with this simple management console, we need a serial communication program. There
are several options available. The easiest one is to use a standalone program like putty⁸ for the
Windows platform (if you have an old Windows version, you can also consider using the classical
HyperTerminal tool), or kermit⁹ for Linux and MacOS. However, we will now introduce a solution
to have an integrated serial communication tool inside the Eclipse IDE. As usual, the instructions
differ between Windows, Linux and MacOS.
Click on OK and install the release RXTX 2.1-7r4 following the instructions.
Once the installation has been completed, go to Help->Eclipse Marketplace…. In the Find text box
write “tcf”. After a while, the TM Terminal plug-in should appear, as shown in Figure 7. Click on
the Install button and follow the instructions. Restart Eclipse when requested.
⁸http://bit.ly/1jsQjnt
⁹http://www.columbia.edu/kermit/
¹⁰http://rxtx.qbang.org/
To open the Terminal panel you can simply press Ctrl+Alt+T, or you can go to Window->Show
View->Other… menu and search for Terminal view.
By default, the Terminal pane opens a new command line prompt. Click on the Open a Terminal
icon (the one circled in red in Figure 8). In the Launch Terminal dialog (see Figure 9) select Serial
Terminal as terminal type, and then select the COM Port corresponding to the Nucleo VCP, and
38400 bps as Baud Rate. Click on the OK button.
Now you can reset the Nucleo. The management console we have programmed using the HAL_UART
library should appear in the serial console window, as shown in Figure 10.
Figure 10: the Nucleo management console shown in the terminal view
Once the installation has been completed, switch to Eclipse and go to Help->Eclipse Marketplace….
In the Find text box write “tcf”. After a while, the TM Terminal plug-in should appear, as shown in
Figure 7. Click on the Install button and follow the instructions. Restart Eclipse when requested.
To open the Terminal panel you can simply press Ctrl+Alt+T, or you can go to Window->Show
View->Other… menu and search for Terminal view. The command line prompt appears. Before we
can connect to the Nucleo VCP, we have to identify the corresponding device under the /dev path.
Usually, on UNIX-like systems the USB serial devices are mapped with a device name similar to
/dev/tty.usbmodem1a1213. Take a look at your /dev folder. Once you have found the device filename,
you can launch the kermit tool and execute the commands shown below at the kermit console:
$ kermit
C-Kermit 9.0.302 OPEN SOURCE:, 20 Aug 2011, for Mac OS X 10.9 (64-bit)
Copyright (C) 1985, 2011,
Trustees of Columbia University in the City of New York.
Type ? or HELP for help.
(/Users/cnoviello/) C-Kermit>set line /dev/tty.usbmodem1a1213
(/Users/cnoviello/) C-Kermit>set speed 38400
/dev/tty.usbmodem1a1213, 38400 bps
(/Users/cnoviello/) C-Kermit>set carrier-watch off
(/Users/cnoviello/) C-Kermit>c
Connecting to /dev/tty.usbmodem1a1213, speed 38400
Escape character: Ctrl-\ (ASCII 28, FS): enabled
Type the escape character followed by C to get back,
or followed by ? to see other options.
----------------------------------------------------
To avoid retyping the above commands every time you launch kermit, you can create a
file named ~/.kermrc inside your home directory, and put the above commands inside it.
kermit will load those commands automatically when it is executed.
Now you can reset the Nucleo. The management console we have programmed using the HAL_UART
library should appear in the serial console window, as shown in Figure 10.
38 while (1) {
39 opt = readUserInput();
40 processUserInput(opt);
41 if(opt == 3)
42 goto printMessage;
43
44 performCriticalTasks();
45 }
In this case we cannot block the execution of the function readUserInput() waiting for the user
choice, but we have to specify a much shorter timeout value to the HAL_UART_Receive()
function, otherwise performCriticalTasks() is never executed. However, this could cause the loss
of important data coming from the UART peripheral (remember that the UART interface has a one
byte wide buffer).
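To address this issue the HAL offers another way to exchange data over a UART peripheral: the
interrupt mode. To use this mode, we have to accomplish the following tasks:
• Enable the USARTx_IRQn interrupt and implement the corresponding ISR.
• Call the HAL_UART_IRQHandler() function inside the ISR¹¹.
• Use the HAL_UART_Transmit_IT() and HAL_UART_Receive_IT() functions to exchange data over
the UART.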
Before we rearrange the code from the first example, it is best to take a look at the available UART
interrupts and at the way HAL routines are designed.
¹¹If we use CubeMX to enable the USARTx_IRQn from the NVIC configuration section (as shown in Chapter 7), it will
automatically place the call to the HAL_UART_IRQHandler() from the ISR.
• IRQs generated during transmission: Transmission Complete, Clear to Send or Transmit Data
Register empty interrupt.
• IRQs generated while receiving: Idle Line detection, Overrun error, Receive Data register not
empty, Parity error, LIN break detection, Noise Flag (only in multi buffer communication) and
Framing Error (only in multi buffer communication).
These events generate an interrupt if the corresponding Enable Control Bit is set (third column of
Table 6). However, STM32 MCUs are designed so that all these IRQs are bound to just one ISR
for every USART peripheral (see Figure 11¹²). For example, the USART2 defines only the
USART2_IRQn as IRQ for all interrupts generated by this peripheral. It is up to the user code to
analyze the corresponding Event Flag to infer which interrupt has generated the request.
¹²Figure 11 is taken from the STM32F030 Reference Manual (RM0360).
Figure 11: how the USART interrupt events are connected to the same interrupt vector
The CubeHAL is designed to automatically do this job for us. The user is warned about the interrupt
generation thanks to a series of callback functions invoked by the HAL_UART_IRQHandler(), which
must be called inside the ISR.
From a technical point of view, there is not much difference between UART transmission in
polling and in interrupt mode. Both methods transfer an array of bytes using the UART Data
Register (DR) with the following algorithm:
• For data transmission, place a byte inside the USART->DR register and wait until the Transmit
Data Register Empty (TXE) flag is asserted.
• For data reception, wait until the Received Data Ready to be Read (RXNE) flag is asserted,
and then store the content of the USART->DR register inside the application memory.
The difference between the two methods lies in how they wait for the completion of the data
transfer. In polling mode, the HAL_UART_Receive()/HAL_UART_Transmit() functions are designed
so that they wait for the corresponding event flag to be set, for every byte we want to transfer.
In interrupt mode, the HAL_UART_Receive_IT()/HAL_UART_Transmit_IT() functions are designed so
that they do not wait for the completion of the transfer: the dirty job of placing a new byte inside
the DR register, or of loading its content inside the application memory, is accomplished by the ISR
when the RXNE/TXE interrupt (enabled through the RXNEIE/TXEIE bits) is generated¹³.
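The division of work between the non-blocking call and the ISR can be modeled on a PC with the sketch below. It is a simulation and not the HAL implementation: demo_uart_t, demo_transmit_it() and demo_on_txe() are hypothetical names, the pTxBuffPtr and TxXferCount fields merely echo those of the handle seen before, and completion is flagged right after the last byte is written, whereas the real peripheral also provides a separate Transmission Complete event.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a UART handle plus its data register */
typedef struct {
    volatile uint8_t DR;       /* stand-in for the data register         */
    const uint8_t *pTxBuffPtr; /* next byte to send                      */
    uint16_t TxXferCount;      /* bytes still to send                    */
    int tx_complete;           /* set when the whole buffer has been sent */
} demo_uart_t;

/* Counterpart of a non-blocking transmit call: it only records the buffer
 * and returns at once. Returns 0 on success, -1 when the arguments are
 * invalid or a previous transfer is still in progress. */
int demo_transmit_it(demo_uart_t *u, const uint8_t *data, uint16_t size) {
    if (data == NULL || size == 0 || u->TxXferCount != 0) {
        return -1;
    }
    u->pTxBuffPtr = data;
    u->TxXferCount = size;
    u->tx_complete = 0;
    return 0;
}

/* Counterpart of the work done by the ISR at each "data register empty"
 * event: move one byte into the data register and flag the completion
 * after the last one. */
void demo_on_txe(demo_uart_t *u) {
    if (u->TxXferCount == 0) {
        return; /* nothing pending */
    }
    u->DR = *u->pTxBuffPtr++;
    if (--u->TxXferCount == 0) {
        u->tx_complete = 1;
    }
}
```

The main flow pays only the cost of demo_transmit_it(); every byte then costs one interrupt, which is the reason why this scheme becomes expensive at high communication speeds.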
To transmit a sequence of bytes in interrupt mode, the HAL defines the function:
HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size);
¹³This is the reason why transferring a sequence of bytes in interrupt mode is not a smart thing when the communication speed
is too high, or when we have to transfer a great amount of data very often. Since the transmission of each byte happens quickly, the
CPU will be congested by the interrupts generated by the UART for every byte transmitted. For continuous transmission of long
sequences of bytes at high speed it is best to use the DMA mode, as we will see in the next chapter.
where:
• huart: it is the pointer to an instance of the struct UART_HandleTypeDef seen before, which
identifies and configures the UART peripheral;
• pData: it is the pointer to an array, with a length equal to the Size parameter, containing the
sequence of bytes we are going to transmit; the function will not block waiting for the data
transmission, and it will pass the control to the main flow as soon as it finishes configuring
the UART.
Conversely, to receive a sequence of bytes over the USART in interrupt mode the HAL provides the
function:
HAL_StatusTypeDef HAL_UART_Receive_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size);
where:
• huart: it is the pointer to an instance of the struct UART_HandleTypeDef seen before, which
identifies and configures the UART peripheral;
• pData: it is the pointer to an array, with a length at least equal to the Size parameter, which
will hold the sequence of bytes we are going to receive. The function will not block waiting for
the data reception, and it will pass the control to the main flow as soon as it finishes configuring
the UART.
Filename: src/main-ex2.c
48 if(opt == 3)
49 goto printMessage;
50 }
51 performCriticalTasks();
52 }
53 }
54
55 int8_t readUserInput(void) {
56 int8_t retVal = -1;
57
58 if(UartReady == SET) {
59 UartReady = RESET;
60 retVal = atoi(readBuf); /* use the received char before re-arming the reception */
61 HAL_UART_Receive_IT(&huart2, (uint8_t*)readBuf, 1);
62 }
63 return retVal;
64 }
65
66
67 uint8_t processUserInput(int8_t opt) {
68 char msg[32]; /* the longest message is 30 chars, plus the terminating '\0' */
69
70 if(!(opt >=1 && opt <= 3))
71 return 0;
72
73 sprintf(msg, "%d", opt);
74 HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
75 HAL_UART_Transmit(&huart2, (uint8_t*)PROMPT, strlen(PROMPT), HAL_MAX_DELAY);
76
77 switch(opt) {
78 case 1:
79 HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
80 break;
81 case 2:
82 sprintf(msg, "\r\nUSER BUTTON status: %s",
83 HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_RESET ? "PRESSED" : "RELEASED");
84 HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
85 break;
86 case 3:
87 return 2;
88 };
89
90 return 1;
91 }
92
As you can see in the above code, the first step is to enable the USART2_IRQn and to assign it a
priority¹⁴. Next, we define the corresponding ISR inside the stm32xxxx_it.c file (not shown here)
and we add the call to the HAL_UART_IRQHandler() function inside it. The remaining part of the
example file is all about restructuring the readUserInput() and processUserInput() functions to
deal with asynchronous events.
The function readUserInput() now checks for the value of the global variable UartReady. If it is
equal to SET, it means that the user has sent a char to the management console. This character is
contained inside the global array readBuf. The function then calls the HAL_UART_Receive_IT() to
receive another character in interrupt mode. When readUserInput() returns a value greater than
0, the function processUserInput() is called. Finally, the function HAL_UART_RxCpltCallback(),
which is automatically called by the HAL when one byte is received, is defined: it simply sets the
global UartReady variable, which in turn is used by the readUserInput() as seen before.
It is important to clarify that the function HAL_UART_RxCpltCallback() is called only when all
the bytes specified with the Size parameter, passed to the HAL_UART_Receive_IT() function, are
received.
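The pieces described in the last paragraphs, which the partial listing does not show, can be sketched as follows. This is a reconstruction under assumptions and not the original source: the header name depends on the MCU family, USART2_EnableInterrupt() is a hypothetical helper name, the priority values are arbitrary, and the initial SET value of UartReady is inferred from the logic of readUserInput(), which arms the reception only when the flag is set.

```c
#include "stm32f4xx_hal.h" /* assumption: use the HAL header of your MCU family */

extern UART_HandleTypeDef huart2; /* configured elsewhere in the application */

/* Shared between the ISR context and the main flow, hence volatile.
 * Assumption: it starts at SET so that the first readUserInput() call
 * arms the reception. */
volatile ITStatus UartReady = SET;
char readBuf[2] = {0}; /* one char plus the terminating '\0' used by atoi() */

/* Hypothetical helper, to be called once after the USART2 initialization */
void USART2_EnableInterrupt(void) {
    HAL_NVIC_SetPriority(USART2_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(USART2_IRQn);
}

/* ISR of the USART2: every UART interrupt lands here, and the HAL works
 * out which event has generated the request. */
void USART2_IRQHandler(void) {
    HAL_UART_IRQHandler(&huart2);
}

/* Called by the HAL once all the bytes requested with
 * HAL_UART_Receive_IT() have been received. */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *UartHandle) {
    if (UartHandle->Instance == USART2) {
        UartReady = SET;
    }
}
```

Keeping the callback this short matters, because it runs in interrupt context: all the real processing stays in the main loop.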
What about the HAL_UART_Transmit_IT() function? It works in a way similar to the
HAL_UART_Receive_IT(): it transfers the next byte in the array every time the Transmit Data Register
Empty (TXE) interrupt is generated. However, special care must be taken when calling it multiple
times. Since the function returns the control to the caller as soon as it finishes setting up the UART,
a subsequent call of the same function, made while the transfer is still in progress, will fail and
return the HAL_BUSY value.
Suppose we rearrange the function printWelcomeMessage() from the previous example in the
following way:
void printWelcomeMessage(void) {
HAL_UART_Transmit_IT(&huart2, (uint8_t*)"\033[0;0H", strlen("\033[0;0H"));
HAL_UART_Transmit_IT(&huart2, (uint8_t*)"\033[2J", strlen("\033[2J"));
HAL_UART_Transmit_IT(&huart2, (uint8_t*)WELCOME_MSG, strlen(WELCOME_MSG));
HAL_UART_Transmit_IT(&huart2, (uint8_t*)MAIN_MENU, strlen(MAIN_MENU));
HAL_UART_Transmit_IT(&huart2, (uint8_t*)PROMPT, strlen(PROMPT));
}
The above code will never work correctly, since each call to the function HAL_UART_Transmit_IT() is
much faster than the UART transmission, and the subsequent calls to the HAL_UART_Transmit_IT()
will fail.
¹⁴The example is designed for an STM32F4. Please, refer to the book examples for your specific Nucleo.
If speed is not a strict requirement for your application, and the use of the HAL_UART_Transmit_IT()
is limited to a few parts of your application, the above code could be rearranged in the following way:
void printWelcomeMessage(void) {
char *strings[] = {"\033[0;0H", "\033[2J", WELCOME_MSG, MAIN_MENU, PROMPT};
Here we transfer each string using the HAL_UART_Transmit_IT() but, before we transfer the next
string, we wait for the transmission completion. However, this is just a variant of the
HAL_UART_Transmit(), since we have a busy wait for every UART transfer.
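One way to obtain this behavior is sketched below. It is not necessarily the waiting condition used by the original example: here the function simply retries each call while the HAL reports HAL_BUSY, which has the same effect of serializing the transfers. The header name and the helper name sendStringsIT() are assumptions.

```c
#include <stddef.h>
#include <string.h>
#include "stm32f4xx_hal.h" /* assumption: use the HAL header of your MCU family */

extern UART_HandleTypeDef huart2; /* configured elsewhere in the application */

/* Hypothetical helper: sends 'count' strings one after the other in
 * interrupt mode. The strings must stay valid until the last transfer
 * ends (string literals always do). */
void sendStringsIT(const char *strings[], size_t count) {
    for (size_t i = 0; i < count; i++) {
        /* HAL_BUSY means that the previous transfer is still running:
         * keep trying until the UART accepts the new buffer. */
        while (HAL_UART_Transmit_IT(&huart2, (uint8_t *)strings[i],
                                    (uint16_t)strlen(strings[i])) == HAL_BUSY) {
        }
    }
}
```

The function returns while the last string is still being transmitted, so only the final transfer really overlaps with the rest of the application.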
A more elegant and better performing solution is to use a temporary memory area to store the byte
sequences and to let the ISR execute the transfer. A queue is the best option to handle FIFO
events. There are several ways to implement a queue, using either static or dynamic data structures.
If we decide to implement a queue with a predefined area of memory, a circular buffer is the data
structure suitable for this kind of application.
Figure 12: a circular buffer implemented using an array and two pointers
A circular buffer is nothing more than an array with a fixed size where two pointers are used to keep
track of the head and the tail of data that still needs to be processed. In a circular buffer, the first and
the last position of the array are seen as “contiguous” (see Figure 12). This is the reason why this data
structure is called circular. Circular buffers have an important feature too: as long as our application
has at most two concurrent execution streams (in our case, the main flow that places chars inside the
buffer and the ISR that sends these chars over the UART), they are intrinsically thread safe,
since the “consumer” thread (the ISR in our case) will update only the tail pointer and the producer
(the main flow) will update only the head one.
Circular buffers can be implemented in several ways. Some of them are faster, others are safer
(that is, they add extra overhead to ensure that we handle the buffer content correctly). You will
find a simple and quite fast implementation in the book examples. Explaining how it is coded is
outside the scope of this book.
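For orientation only, the head/tail mechanism described above can be illustrated with a minimal sketch. This is not the implementation shipped with the book examples: the names RingBuffer, rb_put() and rb_get() are hypothetical, and the sketch assumes exactly one producer and one consumer. One array slot is deliberately left unused so that "full" and "empty" can be told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8u  /* array size; usable capacity is RB_SIZE - 1 */

typedef struct {
  uint8_t data[RB_SIZE];
  volatile size_t head;  /* next free slot: written only by the producer */
  volatile size_t tail;  /* next byte to read: written only by the consumer */
} RingBuffer;

void rb_init(RingBuffer *rb) {
  assert(rb != NULL);
  rb->head = 0;
  rb->tail = 0;
}

bool rb_is_empty(const RingBuffer *rb) { return rb->head == rb->tail; }

/* Full when advancing head would make it collide with tail. */
bool rb_is_full(const RingBuffer *rb) {
  return ((rb->head + 1u) % RB_SIZE) == rb->tail;
}

/* Producer side (main flow): returns false, storing nothing, when full. */
bool rb_put(RingBuffer *rb, uint8_t byte) {
  if (rb_is_full(rb)) return false;
  rb->data[rb->head] = byte;
  rb->head = (rb->head + 1u) % RB_SIZE;  /* wraps from the last slot to 0 */
  return true;
}

/* Consumer side (ISR): returns false when there is nothing to read. */
bool rb_get(RingBuffer *rb, uint8_t *byte) {
  if (rb_is_empty(rb)) return false;
  *byte = rb->data[rb->tail];
  rb->tail = (rb->tail + 1u) % RB_SIZE;
  return true;
}
```

Note that rb_put() touches only head and rb_get() touches only tail, which is the property the text relies on. On a real target you may also need to consider compiler and memory ordering, which this sketch does not address.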
Using a circular buffer, we can define a new UART transmit function in the following way:
The function does just two things: it tries to send the buffer over the UART in interrupt mode; if
the HAL_UART_Transmit_IT() function fails (which means that the UART is already transmitting
another message), then the byte sequence is placed inside a circular buffer.
It is up to the HAL_UART_TxCpltCallback() to check for pending bytes inside the circular buffer:
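The behaviour described above (send at once if the UART is idle, otherwise queue the bytes and let the transfer-complete callback drain the queue) can be sketched on a host PC. This is not the book's listing: mock_transmit_it() and on_tx_complete() are hypothetical stand-ins that only imitate the roles of HAL_UART_Transmit_IT() and HAL_UART_TxCpltCallback(), and the "wire" array replaces the real UART.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Hypothetical stand-ins for the UART and the HAL (host testing only) --- */
static bool    uart_busy;   /* true while a "transfer" is in flight */
static uint8_t wire[64];    /* bytes that reached the "wire" */
static size_t  wire_len;

/* Imitates the contract of an interrupt-mode transmit: refuses when busy. */
bool mock_transmit_it(const uint8_t *buf, size_t len) {
  if (uart_busy || wire_len + len > sizeof wire) return false;
  memcpy(&wire[wire_len], buf, len);
  wire_len += len;
  uart_busy = true;
  return true;
}

/* --- Pending-byte queue: a small circular buffer --- */
#define Q_SIZE 32u
static uint8_t q_data[Q_SIZE];
static size_t  q_head, q_tail;

static bool q_put(uint8_t b) {
  size_t next = (q_head + 1u) % Q_SIZE;
  if (next == q_tail) return false;        /* queue full */
  q_data[q_head] = b;
  q_head = next;
  return true;
}

static bool q_get(uint8_t *b) {
  if (q_head == q_tail) return false;      /* queue empty */
  *b = q_data[q_tail];
  q_tail = (q_tail + 1u) % Q_SIZE;
  return true;
}

/* Try to send now; if the UART is busy, park the bytes in the queue.
 * Returns false if the queue overflows (earlier bytes stay queued). */
bool uart_send(const uint8_t *buf, size_t len) {
  if (mock_transmit_it(buf, len)) return true;
  for (size_t i = 0; i < len; i++)
    if (!q_put(buf[i])) return false;
  return true;
}

/* Role of the transfer-complete callback: send one pending byte, if any. */
static uint8_t tx_byte;  /* static: the buffer must outlive the async transfer */

void on_tx_complete(void) {
  uart_busy = false;
  if (q_get(&tx_byte)) mock_transmit_it(&tx_byte, 1);
}
```

Sending one queued byte per completion keeps the sketch short; a real implementation could drain a contiguous run of bytes in a single transfer.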
Then, you could call this function from the main application code or in a lower privileged
task if you are using an RTOS.
The HAL_UART_IRQHandler() is designed so that we do not need to care about the implementation details
of UART error management. The HAL code automatically performs all the steps needed to handle
the error (like clearing event flags, pending bits and so on), leaving to us the responsibility of handling
the error at application level (for example, we may ask the other peer to resend a corrupted frame).
Read carefully
At the time of writing this chapter, December 2nd 2015, a subtle bug prevents the right
management of the Overrun error. You can read more about it on the official ST forum¹⁵.
You can reproduce this bug even with the second example of this chapter. Run the example
on your Nucleo, and hit the key ‘3’ on your keyboard leaving it pressed. After a while,
the firmware will hang. This happens because, after the Overrun error occurs, the HAL
does not restart the receive process. You can work around this bug by implementing the
HAL_UART_ErrorCallback() function in the following way:
Now that we are familiar with the UART management, we can redefine the needed system calls
(_write(), _read() and so on) to retarget the STDIN, STDOUT and STDERR standard streams to the
Nucleo USART2. This can be easily done in the following way:
¹⁵http://bit.ly/1Pvim7X
Filename: system/src/retarget/retarget.c
/* Headers required by the code below. The HAL header name depends on the MCU
 * family: stm32f4xx_hal.h is assumed here (the example targets an STM32F4). */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include "stm32f4xx_hal.h"

#if !defined(OS_USE_SEMIHOSTING)

#define STDIN_FILENO  0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2

UART_HandleTypeDef *gHuart;

void RetargetInit(UART_HandleTypeDef *huart) {
  gHuart = huart;

  /* Disable I/O buffering for STDOUT stream, so that
   * chars are sent out as soon as they are printed. */
  setvbuf(stdout, NULL, _IONBF, 0);
}

int _isatty(int fd) {
  if (fd >= STDIN_FILENO && fd <= STDERR_FILENO)
    return 1;

  errno = EBADF;
  return 0;
}

int _write(int fd, char* ptr, int len) {
  HAL_StatusTypeDef hstatus;

  if (fd == STDOUT_FILENO || fd == STDERR_FILENO) {
    hstatus = HAL_UART_Transmit(gHuart, (uint8_t *) ptr, len, HAL_MAX_DELAY);
    if (hstatus == HAL_OK)
      return len;

    /* Report the failure through errno and a -1 return value */
    errno = EIO;
    return -1;
  }
  errno = EBADF;
  return -1;
}

int _close(int fd) {
  if (fd >= STDIN_FILENO && fd <= STDERR_FILENO)
    return 0;

  errno = EBADF;
  return -1;
}

int _lseek(int fd, int ptr, int dir) {
  (void) fd;
  (void) ptr;
  (void) dir;

  errno = EBADF;
  return -1;
}

int _read(int fd, char* ptr, int len) {
  HAL_StatusTypeDef hstatus;
  (void) len;

  if (fd == STDIN_FILENO) {
    /* Read one char at a time, blocking until it arrives */
    hstatus = HAL_UART_Receive(gHuart, (uint8_t *) ptr, 1, HAL_MAX_DELAY);
    if (hstatus == HAL_OK)
      return 1;

    errno = EIO;
    return -1;
  }
  errno = EBADF;
  return -1;
}

int _fstat(int fd, struct stat* st) {
  if (fd >= STDIN_FILENO && fd <= STDERR_FILENO) {
    st->st_mode = S_IFCHR;
    return 0;
  }

  errno = EBADF;
  return -1;
}

#endif //#if !defined(OS_USE_SEMIHOSTING)
To retarget the standard streams in your firmware, you have to remove the macro OS_USE_SEMIHOSTING
at project level, and initialize the library by calling RetargetInit(), passing the pointer
to the UART_HandleTypeDef instance of the UART2. For example, the following code shows how to
use printf()/scanf() functions in your firmware:
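int main(void) {
  char buf[20];

  HAL_Init();
  SystemClock_Config();
  MX_GPIO_Init();
  MX_USART2_UART_Init();
  RetargetInit(&huart2);
  /* Illustrative use of the retargeted streams */
  printf("Write your name: ");
  scanf("%19s", buf); /* the width limit keeps the input inside buf */
  printf("\r\nHello %s!\r\n", buf);
  while (1);
}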
If you are going to use printf()/scanf() functions to print/read float datatypes on the serial
console (but also if you are going to use sprintf() and similar routines), you need to explicitly
enable float support in newlib-nano, which is the more compact version of the C runtime library
for embedded systems. To do this, go to the Project->Properties… menu, then go to
C/C++ Build->Settings->Cross ARM C++ Linker->Miscellaneous and check Use float with nano printf/scanf
according to the feature you need, as shown in Figure 13. This will increase the firmware binary size.
DMA Management 255
most important aspects to keep in mind during the study of this peripheral. Moreover, they try to
address the implementation differences between STM32F2/4/7 and other STM32 families.
9.1.1 The need of a DMA and the role of the internal buses
Why is the DMA such an important feature? Every peripheral in an STM32 microcontroller needs to
exchange data with the internal Cortex-M core. Some of them translate this data into electrical I/O
signals to exchange it with the outside world according to a given communication protocol (this is the
case, for example, of UART or SPI interfaces). Others are simply designed so that accessing their
registers inside the peripheral memory-mapped region (from 0x4000 0000 to 0x5FFF FFFF) changes
their state (for example, the GPIOx->ODR register drives the state of all I/Os connected
to that port). However, keep in mind that from the CPU point of view this also implies a memory
transfer between the core and the peripheral.
The MCU core, in theory, could be designed so that every peripheral had its own storage area,
which in turn could be tightly coupled with the CPU core to minimize the costs related to memory
transfers³. This, however, complicates the MCU architecture, requiring a lot more silicon and more
“active components” that consume power. So, the approach used in all embedded microcontrollers
is to use some portions of the internal memory (SRAM as well as FLASH) as temporary storage areas
for different peripherals. It is up to the user to decide how much room to dedicate to these areas. For
example, let us consider this code fragment:
uint8_t buf[20];
...
HAL_UART_Receive(&huart2, buf, 20, HAL_MAX_DELAY);
Here we are going to read twenty bytes from the UART2 interface, and hence we allocate an array
(the temporary storage) of the same size inside the SRAM. The HAL_UART_Receive() function will
access the huart2.Instance->DR data register twenty times to transfer bytes from the peripheral
to the internal memory, and it will poll the UART RXNE flag to detect when new data is ready
to be transferred. The CPU is involved during these operations (see Figure 1), even if its role
is “limited” to moving data from the peripheral to the SRAM⁴.
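To make the CPU involvement concrete, here is a host-side sketch of a polled receive. It is not the HAL source: FakeUart, fake_uart_tick() and poll_receive() are hypothetical, and the "hardware" is simulated by a function that delivers one byte every polls_per_byte polls. The RXNE bit position mirrors the one in USART_SR on an STM32F4, but treat it as illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_RXNE (1u << 5)  /* "receive data register not empty" flag */

typedef struct {
  uint32_t SR;               /* simulated status register */
  uint32_t DR;               /* simulated data register */
  const uint8_t *line;       /* bytes "arriving" on the RX pin */
  size_t line_len;
  size_t line_pos;
  unsigned polls_per_byte;   /* how slow the line is, measured in polls */
  unsigned elapsed;
} FakeUart;

/* Simulated hardware: after enough polls, latch the next byte into DR. */
void fake_uart_tick(FakeUart *u) {
  if ((u->SR & FAKE_RXNE) || u->line_pos >= u->line_len) return;
  if (++u->elapsed >= u->polls_per_byte) {
    u->DR = u->line[u->line_pos++];
    u->SR |= FAKE_RXNE;
    u->elapsed = 0;
  }
}

/* Polled receive: the CPU spins on RXNE, then copies DR into memory.
 * Returns how many polls the CPU spent waiting. */
unsigned long poll_receive(FakeUart *u, uint8_t *buf, size_t len) {
  unsigned long polls = 0;
  assert(len <= u->line_len - u->line_pos);  /* else it would wait forever */
  for (size_t i = 0; i < len; i++) {
    while (!(u->SR & FAKE_RXNE)) {  /* busy wait: no other work gets done */
      fake_uart_tick(u);
      polls++;
    }
    buf[i] = (uint8_t)u->DR;        /* one register access per byte */
    u->SR &= ~FAKE_RXNE;            /* on real hardware, reading DR clears it */
  }
  return polls;
}
```

The returned poll count grows with both the number of bytes and the slowness of the line, which is exactly the time the core cannot spend on anything else.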
³This is what happens in some vector processors equipping really expensive supercomputers, but this is not the case for 32-cent
CPUs like the STM32.
⁴Keep in mind that using the UART in interrupt mode does not change the story. Once the UART generates the interrupt to
signal the core that new data has arrived, it is still up to the CPU to “move” this data byte-by-byte from the UART data register to
the SRAM. That is why, as far as the data movement itself is concerned, there is no difference between UART management in polling
and interrupt mode: interrupt mode only spares the CPU the busy wait on the flag.
While this approach simplifies the design of the hardware on the one hand, it introduces performance
penalties on the other. The Cortex-M core is “responsible” for loading data from peripheral memory into
the SRAM, and this is a blocking operation, which not only prevents the CPU from doing other
activities but also requires the CPU to wait for “slower” units to complete their job (some STM32
peripherals are connected to the core by slower buses, as we will see in a subsequent chapter). This is
the reason why high performance microcontrollers provide hardware units dedicated to the transfer
of data between peripherals and a centralized buffer storage, that is the SRAM.
Before we go deeper into the DMA details, it is better to take an overview of all the components
involved in the transfer of data from a peripheral to the SRAM memory and vice versa. We
have already seen in Chapter 6 the bus architecture of the STM32F030 MCU, one of the simplest
STM32 microcontrollers. The bus architecture is shown again in Figure 2⁵ for convenience. It differs
a lot from that of other, more performant STM32 families. We will analyze them later in this chapter, since
it is best to keep it simple in this phase.
The figure tells us some important things:
• Both the Cortex-M core and the DMA1 controller interact with the other MCU peripherals
through a series of buses. If it is still unclear, it is important to remark that the FLASH
and SRAM memories are also components outside the MCU core, and hence they need to interact
with each other through a bus interconnection.
• Both the Cortex-M core and the DMA1 controller are masters. This means they are the only
units that can start a transaction on a bus. However, the access to the bus must be regulated
so that they cannot access the same slave peripheral at the same time.
• The BusMatrix manages the access arbitration between the Cortex-M core and the DMA1
controller. The arbitration uses a Round Robin algorithm to rule the access to the bus. The
BusMatrix is composed of two masters (CPU, DMA) and four slaves (Flash interface, SRAM,
AHB1 with AHB to Advanced Peripheral Bus (APB) bridge and AHB2). The BusMatrix also
allows several peripherals to be automatically interconnected. This topic will be
analyzed in a subsequent chapter.
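The Round Robin policy mentioned in the list above can be illustrated with a few lines of C. This is only a model of the idea (each requesting master is served in turn, starting from the one after the last served); the function name rr_next_grant() and the bit-mask encoding of requests are assumptions made for this sketch, not a description of the BusMatrix hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the next master to be granted the bus, or -1 if no
 * master is requesting it. Bit i of request_mask is set when master i has a
 * pending request; last_granted is the master served in the previous slot
 * (use -1 before the first grant). */
int rr_next_grant(uint32_t request_mask, int n_masters, int last_granted) {
  assert(n_masters > 0 && n_masters <= 32);
  for (int step = 1; step <= n_masters; step++) {
    int candidate = (last_granted + step) % n_masters;
    if (request_mask & (1u << candidate))
      return candidate;
  }
  return -1;
}
```

With two masters (say 0 for the CPU and 1 for the DMA) that both keep requesting the bus, the grants simply alternate, so neither can starve the other.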
⁵The figure is taken from the ST STM32F030 Reference Manual (http://bit.ly/1GfS3iC).
The acronyms AHB, AHB1, AHB2, APB and so on are always confusing terms in the STM32 world.
They represent two things at the same time:
• They are hardware components used to connect different units inside the MCU to allow them
to exchange data. They can be clocked by different clock sources, with different speeds (more
about this in a subsequent chapter). This means that the access to slower buses can introduce
bottlenecks in your application.
• They are part of a more general specification, the ARM Advanced Microcontroller Bus
Architecture (AMBA), that defines the way different functional blocks interact with each other
inside an MCU. The AMBA is an open standard, and it is implemented in different releases
(and flavors) in all ARM Cortex processors (Cortex-A and Cortex-R included).
We left out one other thing in Figure 2: the DMA requests arrow that goes from the peripherals
block (white rectangle) to the DMA1 controller. What does it accomplish? In Chapter 7 we have
seen that the NVIC controller notifies the Cortex-M core about asynchronous interrupt requests
(IRQs) coming from peripherals. When a peripheral is ready to do something (e.g., the UART is
ready to receive data or a timer overflows), it asserts a dedicated IRQ line. The core executes in a
given number of cycles a dedicated ISR, which contains the code necessary to handle the IRQ. Don’t
forget that the peripherals are slave units: they cannot access the bus independently. A master is
always needed to start a transaction. So, if we use the DMA to
transfer data from the peripherals to memory, we need a way to notify it that the peripherals are
ready to exchange data. That is the reason why a number of dedicated request lines are available
from peripherals to the DMA controller. We will see in the next paragraph how they are organized
and how we can program them.
• has two master ports, named peripheral and memory port respectively, connected to the
Advanced High-performance Bus (AHB), one able to interface a slave peripheral and the other
one a memory controller (SRAM, FLASH, FSMC, etc.); in some DMA controllers a peripheral
port is also able to interface a memory controller, allowing memory-to-memory transfers;
• has one slave port, connected to the AHB bus, used to program the DMA controller from the
other master, that is the CPU;
• has a number of independent and programmable channels (request sources), each one
connectable to a given peripheral request line (UART_TX, TIM_UP etc. - the number and type of
requests for a channel is established during the MCU design);
• allows different priorities to be assigned to channels, in order to arbitrate the access to the memory
giving higher priority to faster and more important peripherals;
• allows the data to flow in both directions, that is from memory to peripheral and from
peripheral to memory.
Each STM32 MCU provides a variable number of DMAs and channels according to its family and sales
type. Table 1 reports their exact number for the STM32 MCUs equipping all Nucleo boards.
These characteristics are broadly common to all STM32 microcontrollers. However, STM32F2/F4/F7
families provide a more advanced DMA controller in conjunction with a multilayer BusMatrix that
allows boosting and parallelizing DMA transfers. This is the reason why we are going to treat them
separately⁶.
⁶However, keep in mind that this book does not aim to be an exhaustive source of hardware details of each STM32 family.
Always keep at hand the reference manual for the MCU you are considering, and look carefully at the chapter related to the
DMA.
Figure 3 shows a representation of the DMA in F0/F1/F3/L1 MCUs. Here, for simplicity, only
one request line is shown, but each DMA implements a request line for each channel. Each request
line has a variable number of peripheral request sources connected to it. A channel is bound during
the chip design to a fixed set of peripherals. However, only one peripheral at a time can be active
in the same channel. For example, Table 2⁷ shows how channels are bound to peripherals in an
STM32F030 MCU. Every request line can also be triggered by “software”. This ability is used to
perform memory-to-memory transfers.
Each channel has a configurable priority that regulates its access to the AHB bus. An internal
arbiter orders the requests coming from the channels according to this user-configurable priority. If two
request lines activate a request and their channels have the same priority, the channel with the lower
number wins the contention.
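The arbitration rule just described (highest priority first, lowest channel number on a tie) is easy to express in code. The sketch below is a software model only: the Channel struct, the Prio enum and arbitrate() are hypothetical names, and the four priority levels are an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH, PRIO_VERY_HIGH } Prio;

typedef struct {
  bool request_pending;  /* the bound peripheral asserted its request line */
  Prio priority;         /* user-configured priority of the channel */
} Channel;

/* Returns the index of the channel that wins the contention, or -1 when no
 * channel has a pending request. ch[0] models the lowest-numbered channel. */
int arbitrate(const Channel *ch, size_t n) {
  int winner = -1;
  for (size_t i = 0; i < n; i++) {
    if (!ch[i].request_pending) continue;
    /* Strict '>' keeps the earlier (lower-numbered) channel on a tie. */
    if (winner < 0 || ch[i].priority > ch[winner].priority)
      winner = (int)i;
  }
  return winner;
}
```

For example, with requests pending on two channels of equal priority, the function returns the lower index, matching the tie-break stated in the text.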
⁷The table is taken from the ST RM0360 reference manual (http://bit.ly/1GfS3iC)
Depending on the sales type used (Value Line, Performance Line, etc.) one or two DMA controllers
are available, for a total of up to 12 independent channels (7 for DMA1 and 5 for DMA2). For example, as
shown in Table 2, the STM32F030 provides only DMA1, with 5 channels.
We have already seen in Figure 2 the bus architecture of an STM32F030. For the sake of completeness,
Figure 4⁸ shows the bus architecture of more performant MCUs with the same DMA implementation
(e.g., the STM32F1). As you can see, the two families have a quite different internal bus organization.
You can see two additional buses named ICode and DCode. Why this difference?
Most STM32 MCUs share the same computer architecture, except for the STM32F0 and STM32L0
that are based on the Cortex-M0/0+ cores. They, in fact, are the only Cortex-M cores based on the
von Neumann architecture, compared to the other Cortex-M cores that are based on the Harvard
architecture⁹. The fundamental distinction between the two architectures is that Cortex-M0/0+ cores
access FLASH, SRAM and peripherals using one common bus, while the other Cortex-M cores
have two separate bus lines for the access to the FLASH (one for fetching instructions, called
instruction bus, or simply I-bus, and one for the access to const data, called data bus, or simply D-bus)
and one dedicated line for the access to SRAM and peripherals (also called system bus, or simply
S-bus). What advantages does this give to our applications?
⁸The figure is taken from the RM0008 reference manual from ST (http://bit.ly/1TNekGo)
⁹For the sake of completeness, we have to say that they are based on a modified Harvard architecture
(https://en.wikipedia.org/wiki/Modified_Harvard_architecture), but let us leave the distinction to historians of computer
science.
Figure 4: the bus architecture in an STM32F1 MCU from the Connectivity Line
In Cortex-M0/0+ cores the DMA and the Cortex core contend for access to memories and peripherals
through the BusMatrix. Suppose that the CPU is performing math operations on data contained in its
internal registers (R0-R14). If the DMA is transferring data to the SRAM, the BusMatrix arbitrates
the access from the Cortex core to the FLASH memory to load the next instruction to execute. So
the core is stalled waiting for its turn (more about this in a while). In the other Cortex-M cores, the
CPU can access the FLASH memory independently, boosting the overall performance. This is a
fundamental difference that justifies the price of STM32F0 MCUs: they not only can have less SRAM
and FLASH and run at lower frequencies, but they also have a simpler and intrinsically less performant
architecture.
However, it is important to remark that the BusMatrix implements scheduling policies to prevent a
given master (CPU and DMA in Value Line MCUs, or CPU, DMA, Ethernet and USB in Connectivity
Line MCUs) from stalling for too long. Each DMA transfer is made up of four phases: a sample and
arbitration phase, an address computation phase, a bus access phase and a final acknowledgement
phase (which is used to signal that the transfer has been completed). Each phase takes a single cycle,
with the exception of the bus access phase, which can last for a higher number of cycles. However,
its duration is fixed, and the BusMatrix guarantees that at the end of the acknowledgement phase
another master will be scheduled for the access to the bus. As we will see in the next paragraph, the
STM32F2/F4/F7 allow a more advanced parallelism while accessing slave devices. The details of
these aspects, however, are outside the scope of this book. It is strongly suggested to have a look
at the AN4031 (http://bit.ly/1n66sW7¹⁰) from ST to better understand them.
Finally, the DMA can also perform peripheral-to-peripheral transfers under particular conditions,
as we will see next.
STM32F2/F4/F7 MCUs implement a more advanced DMA controller, as shown in Figure 5. It offers
a higher degree of flexibility compared to the DMA found in other STM32 MCUs. Every DMA
implements 8 different streams. Each stream is dedicated to managing memory access requests from
one or more peripherals. Each stream can have up to 8 channels (requests) in total (but keep in mind
that only one channel/request can be active at a time in a stream), and it has an arbiter for
handling the priority between DMA requests. Moreover, every stream can optionally provide (it is a
configuration option) a four-word deep, 32-bit first-in/first-out (FIFO) memory buffer. The FIFO
is used to temporarily store data coming from the source before transmitting it to the destination.
Every stream can also be triggered by “software”. This ability is used to perform memory-to-memory
transfers, but it is limited to DMA2, as highlighted in Table 1.
Every STM32F2/F4/F7 MCU provides two DMA controllers, for a total of 16 independent streams.
Like in the other STM32 microcontrollers, a channel is bound to a fixed set of peripherals during the
chip design. Table 3 shows the DMA1 stream/channel request mapping in an STM32F401RE MCU.
STM32F2/F4/F7 MCUs embed a multi-masters/multi-slaves architecture made of:
• Eight masters:
– Cortex core I-bus
– Cortex core D-bus
– Cortex core S-bus
– DMA1 memory bus
– DMA2 memory bus
– DMA2 peripheral bus
– Ethernet DMA bus (if available)
– USB high-speed DMA bus (if available)
¹⁰http://bit.ly/1n66sW7
• Eight slaves:
– Internal Flash memory I-Code bus
– Internal Flash memory D-Code bus
– Main internal SRAM1
– Auxiliary internal SRAM2 (if available)
– Auxiliary internal SRAM3 (if available)
– AHB1 peripherals including AHB-to-APB bridges and APB peripherals
– AHB2 peripherals
– AHB3 peripheral (FMC) (if available)
Masters and slaves are connected via a multi-layer BusMatrix ensuring concurrent access from
separated masters and efficient operations, even when several high-speed peripherals work simultaneously.
This architecture is shown in Figure 6¹¹ for the case of STM32F405/415 and STM32F407/417
lines.
The multi-layer BusMatrix allows different masters to perform data transfers concurrently as long
as they are addressing different slave modules (but for a given DMA, only “one stream” at a time can
access the bus). On top of the Cortex-M Harvard architecture and dual AHB port DMAs, this
structure enhances data transfer parallelism, thus helping to reduce the execution time and
to optimize the DMA efficiency and power consumption.
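A rough way to see the benefit is to count bus slots. The sketch below is a deliberately simplified model, not a timing-accurate description of the hardware: it assumes every transfer occupies one slot of equal length, that each master has exactly one pending transfer, and that only transfers addressed to the same slave must be serialized. The names slots_multilayer() and slots_single_bus() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 8

/* target[i] is the index of the slave addressed by master i.
 * Multi-layer matrix: transfers to different slaves proceed in parallel, so
 * the number of slots needed is the largest number of masters that address
 * the same slave. */
int slots_multilayer(const int *target, size_t n_masters) {
  int per_slave[MAX_SLAVES] = {0};
  int worst = 0;
  for (size_t i = 0; i < n_masters; i++) {
    assert(target[i] >= 0 && target[i] < MAX_SLAVES);
    if (++per_slave[target[i]] > worst)
      worst = per_slave[target[i]];
  }
  return worst;
}

/* Single shared bus: every transfer is serialized, whatever its target. */
int slots_single_bus(size_t n_masters) {
  return (int)n_masters;
}
```

With three masters addressing three different slaves the model gives one slot instead of three; if two of them address the same slave, two slots are needed.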
¹¹The figure is taken from the AN4031 application note from ST (http://bit.ly/1n66sW7)
The DMA implementation in STM32L0/L4 MCUs is a hybrid between the DMA implementations
found in F0/F1/F3/L1 and F2/F4/F7 MCUs. In fact, it provides a multi-stream/channel
approach but without support for internal FIFOs in each stream.
ST has adopted a different nomenclature to indicate streams and channels in these DMA controllers.
Here streams are called channels and channels are called requests (probably this nomenclature is
clearer than the stream/channel one used in F2/F4/F7 MCUs). Table 4¹² shows the channels/requests
map in an STM32L053 MCU. This nomenclature also affects the HAL, as we will see
next.
¹²The table is taken from the RM0367 reference manual from ST (http://bit.ly/1Q3yKtW)
typedef struct __DMA_HandleTypeDef {
  DMA_Channel_TypeDef *Instance;            /* Register base address */
  DMA_InitTypeDef Init;                     /* DMA communication parameters */
  HAL_LockTypeDef Lock;                     /* DMA locking object */
  __IO HAL_DMA_StateTypeDef State;          /* DMA transfer state */
  void *Parent;                             /* Parent object state */
  void (* XferCpltCallback)(struct __DMA_HandleTypeDef * hdma);
  void (* XferHalfCpltCallback)(struct __DMA_HandleTypeDef * hdma);
  void (* XferErrorCallback)(struct __DMA_HandleTypeDef * hdma);
  __IO uint32_t ErrorCode;                  /* DMA Error code */
} DMA_HandleTypeDef;
Let us look more closely at the most important fields of this struct.
• Instance: is the pointer to the DMA/Channel pair descriptor we are going to use. For example,
DMA1_Channel5 indicates the fifth channel of DMA1. Remember that channels are bound to
peripherals during the MCU design, so consult the datasheet for your MCU to see the channel
bound to the peripheral you want to use in DMA mode.
• Init: is an instance of the C struct DMA_InitTypeDef, which is used to configure the
DMA/Channel pair. We will study it more in depth in a while.
• Parent: this pointer is used by the HAL to keep track of the peripheral handlers associated with
the current DMA/Channel. For example, if we are using a UART in DMA mode, this field
will point to an instance of UART_HandleTypeDef. We will see soon how peripheral handlers
are “linked” to this field.
All the DMA/Channel configuration activities are performed by using an instance of the C struct
DMA_InitTypeDef, which is defined in the following way:
typedef struct {
  uint32_t Direction;
  uint32_t PeriphInc;
  uint32_t MemInc;
  uint32_t PeriphDataAlignment;
  uint32_t MemDataAlignment;
  uint32_t Mode;
  uint32_t Priority;
} DMA_InitTypeDef;
• Direction: it defines the DMA transfer direction and it can assume one of the values reported
in Table 5.
• PeriphInc: as said in previous paragraphs, a DMA controller has one peripheral port used to
specify the address of the peripheral register involved in the memory transfer (for example,
for a UART interface the address of its Data Register (DR)). Since a DMA memory transfer
usually involves several bytes, the DMA can be programmed to automatically increment the
peripheral address after every data item transferred. This is useful both when a memory-to-memory
transfer is performed and when the peripheral is byte, half-word and word addressable (like an
external SRAM memory). In these cases the field assumes the value DMA_PINC_ENABLE, otherwise
DMA_PINC_DISABLE.
• MemInc: this field has the same meaning as the PeriphInc field, but it involves the memory
port. It can assume the value DMA_MINC_ENABLE to signal that the specified memory address
has to be incremented after each data item transferred, or the value DMA_MINC_DISABLE to leave it
unchanged after each transfer.
• PeriphDataAlignment: transfer data sizes of the peripheral and memory are fully pro-
grammable through this field and the next one. It can assume a value from Table 6. The
DMA controller is designed to automatically perform data alignment (packing/unpacking)
when source and destination data sizes differ. This topic is outside the scope of this book.
Please, refer to the Reference Manual of your MCU.
• MemDataAlignment: it specifies memory transfer data size and it can assume a value from
Table 7.
• Mode: the DMA controller in STM32 MCUs has two working modes: DMA_NORMAL and
DMA_CIRCULAR. In normal mode the DMA sends the specified amount of data from the source
to the destination port and then stops. It must be re-armed to do another transfer.
In circular mode, at the end of the transmission, it automatically reloads the transfer counter and
starts transmitting again from the first byte of the source buffer (that is, it treats the source buffer
as a ring buffer). This mode is also called continuous mode, and it is the only way to achieve
really high transmission speeds with some peripherals (e.g. high speed SPI devices).
• Priority: one important feature of the DMA controller is the ability to assign priorities to each
channel, in order to arbitrate concurrent requests. This field can assume a value from Table 8. In
case of concurrent requests from peripherals connected to channels with the same priority,
the channel with the lower number is served first.
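The interplay of the increment options and the two working modes can be checked with a small software model. The sketch below does not use the HAL: SimDmaChannel, sim_dma_start() and sim_dma_request() are hypothetical names, data items are single bytes, and each call to sim_dma_request() stands for one request raised by the peripheral.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  const uint8_t *src;  /* source base address */
  uint8_t *dst;        /* destination base address */
  bool src_inc;        /* increment the source address after each item */
  bool dst_inc;        /* increment the destination address after each item */
  bool circular;       /* reload the counter when it reaches zero */
  size_t count;        /* number of items programmed for one transfer */
  size_t remaining;    /* items still to move; 0 means the channel stopped */
} SimDmaChannel;

/* Arms the channel: load the transfer counter. */
void sim_dma_start(SimDmaChannel *ch) {
  assert(ch != NULL);
  ch->remaining = ch->count;
}

/* Serves one request. Returns false if the channel is stopped. */
bool sim_dma_request(SimDmaChannel *ch) {
  if (ch->remaining == 0) return false;
  size_t idx = ch->count - ch->remaining;  /* items already moved */
  ch->dst[ch->dst_inc ? idx : 0] = ch->src[ch->src_inc ? idx : 0];
  ch->remaining--;
  if (ch->remaining == 0 && ch->circular)
    ch->remaining = ch->count;             /* circular mode: start over */
  return true;
}
```

A peripheral-to-memory transfer from a UART corresponds to src_inc disabled (the data register never moves) and dst_inc enabled (the buffer is filled byte after byte). In normal mode the channel stops after count items, while in circular mode the next request overwrites the first element of the buffer.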
typedef struct __DMA_HandleTypeDef {
  DMA_Stream_TypeDef *Instance;             /* Register base address */
  DMA_InitTypeDef Init;                     /* DMA communication parameters */
  HAL_LockTypeDef Lock;                     /* DMA locking object */
  __IO HAL_DMA_StateTypeDef State;          /* DMA transfer state */
  void *Parent;                             /* Parent object state */
  void (* XferCpltCallback)(struct __DMA_HandleTypeDef * hdma);
  void (* XferHalfCpltCallback)(struct __DMA_HandleTypeDef * hdma);
  void (* XferM1CpltCallback)(struct __DMA_HandleTypeDef * hdma);
  void (* XferErrorCallback)(struct __DMA_HandleTypeDef * hdma);
  __IO uint32_t ErrorCode;                  /* DMA Error code */
  uint32_t StreamBaseAddress;               /* DMA Stream Base Address */
  uint32_t StreamIndex;                     /* DMA Stream Index */
} DMA_HandleTypeDef;
Let us look more closely at the most important fields of this struct.
• Instance: is the pointer to the stream descriptor we are going to use. For example, DMA1_Stream6
indicates the seventh¹³ stream of DMA1. Remember that a stream must be bound
to a channel before it can be used. This is achieved through the Init field, as we will see in
a while. Remember also that channels are bound to peripherals during the MCU design, so
consult the datasheet for your MCU to see the channel bound to the peripheral you want to
use in DMA mode.
• Init: is an instance of the C struct DMA_InitTypeDef, which is used to configure the
DMA/Channel/Stream triple. We will study it more in depth in a while.
• Parent: this pointer is used by the HAL to keep track of the peripheral handlers associated with
the current DMA/Channel. For example, if we are using a UART in DMA mode, this field
will point to an instance of UART_HandleTypeDef. We will see soon how peripheral handlers
are “linked” to this field.
• XferCpltCallback, XferHalfCpltCallback, XferM1CpltCallback, XferErrorCallback: these
are pointers to functions used as callbacks to signal the user code that a DMA transfer
is completed, half-completed, that the transmission of the first buffer in a multi-buffer transfer is
completed, or that an error occurred. They are automatically called by the HAL, through the
function HAL_DMA_IRQHandler(), when a DMA interrupt fires, as we will see next.
All the DMA/Channel configuration activities are performed by using an instance of the C struct
DMA_InitTypeDef, which is defined in the following way:
typedef struct {
  uint32_t Channel;
  uint32_t Direction;
  uint32_t PeriphInc;
  uint32_t MemInc;
  uint32_t PeriphDataAlignment;
  uint32_t MemDataAlignment;
  uint32_t Mode;
  uint32_t Priority;
  uint32_t FIFOMode;
  uint32_t FIFOThreshold;
  uint32_t MemBurst;
  uint32_t PeriphBurst;
} DMA_InitTypeDef;
• Channel: it specifies the DMA channel used for the given stream. It can assume the values
DMA_CHANNEL_0, DMA_CHANNEL_1 up to DMA_CHANNEL_7. Remember that peripherals are bound
to streams and channels during the MCU design, so consult the datasheet for your MCU to
see the stream bound to the peripheral you want to use in DMA mode.
• Direction: it defines the DMA transfer direction and it can assume one of the values reported
in Table 5.
• PeriphInc: as said in previous paragraphs, a DMA controller has one peripheral port used to
specify the address of the peripheral register involved in the memory transfer (for example,
for a UART interface the address of its Data Register (DR)). Since a DMA memory transfer
usually involves several bytes, the DMA can be programmed to automatically increment the
peripheral address after every data item transferred. This is useful both when a memory-to-memory
transfer is performed and when the peripheral is byte, half-word and word addressable (like
an external SRAM memory). In these cases the field assumes the value DMA_PINC_ENABLE,
otherwise DMA_PINC_DISABLE.
• MemInc: this field has the same meaning as the PeriphInc field, but it involves the memory
port. It can assume the value DMA_MINC_ENABLE to signal that the specified memory address
has to be incremented after each data item transferred, or the value DMA_MINC_DISABLE to leave it
unchanged after each transfer.
• PeriphDataAlignment: transfer data sizes of the peripheral and memory are fully pro-
grammable through this field and the next one. It can assume a value from Table 6. The
DMA controller is designed to automatically perform data alignment (packing/unpacking)
when source and destination data sizes differ. This topic is outside the scope of this book.
Please, refer to the Reference Manual of your MCU.
• MemDataAlignment: it specifies memory transfer data size and it can assume a value from
Table 7.
• Mode: the DMA controller in STM32 MCUs has two working modes: DMA_NORMAL and
DMA_CIRCULAR. In normal mode the DMA sends the specified amount of data from the source
to the destination port and then stops. It must be re-armed to do another transfer.
In circular mode, at the end of transmission, it automatically resets the transfer counter and
starts transmitting again from the first byte of the source buffer (that is, it treats the source
buffer as a ring buffer). This mode is also called continuous mode, and it is the only way to
achieve really high transmission speeds in some peripherals (e.g. really fast SPI devices).
• Priority: one important feature of the DMA controller is the ability to assign priorities to
each stream, in order to arbitrate concurrent requests. This field can assume a value from Table 8.
In case of concurrent requests from peripherals connected to streams with the same priority,
the stream with the lower number is served first.
• FIFOMode: it is used to enable/disable the DMA FIFO mode using the DMA_FIFOMODE_ENABLE/
DMA_FIFOMODE_DISABLE macros. In STM32F2/F4/F7 MCUs, each stream has an independent 4-word
(4 * 32 bits) FIFO. The FIFO is used to temporarily store data coming from the source before
transmitting it to the destination. When disabled, the Direct mode is used (this is the “normal”
mode available in other STM32 MCUs).
The FIFO mode introduces several advantages: it reduces SRAM accesses and so gives more
time to the other masters to access the Bus Matrix without additional concurrency; it allows
software to do burst transactions, which optimize the transfer bandwidth (more about this in
a while); it allows packing/unpacking data to adapt source and destination data width with
no extra DMA access.
If the DMA FIFO is enabled, data packing/unpacking and/or Burst mode can be used. The FIFO
is automatically emptied according to a threshold level. This level is software-configurable
to 1/4, 1/2, 3/4 or full size.
• FIFOThreshold: it specifies the FIFO threshold level and it can assume a value from Table 9.
• MemBurst: a round-robin scheduling policy governs the access of a DMA stream to the AHB bus
before it can transfer a sequence of bytes. This slows down the transfer operations,
and for some high-speed peripherals it can be a bottleneck. A burst transfer allows a DMA
stream to transmit data repeatedly without going through all the steps required to transmit
each piece of data in a separate transaction. The burst mode works in conjunction with FIFOs,
and by itself it says nothing about the number of bytes transferred: this depends on the setting
of the MemDataAlignment field (when we are doing a memory-to-peripheral transfer). MemBurst
indicates the number of “shots” performed by the stream, each one made of a byte, a half-word
or a word depending on the source configuration. The MemBurst field can assume one value from
Table 10.
• PeriphBurst: this field has the same meaning as the previous one, but it is related to
peripheral-to-memory transfers. It can assume a value from Table 11.
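The arbitration rule described for the Priority field can be sketched with a few lines of plain C that run on a PC. This is only a model of the rule, not HAL code: the stream_req_t type and the pick_next_stream() function are hypothetical names introduced here for illustration, and the numeric encoding of priorities (a larger value wins) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the request state of one DMA stream (not a HAL type). */
typedef struct {
    bool pending;   /* true if the stream has a request waiting to be served */
    int  priority;  /* software priority: a larger value wins (model assumption) */
} stream_req_t;

/* Returns the index of the stream to serve next, or -1 if nothing is pending.
 * When two pending streams share the same priority, the lower index wins. */
int pick_next_stream(const stream_req_t *streams, size_t count) {
    assert(streams != NULL);
    int winner = -1;
    for (size_t i = 0; i < count; i++) {
        if (!streams[i].pending)
            continue;
        /* The strict '>' keeps the earlier (lower-numbered) stream on a tie. */
        if (winner < 0 || streams[i].priority > streams[winner].priority)
            winner = (int)i;
    }
    return winner;
}
```

With streams 2 and 3 both pending at the same priority, the function selects stream 2, which mirrors the "lower number first" rule stated above.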
• the DMA_HandleTypeDef.Instance is the channel number, and it can assume the values
DMA1_Channel1..DMA1_Channel7;
• the DMA_HandleTypeDef.Init.Request is the request line, and it can assume the values
DMA_REQUEST_0..DMA_REQUEST_11;
• this DMA implementation does not support FIFO and burst mode.
while the fourth point is peripheral dependent, and we have to consult our specific MCU datasheet.
However, as we will see later, the HAL also abstracts this point (for example, if we use the
corresponding HAL_UART_Transmit_DMA() function when configuring a UART in DMA mode).
Now we should have all the elements to see a fully working application. What we are going to do in
the next example is just sending a string over the UART2 peripheral using DMA mode. The involved
steps are:
• The UART2 is configured using the HAL_UART module, as we have seen in the previous chapter.
• The DMA1 channel (or the DMA1 channel/stream couple for STM32F4 based Nucleo boards)
is configured to do a memory-to-peripheral transfer (see Table 12)
• The corresponding channel is armed to execute the transfer and UART is enabled in DMA
mode.
Table 12: how USART_TX/USART_RX DMA channels are mapped in the MCUs equipping Nucleo boards
The following example, which is designed to work on a Nucleo-F030 (refer to book samples for the
other Nucleo boards), shows how to do this easily.
Filename: src/main-ex1.c
43 MX_DMA_Init();
44 MX_USART2_UART_Init();
45
46 hdma_usart2_tx.Instance = DMA1_Channel4;
47 hdma_usart2_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
48 hdma_usart2_tx.Init.PeriphInc = DMA_PINC_DISABLE;
49 hdma_usart2_tx.Init.MemInc = DMA_MINC_ENABLE;
50 hdma_usart2_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
51 hdma_usart2_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
52 hdma_usart2_tx.Init.Mode = DMA_NORMAL;
53 hdma_usart2_tx.Init.Priority = DMA_PRIORITY_LOW;
54 HAL_DMA_Init(&hdma_usart2_tx);
55
56 HAL_DMA_Start(&hdma_usart2_tx, (uint32_t)msg, (uint32_t)&huart2.Instance->TDR, strlen(msg));
58 //Enable UART in DMA mode
59 huart2.Instance->CR3 |= USART_CR3_DMAT;
60 //Wait for transfer complete
61 HAL_DMA_PollForTransfer(&hdma_usart2_tx, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
62 //Disable UART DMA mode
63 huart2.Instance->CR3 &= ~USART_CR3_DMAT;
64 //Turn LD2 ON
65 HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
The hdma_usart2_tx variable is an instance of the DMA_HandleTypeDef struct seen before. Here we
configure DMA1_Channel4 to do a memory-to-peripheral transfer. Since the USART peripheral has
a Transmit Data Register (TDR) one byte wide, we configure the DMA so that the peripheral address
is not automatically incremented (DMA_PINC_DISABLE), while we want the source memory
address to be automatically incremented at every byte sent (DMA_MINC_ENABLE). Once the
configuration is completed, we call HAL_DMA_Init(), which performs the DMA interface configuration
according to the information provided inside the hdma_usart2_tx.Init structure. Next, at line 56,
we invoke the HAL_DMA_Start() routine, which configures the source memory address (that is
the address of the msg array), the destination peripheral address (that is the address of the
USART2->TDR register) and the amount of data we are going to transmit. The DMA is now ready to
shoot, and we start the transmission by setting the DMAT bit in the CR3 register of the USART2
peripheral, as shown at line 59. Finally, take note that the function MX_DMA_Init() (invoked at
line 43) uses the macro __HAL_RCC_DMA1_CLK_ENABLE() to enable the DMA1 controller (remember that
almost every STM32 internal module must be enabled by using the
__HAL_RCC_<PERIPHERAL>_CLK_ENABLE() macro).
Since we do not know how long it takes to complete the transfer procedure, we use the function
HAL_DMA_PollForTransfer() (line 61), which automatically waits for full transfer completion. This
way of sending data in DMA mode is called “polling mode” in the official ST documentation. Once
the transfer is completed, we disable the UART2 DMA mode and turn on the LD2 LED.
• define three functions acting as callback routines and assign them to the function pointers
XferCpltCallback, XferHalfCpltCallback and XferErrorCallback in a DMA_HandleTypeDef handle
(it is ok to define only the functions we are interested in, but set the pointers of the
unused callbacks to NULL, otherwise strange faults may occur);
• write the ISR for the IRQ associated with the channel you are using and call
HAL_DMA_IRQHandler() from it, passing the reference to the DMA_HandleTypeDef handle;
• enable the corresponding IRQ in the NVIC controller;
• use the function HAL_DMA_Start_IT(), which automatically performs all the necessary setup
steps for you, passing to it the same arguments as HAL_DMA_Start().
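The advice about NULL pointers can be illustrated with a small host-side model, runnable on a PC. It is only a sketch under stated assumptions: dma_handle_t, dma_event_t, on_complete() and dma_dispatch() are hypothetical names, and the struct is a simplified stand-in for the real handle, not the HAL definition. The point it shows is that a dispatcher can safely skip a callback only when the unused pointer is really NULL; a pointer left uninitialized in a handle allocated on the stack may hold any value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DMA handle (not the HAL struct). */
typedef struct dma_handle {
    void (*xfer_cplt)(struct dma_handle *h);      /* full transfer completed */
    void (*xfer_half_cplt)(struct dma_handle *h); /* half transfer completed */
    void (*xfer_error)(struct dma_handle *h);     /* transfer error */
    int  complete_count;                          /* updated by the demo callback */
} dma_handle_t;

typedef enum { EVT_HALF, EVT_COMPLETE, EVT_ERROR } dma_event_t;

/* Demo callback: it only counts the completed transfers. */
void on_complete(dma_handle_t *h) { h->complete_count++; }

/* Model of the dispatching step: a callback is invoked only if its pointer
 * is not NULL, so unused callbacks must be explicitly set to NULL. */
void dma_dispatch(dma_handle_t *h, dma_event_t evt) {
    assert(h != NULL);
    void (*cb)(dma_handle_t *) = NULL;
    switch (evt) {
    case EVT_HALF:     cb = h->xfer_half_cplt; break;
    case EVT_COMPLETE: cb = h->xfer_cplt;      break;
    case EVT_ERROR:    cb = h->xfer_error;     break;
    }
    if (cb != NULL)
        cb(h);
}
```

Initializing the handle with = {0} before assigning the callbacks we need is a simple way to be sure that every other pointer is NULL.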
Performance purists will be disappointed by the way the HAL manages DMA interrupts.
In fact, it enables by default all the available IRQs for a given channel, even if we are
not interested in some of them (for example, we might not be interested in capturing the
half transfer interrupt). If performance is fundamental for you, then take a look at the
HAL_DMA_Start_IT() code and adapt it to your needs. Unfortunately, ST has decided
to design the HAL in a way that it abstracts a lot of detail from the user, at the expense of
speed.
The following example shows how to do a DMA memory-to-peripheral transfer in interrupt mode.
Filename: src/main-ex2.c
  hdma_usart2_tx.Instance = DMA1_Channel4;
  hdma_usart2_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
  hdma_usart2_tx.Init.PeriphInc = DMA_PINC_DISABLE;
  hdma_usart2_tx.Init.MemInc = DMA_MINC_ENABLE;
  hdma_usart2_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
  hdma_usart2_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
  hdma_usart2_tx.Init.Mode = DMA_NORMAL;
  hdma_usart2_tx.Init.Priority = DMA_PRIORITY_LOW;
  hdma_usart2_tx.XferCpltCallback = &DMATransferComplete;
  HAL_DMA_Init(&hdma_usart2_tx);

  /* DMA interrupt init */
  HAL_NVIC_SetPriority(DMA1_Channel4_5_IRQn, 0, 0);
  HAL_NVIC_EnableIRQ(DMA1_Channel4_5_IRQn);

  HAL_DMA_Start_IT(&hdma_usart2_tx, (uint32_t)msg,
                   (uint32_t)&huart2.Instance->TDR, strlen(msg));
  //Enable UART in DMA mode
  huart2.Instance->CR3 |= USART_CR3_DMAT;

  /* Infinite loop */
  while (1);
}

void DMATransferComplete(DMA_HandleTypeDef *hdma) {
  if(hdma->Instance == DMA1_Channel4) {
    //Disable UART DMA mode
    huart2.Instance->CR3 &= ~USART_CR3_DMAT;
    //Turn LD2 ON
Filename: src/main-ex3.c
//Enable the DMA1 clock before touching the DMA controller registers
__HAL_RCC_DMA1_CLK_ENABLE();

hdma_usart2_rx.Instance = DMA1_Channel5;
hdma_usart2_rx.Init.Direction = DMA_PERIPH_TO_MEMORY;
hdma_usart2_rx.Init.PeriphInc = DMA_PINC_DISABLE;
hdma_usart2_rx.Init.MemInc = DMA_MINC_DISABLE;
hdma_usart2_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
hdma_usart2_rx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
hdma_usart2_rx.Init.Mode = DMA_CIRCULAR;
hdma_usart2_rx.Init.Priority = DMA_PRIORITY_LOW;
HAL_DMA_Init(&hdma_usart2_rx);

//The source is the UART data register, the destination is the GPIOA output register
HAL_DMA_Start(&hdma_usart2_rx, (uint32_t)&huart2.Instance->RDR, (uint32_t)&GPIOA->ODR, 1);
//Enable UART in DMA mode
huart2.Instance->CR3 |= USART_CR3_DMAR;
This time we configure the channel to do a peripheral-to-memory transfer, incrementing neither
the source peripheral register (the UART data register) nor the target memory location,
which in our case is the address of the GPIOA->ODR register. Finally, the channel is configured to
work in circular mode: in this way every byte arriving over the UART is stored inside
the GPIOA->ODR register continuously.
¹⁴The example is designed to run on a Nucleo-F030. For Nucleo boards based on F2/F4/L1/L4 MCUs, the example is designed to
work with the UART1, whose DMA requests are bound to the DMA2.
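To see why circular mode with both increments disabled keeps copying one register into another, the behaviour can be modelled on a PC with plain C. The sketch below is not HAL code and does not touch any hardware: dma_model_t, dma_model_start() and dma_model_step() are hypothetical names, and two ordinary variables play the role of the UART data register and of GPIOA->ODR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DMA channel moving bytes (not a HAL type). */
typedef struct {
    const uint8_t *src0;  /* programmed source address */
    uint8_t *dst0;        /* programmed destination address */
    const uint8_t *src;   /* current source address */
    uint8_t *dst;         /* current destination address */
    size_t count0;        /* programmed number of data items */
    size_t remaining;     /* data items left in the current pass */
    bool src_inc;         /* increment the source after each item */
    bool dst_inc;         /* increment the destination after each item */
    bool circular;        /* reload counter and addresses at the end */
} dma_model_t;

/* Programs addresses and counter; the mode flags are left untouched. */
void dma_model_start(dma_model_t *d, const uint8_t *src, uint8_t *dst, size_t count) {
    assert(d != NULL && src != NULL && dst != NULL && count > 0);
    d->src0 = d->src = src;
    d->dst0 = d->dst = dst;
    d->count0 = d->remaining = count;
}

/* Serves one request by moving one byte. Returns false if the channel has
 * stopped, which happens only in normal mode once the counter reaches zero. */
bool dma_model_step(dma_model_t *d) {
    if (d->remaining == 0)
        return false;
    *d->dst = *d->src;
    if (d->src_inc) d->src++;
    if (d->dst_inc) d->dst++;
    if (--d->remaining == 0 && d->circular) {
        /* Circular mode: restart from the programmed addresses. */
        d->src = d->src0;
        d->dst = d->dst0;
        d->remaining = d->count0;
    }
    return true;
}
```

With a counter of one item and circular mode enabled, every step copies the current content of the source variable to the destination and re-arms itself; in normal mode the second step is refused.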
To test the example, we can simply use the following Python script:
Filename: src/uartsend.py
#!/usr/bin/env python
import serial, time

SERIAL_PORT = "/dev/tty.usbmodem1a1213" #Windows users, replace with "COMx"
ser = serial.Serial(SERIAL_PORT, 38400)

while True:
    ser.write((0xff,))
    time.sleep(0.05)
    ser.write((0,))
    time.sleep(0.05)

ser.close()
The code is really self-explanatory. We use the pyserial module to open a new serial connection
on the Nucleo VCP. Then we start an infinite loop that sends the 0xFF and 0x0 bytes alternately.
As a result, the GPIOA->ODR register assumes the same values (that is, the first eight I/Os go
HIGH and LOW alternately) and the Nucleo LD2 LED blinks. As you can see, the Cortex-M core knows
nothing about what is happening between the UART2 and the GPIOA peripherals.
• configure the DMA channel/stream hardwired to the UART you are going to use, as seen in
this chapter;
• link the UART_HandleTypeDef to the DMA_HandleTypeDef using the __HAL_LINKDMA();
• enable the DMA interrupt related to the channel/stream you are using and call the
HAL_DMA_IRQHandler() routine from its ISR;
• enable the UART related interrupt and call the HAL_UART_IRQHandler() routine from its ISR
(this is really important, do not skip this step);
The following code shows how to receive three bytes from UART2 in DMA mode in an STM32F030
MCU¹⁵:
volatile uint8_t dataArrived = 0; //volatile: it is modified from interrupt context

int main(void) {
  HAL_Init();
  Nucleo_BSP_Init(); //Configure the UART2

  //Configure the DMA1 Channel 5, which is wired to the UART2_RX request line
  hdma_usart2_rx.Instance = DMA1_Channel5;
  hdma_usart2_rx.Init.Direction = DMA_PERIPH_TO_MEMORY;
  hdma_usart2_rx.Init.PeriphInc = DMA_PINC_DISABLE;
  hdma_usart2_rx.Init.MemInc = DMA_MINC_ENABLE;
  hdma_usart2_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
  hdma_usart2_rx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
  hdma_usart2_rx.Init.Mode = DMA_NORMAL;
  hdma_usart2_rx.Init.Priority = DMA_PRIORITY_LOW;
  HAL_DMA_Init(&hdma_usart2_rx);

  /* Infinite loop */
  while (1);
}

//This callback is automatically called by the HAL when the DMA transfer is completed
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
  dataArrived = 1;
}

void DMA1_Channel4_5_IRQHandler(void) {
  //This will automatically call the HAL_UART_RxCpltCallback()
  HAL_DMA_IRQHandler(&hdma_usart2_rx);
}
¹⁵Arranging the DMA initialization code for other STM32 MCUs is left as exercise to the reader.
As you can see, once you have mastered how the DMA controller works, it is really simple to use a
peripheral in this transfer mode.
This function disables the DMA stream/channel. If a stream is disabled while a data transfer is
ongoing, the current data will be transferred and the stream will be effectively disabled only after
the transfer of this single data is finished.
Some STM32 MCUs can perform multi-buffer DMA transfers, which allow the use of two separate
buffers during the transfer process: the DMA will automatically “jump” from the first buffer (named
memory0) to the second one (named memory1) when the end of the first one is reached. This is
especially useful when the DMA works in circular mode. The function HAL_DMAEx_MultiBufferStart()
is used to set up multi-buffer DMA transfers. It is available only in F2/F4/F7 HALs. A corresponding
HAL_DMAEx_MultiBufferStart_IT() is also available, which also takes care of enabling DMA
interrupts.
The function HAL_DMAEx_ChangeMemory() changes the memory0 or memory1 address on the fly in a
multi-buffer DMA transaction.
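The idea behind multi-buffer transfers can be sketched with a host-side model in plain C. It is not HAL code and it makes no claim about how the hardware is implemented: dbuf_model_t, dbuf_init() and dbuf_store() are hypothetical names used only to show how the write position jumps from memory0 to memory1 and back, so that the CPU can process one buffer while the other one is being filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a double-buffered receive stream (not a HAL type). */
typedef struct {
    uint8_t *mem[2];  /* mem[0] models memory0, mem[1] models memory1 */
    size_t len;       /* size of each buffer, in bytes */
    size_t pos;       /* write position inside the current buffer */
    int current;      /* buffer currently being filled (0 or 1) */
} dbuf_model_t;

void dbuf_init(dbuf_model_t *d, uint8_t *m0, uint8_t *m1, size_t len) {
    assert(d != NULL && m0 != NULL && m1 != NULL && len > 0);
    d->mem[0] = m0;
    d->mem[1] = m1;
    d->len = len;
    d->pos = 0;
    d->current = 0;
}

/* Stores one incoming byte. Returns the index of the buffer that has just
 * been filled (and is now free to be processed), or -1 if none was completed. */
int dbuf_store(dbuf_model_t *d, uint8_t byte) {
    d->mem[d->current][d->pos++] = byte;
    if (d->pos < d->len)
        return -1;
    int done = d->current;
    d->current ^= 1;  /* jump to the other buffer */
    d->pos = 0;
    return done;
}
```

When dbuf_store() returns 0 or 1, the corresponding buffer holds a complete block of data, while new bytes are already landing in the other one.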
The dialog contains two or three tabs (according to the number of DMA controllers provided by your
MCU). The first two are related to peripheral requests. For example, if you want to enable a DMA
request for USART2 in transmit mode (to do a memory-to-peripheral transfer), click on the Add
button, and select the USART2_TX entry. CubeMX will automatically fill the remaining fields for you,
selecting the right channel. You can then assign a priority to the request, and set other options
like the DMA mode, peripheral/memory increment, and so on. Once completed, click on the OK
button. In the same way, it is possible to configure DMA channels/streams to do memory-to-memory
transfers.
CubeMX will automatically generate the right initialization code for the used channels inside the
stm32xxxx_hal_msp.c file.
as that stack frame is active. When the called function exits, the stack area where the variable has
been allocated is reassigned to other uses (to store the arguments or other local variables of the next
called function). If we use a local variable as buffer for DMA transfers (that is, we pass to the DMA
memory port the address of a memory location in the stack), then it is very likely that the DMA
will access a memory region containing other data, corrupting that memory area if we are doing
a peripheral-to-memory transfer, unless we are sure that the function is never popped from the stack
(this could be the case of a variable declared inside the main() function).
Figure 9:
Figure 9 clearly shows the difference between a variable allocated locally (lbuf) and one
allocated at the global scope (gbuf). lbuf will be active as long as func1() is on the stack.
If you want to avoid global variables in your application, another solution is to declare the
buffer as static. As we will discover in a following chapter, static variables are automatically
allocated inside the .data region (Global Data region in Figure 9) or, when they are not explicitly
initialized, inside the .bss region, even if their “visibility” is limited to the local scope.
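A minimal sketch of the static approach follows. It is plain C that can be compiled on a PC, and the names used (RX_BUF_LEN, get_rx_buffer(), fake_dma_write()) are hypothetical; fake_dma_write() is only a stand-in for a DMA controller that writes to memory after the function owning the buffer has already returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_LEN 8u  /* arbitrary size chosen for this sketch */

/* The buffer is declared static: it exists for the whole program lifetime,
 * so its address remains valid after the function returns, even though the
 * name rx_buf is visible only inside this function. */
uint8_t *get_rx_buffer(void) {
    static uint8_t rx_buf[RX_BUF_LEN];
    return rx_buf;
}

/* Stand-in for a DMA transfer that fills the destination some time later. */
void fake_dma_write(uint8_t *dst, uint8_t value, size_t len) {
    assert(dst != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = value;
}
```

If rx_buf were a plain local array, the pointer returned by get_rx_buffer() would refer to a stack area already released, which is exactly the situation described above.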
The majority of these tests claim that the DMA is usually much slower than the Cortex-M
core. Is this true? The answer is: it depends. So, why would you use the DMA when you
already have those routines?
The story behind these tests is much more complicated, and it involves several factors like the
memory alignment, the C library used and the right DMA settings. Let us consider the following test
application (the code is designed to run on an STM32F4 MCU) divided in several stages:
Filename: src/mem2mem.c
12 DMA_HandleTypeDef hdma_memtomem_dma2_stream0;
13
14 const uint8_t flashData[] = {0xe7, 0x49, 0x9b, 0xdb, 0x30, 0x5a, ...};
15 uint8_t sramData[1000];
16
17 int main(void) {
18 HAL_Init();
19 Nucleo_BSP_Init();
20
21 hdma_memtomem_dma2_stream0.Instance = DMA2_Stream0;
22 hdma_memtomem_dma2_stream0.Init.Channel = DMA_CHANNEL_0;
23 hdma_memtomem_dma2_stream0.Init.Direction = DMA_MEMORY_TO_MEMORY;
24 hdma_memtomem_dma2_stream0.Init.PeriphInc = DMA_PINC_ENABLE;
25 hdma_memtomem_dma2_stream0.Init.MemInc = DMA_MINC_ENABLE;
26 hdma_memtomem_dma2_stream0.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
27 hdma_memtomem_dma2_stream0.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
28 hdma_memtomem_dma2_stream0.Init.Mode = DMA_NORMAL;
29 hdma_memtomem_dma2_stream0.Init.Priority = DMA_PRIORITY_LOW;
30 hdma_memtomem_dma2_stream0.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
31 hdma_memtomem_dma2_stream0.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_FULL;
32 hdma_memtomem_dma2_stream0.Init.MemBurst = DMA_MBURST_SINGLE;
33 hdma_memtomem_dma2_stream0.Init.PeriphBurst = DMA_PBURST_SINGLE;
34 HAL_DMA_Init(&hdma_memtomem_dma2_stream0);
35
36 GPIOC->ODR = 0x100;
37 HAL_DMA_Start(&hdma_memtomem_dma2_stream0, (uint32_t)&flashData, (uint32_t)sramData, 1000);
39 HAL_DMA_PollForTransfer(&hdma_memtomem_dma2_stream0, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
41 GPIOC->ODR = 0x0;
42
43 while(HAL_GPIO_ReadPin(B1_GPIO_Port, B1_Pin));
44
45 hdma_memtomem_dma2_stream0.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
46 hdma_memtomem_dma2_stream0.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;
47
48 HAL_DMA_Init(&hdma_memtomem_dma2_stream0);
49
50 GPIOC->ODR = 0x100;
51 HAL_DMA_Start(&hdma_memtomem_dma2_stream0, (uint32_t)&flashData, (uint32_t)sramData, 250);
53 HAL_DMA_PollForTransfer(&hdma_memtomem_dma2_stream0, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
55 GPIOC->ODR = 0x0;
56
57 HAL_Delay(1000); /* This is a really primitive form of debouncing */
58
59 while(HAL_GPIO_ReadPin(B1_GPIO_Port, B1_Pin));
60
61 GPIOC->ODR = 0x100;
62 memcpy(sramData, flashData, 1000);
63 GPIOC->ODR = 0x0;
Here we have two quite large arrays. One of these, flashData, is allocated inside the FLASH memory
thanks to the const modifier¹⁷. We want to copy its content inside the sramData array, which is
stored inside the SRAM as the name suggests, and we want to test how long it takes using DMA and
memcpy() function.
First we start testing the DMA. The hdma_memtomem_dma2_stream0 handle is used to configure the
DMA2 stream0/channel0 to execute a memory-to-memory transfer. In the first stage we configure the
DMA stream to perform a byte-aligned memory transfer. Once the DMA configuration is completed,
we start the transfer. Using an oscilloscope attached to the Nucleo PC8 pin, we can measure how long
the transfer takes. Pressing the Nucleo USER button (connected to PC13) causes the start of another
test stage. This time we configure the DMA so that a word-aligned transfer is executed. Finally, at
line 62 we test how long it takes to copy the array using memcpy().
¹⁷The reason why this happens will be explained in a following chapter.
Table 13 shows the results obtained for every Nucleo board. Let us focus on the Nucleo-F401RE
board. As you can see, the DMA M2M byte-aligned transfer takes ∼42 μs, while the DMA
M2M word-aligned transfer takes ∼14 μs. This is a great speed-up, which proves that using the right
DMA configuration can give us the best transfer performance, since we are moving 4 bytes at once
for each DMA shot. What about memcpy()? As you can see from Table 13, it depends on the
C library used. The GCC tool-chain we are using provides two C run-time libraries: one is named
newlib and the other newlib-nano. The first one is the most complete and speed-optimized of the two,
while the second one is a reduced-size version. The memcpy() in the newlib library is designed to
provide the fastest copy speed, at the expense of code size. It automatically detects word-aligned
transfers, and it equals the DMA when doing word-aligned M2M transfers. So, it is much faster
than the DMA when doing byte-aligned M2M transfers, and that is the reason why someone claims that
memcpy() is always faster than the DMA. On the other hand, both the Cortex-M core and the DMA need
to access FLASH and SRAM memory using the same bus. So there are no reasons why the core should
be faster than the DMA¹⁸.
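To put the two DMA measurements in perspective, we can convert them into throughput figures. The helper functions below are hypothetical names written only for this calculation; they are plain C and use nothing but the numbers quoted above (1000 bytes copied in about 42 μs and in about 14 μs on the Nucleo-F401RE).

```c
#include <assert.h>

/* Bytes per microsecond are numerically equal to millions of bytes per
 * second, so the result can be read directly as MB/s (1 MB = 10^6 bytes). */
double throughput_mb_s(unsigned bytes, double elapsed_us) {
    assert(elapsed_us > 0.0);
    return (double)bytes / elapsed_us;
}

/* How many times faster the second measurement is compared to the first. */
double speedup(double slow_us, double fast_us) {
    assert(fast_us > 0.0);
    return slow_us / fast_us;
}
```

With these values, throughput_mb_s(1000, 42.0) gives about 23.8 MB/s for the byte-aligned transfer and throughput_mb_s(1000, 14.0) gives about 71.4 MB/s for the word-aligned one, that is a speed-up of 3, a little less than the factor of 4 we might expect from moving four bytes at a time.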
As you can see, the fastest transfer speed is achieved when the DMA stream/channel disables the
internal FIFO buffer (∼12 μs). It is important to remark that for STM32 MCUs with smaller FLASH
memories newlib-nano is almost an unavoidable choice, unless the code still fits the FLASH
space. But again, using the right DMA settings we can achieve the same performance as the
speed-optimized version available in the newlib library.
¹⁸Here I am clearly excluding some “privileged paths” between the Cortex-M core and SRAM. This is the role of the Core-Coupled
Memory (CCM), a feature available in some STM32 MCUs and that we will explore better in a following chapter.
The last thing we have to analyze is the last column in Table 13. It shows how long it takes to do a
memory transfer using a simple loop like the following one:
...
GPIOC->ODR = 0x100;
for(int i = 0; i < 1000; i++)
  sramData[i] = flashData[i];
GPIOC->ODR = 0x0;
...
As you can see, with the maximum optimization level (-O3) it takes exactly the same time as
memcpy(). Why does this happen?
...
GPIOC->ODR = 0x100;
8001968: f44f 7380 mov.w r3, #256 ; 0x100
800196c: 6163 str r3, [r4, #20]
800196e: 4807 ldr r0, [pc, #28] ; (800198c <main+0x130>)
8001970: 4907 ldr r1, [pc, #28] ; (8001990 <main+0x134>)
8001972: f44f 727a mov.w r2, #1000 ; 0x3e8
8001976: f000 f92d bl 8001bd4 <memcpy>
for(int i = 0; i < 1000; i++)
sramData[i] = flashData[i];
GPIOC->ODR = 0x0;
800197a: 6165 str r5, [r4, #20]
...
Looking at the generated assembly code above, you can see that the compiler automatically transforms
the loop into a call to the memcpy() function. This clearly explains why they have the same
performance.
Table 13 shows another interesting result. For an STM32L152RE MCU, the memcpy() in newlib is
always twice as fast as the DMA M2M. I do not know why this happens, but I have executed
several tests and I can confirm the result.
Finally, other tests not reported here show that it is convenient to use the DMA to do M2M transfers
when the array has more than 30-50 elements, otherwise the DMA setup costs outweigh the benefits
related to its usage. However, it is important to remark that the other advantage of using the DMA
M2M transfer is that the CPU is free to accomplish other tasks while the DMA performs the transfer,
even if its access to the bus slows the overall DMA performance.
How to switch to the newlib run-time library? This can be easily done in Eclipse, by opening the
project settings (Project->Properties menu), then going into the C/C++ Build->Settings section and
selecting the Miscellaneous entry inside the Cross ARM C++ Linker section. Unchecking the entry Use
newlib-nano (see Figure 8) causes the final binary to be linked with the newlib
library.
A clock signal oscillates between VL and VH voltage levels, which for STM32 microcontrollers are
a fraction of the VDD supply voltage. The most fundamental parameter of a clock is the frequency,
which indicates how many times it switches from VL to VH in a second. The frequency is expressed
in Hertz.
¹It is important to remark that the square wave represented in Figure 1 is “ideal”. The real square wave of a clock source has a
trapezoidal form.
Clock tree 291
All STM32 MCUs can be clocked by two distinct clock sources alternatively: an internal RC oscillator²
(named High Speed Internal (HSI)) or an external dedicated crystal oscillator³ (named High Speed
External (HSE)). There are several reasons to prefer an external crystal to the internal RC oscillator:
• An external crystal offers a higher precision compared to the internal RC network, which is
rated for a 1% accuracy⁴, especially when PCB operative temperatures are far from the ambient
temperature of 25°C.
• Some peripherals, especially high speed ones, can be clocked only by a dedicated external
crystal running at a given frequency.
Together with the high-speed oscillator⁵, another clock source can be used to bias the low-speed
oscillator, which in turn can be clocked by an external crystal (named Low Speed External (LSE)) or
the internal dedicated RC oscillator (named Low Speed Internal (LSI)). The low-speed oscillator is
used to drive the Real Time Clock (RTC) and the Independent Watchdog (IWDG) peripheral.
The frequency of the high-speed oscillator establishes the actual frequency neither of the
Cortex-M core nor of the other peripherals. A complex distribution network, also called clock
tree, is responsible for the propagation of the clock signal inside an STM32 MCU. Using several
programmable Phase-Locked Loops (PLL) and prescalers, it is possible to increase/decrease the source
frequency as needed (see Figure 2), depending on the performance we want to reach, the maximum
speed for a given peripheral or bus and the overall global power consumption⁶.
Figure 2: how the source clock signal frequency is increased/decreased using PLLs and prescalers
Multiplexer (also known as System Clock Switch (SW)) can be fed by several alternate sources.
Table 1: the maximum clock speeds for AHB, APB1 and APB2 buses of the MCUs equipping all Nucleo boards
Moreover, explaining in depth the clock tree of every STM32 family is a complex task, which also
requires we focus our attention on a specific part number. In fact, the clock tree structure is affected
mainly by the following key aspects:
• The STM32 main family of the microcontroller. For example, all STM32F0 MCUs provide
just one peripheral bus (APB1), which can be clocked at the same Cortex-M core maximum
frequency. Other STM32 microcontrollers usually provide two peripheral buses, and only one
of these (APB2) can reach the maximum CPU clock speed. Instead, none of the peripheral
buses available in an STM32F7 microcontroller can reach the maximum core frequency⁷. Table
1 reports the maximum clock speed for AHB, APB1 and APB2 buses (with related timers clock
speed) of the MCUs equipping all Nucleo boards: you can note that, for some STM32 MCUs,
it is possible to reach the maximum clock speed only by using an external HSE oscillator.
• The type and number of peripherals provided by the MCU. The complexity of the clock
tree increases with the number of available peripherals. Moreover, some peripherals require
dedicated clock sources and speeds, which impact on the number of PLL stages.
• The sales type and package of the MCU, which determines the effective type and number of
provided peripherals.
⁷Except for timers on the APB2 bus (at least at the time of writing this chapter - February 2016).
Even restricting our focus only to the sixteen MCUs equipping the Nucleo boards, this would require
a long and tedious work, which involves a deep knowledge of all peripherals implemented by the
given MCU. For these reasons, we will give a quick overview of the STM32 clock tree, leaving to the
reader the responsibility to study in depth the particular MCU they are considering. Moreover, as we
will see in a while, thanks to CubeMX it is possible to abstract from the specific clock tree
implementation, unless we need to deal with specific PLL configurations for performance and power
management reasons.
Figure 3 shows the clock tree of one of the simplest STM32 microcontrollers: the STM32F030R8. It
is extracted from the related reference manual⁸ provided by ST. For a lot of novices of the STM32
platform that figure is completely meaningless and quite hard to decode, especially if they are also
new to embedded microcontrollers. The most relevant path has been outlined in red: the one that
goes from the HSI oscillator to the Cortex-M0 core, AHB bus and DMA. This is the path we have
silently “used” until now, without dealing too much with its possible configurations. Let us
introduce the most relevant parts of that path.
The path starts from the internal 8MHz oscillator. As said before, it is an RC oscillator factory-
calibrated by ST for 1% accuracy at an ambient temperature of 25 °C. The HSI clock can then be used
to feed the System Clock Switch (SW) as is (path highlighted in blue in Figure 3) or it can be used
to feed the PLL multiplier after it has been divided by two thanks to an intermediate prescaler⁹.
The main PLL can then multiply the 4MHz clock up to 12 times to obtain the maximum System Clock
Frequency (SYSCLK) of 48MHz. The SYSCLK source can be used to feed the I2C1 peripheral (as an
alternative to the HSI) and another intermediate prescaler, the AHB prescaler, which can be used to
lower the High (speed) Clock (HCLK), which in turn feeds the AHB bus, the core and the SysTick timer.
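The arithmetic of this path can be written down as two small functions. This is only a sketch of the calculation, in plain C that runs on a PC: sysclk_from_hsi_pll() and hclk_from_sysclk() are hypothetical names, they do not program any register, and they do not check whether a given multiplier or divider is actually admitted by the MCU.

```c
#include <assert.h>
#include <stdint.h>

#define HSI_HZ 8000000u  /* frequency of the internal RC oscillator, as stated in the text */

/* SYSCLK obtained when the PLL is fed by the HSI through the fixed /2 prescaler. */
uint32_t sysclk_from_hsi_pll(uint32_t hsi_hz, uint32_t pll_mul) {
    assert(pll_mul > 0);
    return (hsi_hz / 2u) * pll_mul;
}

/* HCLK obtained by dividing SYSCLK with the AHB prescaler. */
uint32_t hclk_from_sysclk(uint32_t sysclk_hz, uint32_t ahb_div) {
    assert(ahb_div > 0);
    return sysclk_hz / ahb_div;
}
```

For example, sysclk_from_hsi_pll(HSI_HZ, 12) returns 48000000, the 48MHz maximum mentioned above, and dividing it by an AHB prescaler of 2 gives an HCLK of 24MHz.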
The clock tree configuration is performed through a dedicated peripheral¹⁰ named Reset and Clock
Control (RCC), and it is a process essentially composed of three steps:
1. The high-speed oscillator source is selected (HSI or HSE) and properly configured, if the HSE
is used.¹¹
2. If we want to feed the SYSCLK with a frequency higher than the one provided by the high-
speed oscillator, then we need to configure the main PLL (which provides the PLLCLK signal).
Otherwise we can skip this step.
3. The System Clock Switch (SW) is configured choosing the right clock source (HSI, HSE, or
PLLCLK). Then we select the right AHB, APB1 and APB2 (if available) prescaler settings to
reach the wanted frequency of the High-speed clock (HCLK - that is the one that feeds the
core, DMAs and AHB bus), and the frequencies of Advanced Peripheral Bus 1 (APB1) and
APB2 (if available) buses.
⁸http://bit.ly/1GfS3iC
⁹A prescaler is an “electronic counter” used to reduce high frequencies. In this case, the “/2” prescaler reduces the main 8MHz
frequency to 4MHz.
¹⁰Sometimes ST defines the RCC as a “peripheral” in its documents, sometimes it does not. I am not sure whether it is properly a
peripheral, but I will define it in the same way ST does. Sometimes.
¹¹In STM32L0/1/4 MCUs, the SYSCLK can be also fed by another dedicated and low-power clock source, named MSI. We will
talk about this clock source next.
Knowing the admissible values for PLLs and prescalers can be a nightmare, especially for more
complex STM32 MCUs. Only some combinations are valid for a given STM32 microcontroller, and
their improper configuration could potentially damage the MCU or at least cause malfunctions
(a wrong clock configuration could lead to abnormal behaviour, strange and unpredictable resets
and so on). Luckily for us, the STM32 engineers have provided a great tool to simplify the clock
configuration: CubeMX.
The clock source and its distribution network have a non-negligible impact on the overall power
consumption of the MCU. If we need a SYSCLK frequency higher or lower than the internal HSI
clock source (which is 8MHz for the most of STM32 MCUs and 16MHz for some others), we have to
increase/reduce it by using the PLL Source Mux and intermediate prescalers. Unfortunately, these
components consume energy, and this can have a dramatic impact on battery-powered devices.
STM32L0/1/4 MCUs are explicitly designed for low-power applications, and they address this
specific issue by supplying a dedicated internal clock source, named MultiSpeed Internal (MSI)
RC oscillator. MSI is a low-power RC oscillator, with a ±1%@25°C factory pre-calibrated accuracy,
which can degrade to ±3% in the 0-85°C range. The main characteristic of the MSI is that it supplies
up to twelve different frequencies, without adding any external component. For example, the MSI
in an STM32L476 provides an internal clock source ranging from 100kHz up to 48MHz. The MSI
clock is used as SYSCLK after restart from Reset, and after wakeup from Standby and Shutdown
low-power modes. After restart from Reset, the MSI frequency is set to its default value (for example,
the default MSI frequency in an STM32L476 is 4MHz). Table 2 summarizes the most relevant characteristics of
all possible clock sources in an STM32L476 MCU. As you can see, the best power consumption is
achieved while the MCU is clocked by the MSI (without using the PLL Multiplexer). Moreover, this
clock source guarantees the shortest startup time, if compared with the HSI. It is interesting to see
that up to two seconds are required to stabilize the LSE clock: if startup speed is really important for
your application, then using a separate thread to start the LSE is an option to consider.
In addition to the advantages related to low-power, when the MSI is used as source for the PLL
Source Mux with the LSE, it provides a very accurate clock source which can be used by the USB
OTG FS device without using an external dedicated crystal, while feeding the main PLL to run the
system at the maximum speed of 80MHz.
Here too, the most relevant paths of the clock tree have been highlighted in red and blue, which
should simplify the comparison with Figure 3. When a new project is created, CubeMX chooses the
HSI oscillator as the default clock source. HSI is also the default source for the System Clock
Switch (path in blue), as shown in Figure 4. This means that, for the MCU we are considering here,
the Cortex-M core frequency will be equal to 8MHz.
CubeMX also tells us that the maximum frequency for both the High-speed Clock (HCLK) and the
APB1 bus is equal to 48MHz in this MCU (labels in blue). To increase the CPU core frequency we
first need to select the PLLCLK as the source clock for the System Clock Switch and then choose
the right PLL multiplier factor. However, CubeMX offers a quick way to do this: you can simply
write “48” inside the HCLK field and hit the enter key. CubeMX will automatically arrange the
settings, choosing the right clock tree path (the red one in Figure 4).
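The arithmetic that CubeMX performs for us can be sketched in plain C. This is only a host-side illustration, not CubeHAL code: the function name pll_output_hz() is hypothetical, and the sketch assumes the clock path of the MCU considered here, where the 8MHz HSI passes through the “/2” prescaler before reaching the PLL.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the CubeHAL): computes the PLL output
 * frequency from the source clock, the input divider and the multiplier. */
uint32_t pll_output_hz(uint32_t source_hz, uint32_t input_div, uint32_t pll_mul) {
    if (input_div == 0U) {
        return 0U;  /* invalid divider: report 0 instead of dividing by zero */
    }
    return (source_hz / input_div) * pll_mul;
}
```

With an 8MHz HSI, the /2 prescaler and a x12 multiplier, pll_output_hz(8000000, 2, 12) gives 48000000, which is the 48MHz limit reported by CubeMX for this MCU.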
If your board relies on an external HSE/LSE crystal, you have to enable it in the RCC peripheral
before you can use it as main clock source for the corresponding oscillator (we will see in a while
how to do this step-by-step). Once the external oscillator is enabled, it is possible to specify its
frequency (inside the blue box labeled “input frequency”) and to configure the main PLL to achieve
the desired SYSCLK speed (see Figure 5). Otherwise, the external oscillator input frequency can be
used directly as source clock for the System Clock Switch.
Figure 5: CubeMX allows selecting the HSE oscillator once it is enabled using the RCC peripheral
We need to configure the RCC peripheral accordingly to enable an external clock source. This can
be done from the Pinout view in CubeMX, as shown in Figure 6.
For both HSE and LSE oscillators, CubeMX offers three configuration options:
• Disable: the external oscillator is not available/used, and the corresponding internal oscillator
is used.
• Crystal/Ceramic Resonator: an external crystal/ceramic resonator is used and the corre-
sponding main frequency is derived from it. This implies that RCC_OSC_IN and RCC_OSC_-
OUT pins are used to interface the HSE, and the corresponding signal I/Os are unavailable
for other usages (if we are using an external low-speed crystal, then the corresponding
RCC_OSC32_IN and RCC_OSC32_OUT I/Os are used too).
• BYPASS Clock Source: an external clock source is used. The clock source is generated by
another active device. This means that RCC_OSC_OUT is left unused, and it is possible
to use it as a regular GPIO. In almost all development boards from ST (including the Nucleo ones)
the Master Clock Output (MCO) pin of the ST-LINK interface is used as external clock source
for the target STM32 MCU. Enabling this option allows the ST-LINK MCO to be used as HSE.
The RCC peripheral also allows enabling the Master Clock Output (MCO), which is a pin that can
be connected to a clock source. It can be used to clock another external device, saving the
external crystal that this other IC would otherwise need. Once the MCO is enabled, it is possible
to choose its clock source using the Clock Configuration view, as shown in Figure 7.
Figure 7: how to select the clock source for the MCO pin
There are four ways to configure the pins corresponding to the external high-speed clock (HSE):
• MCO from ST-LINK: MCO output of ST-LINK MCU is used as input clock. This frequency
cannot be changed, it is fixed at 8 MHz and connected to PF0/PD0/PH0-OSC_IN of target
STM32 MCU.
The following configuration is needed:
– SB55 OFF
– SB16 and SB50 ON
– R35 and R37 removed
• HSE oscillator on-board from X3 crystal (not provided): for typical frequencies and its
capacitors and resistors, refer to STM32 microcontroller datasheet. Please refer to the AN2867
for oscillator design guide for STM32 microcontrollers.
The following configuration is needed:
– SB54 and SB55 OFF
– R35 and R37 soldered
– C33 and C34 soldered
– SB16 and SB50 OFF
• Oscillator from external PF0/PD0/PH0: from an external oscillator through pin 29 of the
CN7 connector.
The following configuration is needed:
– SB55 ON
– SB50 OFF
– R35 and R37 removed
• HSE not used: PF0/PD0/PH0 and PF1/PD1/PH1 are used as GPIOs instead of as clock pins.
The following configuration is needed:
– SB54 and SB55 ON
– SB16 and SB50 (MCO) OFF
– R35 and R37 removed
There are two possible default configurations of the HSE pins, depending on the version of the NUCLEO
board hardware. The board version (MB1136 C-01/02/03) is printed on a sticker placed on the bottom
side of the PCB.
• The board marking MB1136 C-01 corresponds to a board configured for HSE not used.
• The board marking MB1136 C-02 (or higher) corresponds to a board configured to use the ST-LINK
MCO as clock input.
Read carefully
For the Nucleo-L476RG the ST-LINK MCO output is not connected to OSC_IN, to reduce power
consumption in low-power mode. Consequently, the HSE in a Nucleo-L476RG cannot be
used unless an external crystal is mounted on the X3 pad, as described before.
There are three ways to configure the pins corresponding to low-speed clock (LSE):
• On-board oscillator: X2 crystal. Please refer to the AN2867 for oscillator design guide for
STM32 microcontrollers. The oscillator P/N is ABS25-32.768KHZ-6-T and it is manufactured
by Abracon corporation.
• Oscillator from external PC14: from external oscillator through the pin 25 of CN7 connector.
The following configuration is needed:
– SB48 and SB49 ON
– R34 and R36 removed
• LSE not used: PC14 and PC15 are used as GPIOs instead of low speed Clock.
The following configuration is needed:
– SB48 and SB49 ON
– R34 and R36 removed
There are several possible default configurations of the LSE, depending on the version of the NUCLEO
board hardware. The board version (MB1136 C-01/02/03) is printed on a sticker placed on the bottom
side of the PCB.
• The board marking MB1136 C-01 corresponds to a board configured as LSE not used.
• The board marking MB1136 C-02 (or higher) corresponds to a board configured with onboard
32kHz oscillator.
• The board marking MB1136 C-03 (or higher) corresponds to a board using new LSE crystal
(ABS25) and C26, C31 & C32 value update.
Read carefully
All Nucleo boards with a release version equal to MB1136 C-02 have a severe issue with
the values of the damping resistors R34 and R36 and of the capacitors C26, C31 & C32. This
issue prevents the LSE from starting correctly.
for other HAL modules, is outside the scope of this book. It would require us to keep track of too many
differences among the several STM32 microcontrollers. So, we will now give a brief overview of its
main features and of the steps involved in the configuration of the clock tree.
The most relevant C structs used to configure the clock tree are RCC_OscInitTypeDef and
RCC_ClkInitTypeDef. The first one is used to configure the RCC internal/external oscillator sources
(HSE, HSI, LSE, LSI), plus some additional clock sources if provided by the MCU. For example, some
STM32 MCUs from the F0 series (STM32F07x, STM32F0x2 and STM32F09x ones) provide USB 2.0 support,
together with a dedicated, factory-calibrated internal high-speed oscillator running at 48MHz to
feed the USB peripheral. If this is the case, the RCC_OscInitTypeDef struct is also used to configure
those additional clock sources. The RCC_OscInitTypeDef struct also has a field that is an instance of
the RCC_PLLInitTypeDef struct, which configures the main PLL used to increase the speed of the
source clock. It reflects the hardware structure of the main PLL, and it can be composed of several
fields depending on the STM32 series (in STM32F2/4/7 MCUs it can have a quite complex structure).
The RCC_ClkInitTypeDef struct, instead, is used to configure the source clock for the System Clock
Switch (SWCLK), for the AHB bus and the APB1/2 buses.
CubeMX is designed to generate the right initialization code for the clock tree of our MCU. All the
necessary code is packed inside the SystemClock_Config() routine, which we have encountered in
the projects generated until now. For example, the following implementation of
SystemClock_Config() reflects the clock tree configuration for an STM32F030R8 MCU running at 48MHz:
1 void SystemClock_Config(void) {
2   RCC_OscInitTypeDef RCC_OscInitStruct;
3   RCC_ClkInitTypeDef RCC_ClkInitStruct;
4
5   RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI;
6   RCC_OscInitStruct.HSIState = RCC_HSI_ON;
7   RCC_OscInitStruct.HSICalibrationValue = 16;
8   RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
9   RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSI;
10   RCC_OscInitStruct.PLL.PLLMUL = RCC_PLL_MUL12;
11   RCC_OscInitStruct.PLL.PREDIV = RCC_PREDIV_DIV1;
12   HAL_RCC_OscConfig(&RCC_OscInitStruct);
13
14   RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_SYSCLK;
15   RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
16   RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
17   RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV1;
18   HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_1);
19
20   HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq()/1000);
21
22   HAL_SYSTICK_CLKSourceConfig(SYSTICK_CLKSOURCE_HCLK);
23 }
Lines [5:12] select the HSI as source oscillator and enable the main PLL, setting the HSI as its clock
source through the PLL multiplexer. The clock frequency is then multiplied by twelve (by setting
the PLLMUL field). Lines [14:18] set the SYSCLK frequency. The PLLCLK is selected as clock source
(line 15). In the same way, the SYSCLK frequency is selected as source for the AHB bus, and the
same HCLK frequency (RCC_HCLK_DIV1) as source for the APB1 bus. The other lines of code set the
SysTick timer, a special timer available in the Cortex-M core used to synchronize some internal HAL
activities (or to drive the scheduler of an RTOS, as we will see in a following chapter). The HAL is
based on the convention that the SysTick timer generates an interrupt every 1ms. Since we are
configuring the SysTick clock so that it runs at the maximum core frequency of 48MHz (which means
that the SYSCLK performs 48,000,000 clock cycles every second), we can set the SysTick timer so that
it generates an interrupt every 48,000,000/1000 = 48,000 clock cycles ¹².
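The computation of the number of cycles between two SysTick interrupts can be sketched as follows. This is a host-side illustration, not HAL code: the helper name systick_cycles_per_tick() is hypothetical, and the range check assumes that the 24-bit SysTick counter is loaded with the number of cycles minus one.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0xFFFFFFUL  /* largest value a 24-bit counter can hold */

/* Hypothetical helper: clock cycles between two SysTick interrupts for a
 * given tick rate. Returns 0 when the request cannot be satisfied. */
uint32_t systick_cycles_per_tick(uint32_t hclk_hz, uint32_t ticks_per_second) {
    if (ticks_per_second == 0U) {
        return 0U;
    }
    uint32_t cycles = hclk_hz / ticks_per_second;
    /* The counter is loaded with (cycles - 1), which must fit in 24 bits. */
    if (cycles == 0U || (cycles - 1U) > SYSTICK_MAX_RELOAD) {
        return 0U;
    }
    return cycles;
}
```

For a 48MHz HCLK and a 1ms tick, the function returns 48000. Asking for one interrupt per second at the same frequency returns 0, because 48,000,000 cycles do not fit in a 24-bit counter.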
• if the SYSCLK source is the HSI oscillator, then it returns the value based on the HSI_VALUE macro;
• if the SYSCLK source is the HSE oscillator, then it returns the value based on the HSE_VALUE macro;
• if the SYSCLK source is the PLLCLK, then it returns a value based on HSI_VALUE/HSE_VALUE
multiplied by the PLL factor, according to the specific STM32 MCU implementation.
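The three cases above can be modelled with a few lines of host-side C. This is only a sketch of the logic, not the HAL implementation: the enum, the function names and the pll_div/pll_mul parameters are hypothetical, and the two macros are given example values here.

```c
#include <stdint.h>

/* Example values only: in a real project these macros come from stm32xxx_hal_conf.h. */
#define HSI_VALUE 8000000U
#define HSE_VALUE 8000000U

typedef enum {
    SYSCLK_FROM_HSI,
    SYSCLK_FROM_HSE,
    SYSCLK_FROM_PLL_HSI,
    SYSCLK_FROM_PLL_HSE
} sysclk_source_t;

/* Hypothetical model of the SYSCLK computation described in the list above. */
uint32_t model_sysclk_hz(sysclk_source_t source, uint32_t pll_div, uint32_t pll_mul) {
    switch (source) {
    case SYSCLK_FROM_HSI:
        return HSI_VALUE;
    case SYSCLK_FROM_HSE:
        return HSE_VALUE;
    case SYSCLK_FROM_PLL_HSI:
        return (pll_div != 0U) ? (HSI_VALUE / pll_div) * pll_mul : 0U;
    case SYSCLK_FROM_PLL_HSE:
        return (pll_div != 0U) ? (HSE_VALUE / pll_div) * pll_mul : 0U;
    }
    return 0U;
}

/* The core runs at HCLK, that is SYSCLK divided by the AHB prescaler. */
uint32_t model_hclk_hz(uint32_t sysclk_hz, uint32_t ahb_prescaler) {
    return (ahb_prescaler != 0U) ? sysclk_hz / ahb_prescaler : 0U;
}
```

The sketch makes one point evident: the result is only as good as the macros. If the board mounts a crystal whose frequency differs from HSE_VALUE, every value derived from it is wrong as well.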
HSI_VALUE and HSE_VALUE macros are defined inside the stm32xxx_hal_conf.h file, and they are
hardcoded values. The HSI_VALUE is defined by ST during chip design, and we can trust the value
¹²As we will see in the next chapter, a timer is a free-running counter module, that is, a device that counts from 0 to a given
value, advancing at every clock cycle. Take note that, for the sake of completeness, the SysTick timer is a 24-bit downcounter,
that is, it counts from its reload value (0xFFFFFF at most) down to zero, and then automatically restarts again. The source clock
of a timer establishes how fast this timer counts. Since here we are specifying that the clock source for the SysTick timer is the
HCLK (line 22), the counter will reach zero every 1ms.
¹³Pay attention that the Cortex-M core is not clocked by the SYSCLK frequency, but by the HCLK frequency, which could be
lowered by the AHB prescaler. So, to recap, the core frequency is equal to HAL_RCC_GetSysClockFreq()/AHB-prescaler.
of the corresponding macro (except for that 1% of accuracy). Instead, if we are using an external
oscillator as HSE source, we must provide the actual value for the HSE_VALUE macro, otherwise the
value returned by the HAL_RCC_GetSysClockFreq() function is wrong¹⁴. And this also affects the
tick frequency (that is, how long it takes to generate the timer interrupt) of the SysTick timer.
We can also retrieve the core frequency by using the SystemCoreClock CMSIS global variable.
Read carefully
If we decide to manipulate the clock tree configuration by hand without using CubeHAL
routines, we have to remember that every time we change the SYSCLK frequency, we need
to call the CMSIS function SystemCoreClockUpdate(), otherwise some CMSIS routines
may give wrong results. This function is automatically called for us by the
HAL_RCC_ClockConfig() routine.
For example, to route the PLLCLK to MCO1 pin in an STM32F401RE MCU (which corresponds to
PA8 pin), we must invoke the above function in the following way:
Read carefully
Please, take note that when configuring the MCO pin as output GPIO, its speed (that is, the
slew rate) affects the quality of the output clock. Moreover, for higher clock frequencies
the compensation cell must be enabled in the following way:
HAL_EnableCompensationCell();
¹⁴The HAL_RCC_GetSysClockFreq() function is defined to return a uint32_t. This means that it could return wrong results with
fractional values for the HSE oscillator.
void NMI_Handler(void) {
HAL_RCC_NMI_IRQHandler();
}
The right way to catch the failure of the HSE clock is by defining the callback:
void HAL_RCC_CSSCallback(void) {
//Catch the HSE failure and take proper actions
}
The frequency of the internal RC oscillator can be fine-tuned to achieve better accuracy over
wider temperature and supply voltage ranges. The trimming bits are used for this purpose. Five
trimming bits RCC_CR->HSITRIM[4:0] are used for fine-tuning. The default trimming value is 16.
An increase/decrease in this trimming value causes an increase/decrease in HSI frequency. The HSI
oscillator is fine-tuned in steps of 0.5% of the HSI clock speed.
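To get a feeling for what a trimming step means, the following host-side sketch estimates the HSI frequency for a given trimming value. It is an idealized model, not something taken from ST documentation: the helper name hsi_estimated_hz() is hypothetical, and it assumes that the nominal frequency corresponds to the default value 16 and that every step shifts the frequency by exactly 0.5% of the nominal one. Real silicon only approximates this behaviour.

```c
#include <stdint.h>

#define HSITRIM_DEFAULT 16   /* default value of the 5-bit HSITRIM field */
#define HSITRIM_MAX     31   /* five bits: valid values are 0..31 */

/* Hypothetical helper: estimated HSI frequency for a trimming value,
 * assuming linear steps of 0.5% (1/200) of the nominal frequency. */
uint32_t hsi_estimated_hz(uint32_t nominal_hz, uint32_t trim) {
    if (trim > HSITRIM_MAX) {
        trim = HSITRIM_MAX;  /* clamp to the range of the bit-field */
    }
    int32_t steps = (int32_t)trim - HSITRIM_DEFAULT;
    int64_t delta = ((int64_t)nominal_hz * steps) / 200;
    return (uint32_t)((int64_t)nominal_hz + delta);
}
```

Under these assumptions, for an 8MHz HSI each step is worth 40kHz: a trimming value of 17 gives 8,040,000Hz and a value of 15 gives 7,960,000Hz.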
The internal oscillator frequency is not measured directly but it is computed from the number of
clock pulses counted using a timer compared with the typical value. To do this, a very accurate
reference frequency must be available such as the LSE frequency provided by the external 32.768
kHz crystal or the 50 Hz/60 Hz of the mains.
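The computation behind this measurement can be sketched in host-side C. The sketch assumes that a timer clocked by the HSI counts pulses during a known number of periods of the reference signal; both function names are hypothetical and the code does not touch any hardware.

```c
#include <stdint.h>

/* Hypothetical helper: frequency of the measured clock, given the pulses
 * counted during ref_periods periods of a reference running at ref_hz. */
uint32_t measured_frequency_hz(uint32_t counted_pulses, uint32_t ref_hz,
                               uint32_t ref_periods) {
    if (ref_periods == 0U) {
        return 0U;
    }
    return (uint32_t)(((uint64_t)counted_pulses * ref_hz) / ref_periods);
}

/* Hypothetical helper: signed deviation from the typical value, in parts
 * per million (a positive result means the oscillator runs too fast). */
int32_t frequency_error_ppm(uint32_t measured_hz, uint32_t typical_hz) {
    if (typical_hz == 0U) {
        return 0;
    }
    int64_t diff = (int64_t)measured_hz - (int64_t)typical_hz;
    return (int32_t)((diff * 1000000) / (int64_t)typical_hz);
}
```

For example, counting 251,250 pulses during 1024 periods of the 32.768kHz LSE corresponds to 8,040,000Hz, which is 5000ppm (0.5%) above the typical 8MHz.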
ST provides several application notes that describe this procedure in more detail (for example,
AN4067¹⁷ covers the calibration procedure in the STM32F0 family). Please refer to those documents
for more information.
¹⁷http://bit.ly/1R8kEbf
11. Timers
Embedded devices perform some activities on a time basis. For really simple and inaccurate delays a
busy loop could carry out the task, but using the CPU core to perform time-related activities is never
a smart solution. For this reason, all microcontrollers provide dedicated hardware peripherals: the
timers. Timers are not only timebase generators, but they also provide several additional features
used to interact with the Cortex-M core and other peripherals, both internal and external to the
MCU.
Depending on the family and package used, STM32 microcontrollers implement a variable number of
timers, each one with specific characteristics. Some part numbers can provide up to 14 independent
timers. Unlike the other peripherals, timers have almost the same implementation in all
STM32-series, and they are grouped into nine distinct categories. The most relevant of these are:
basic, general purpose and advanced timers.
STM32 timers are advanced peripherals that offer a wide range of customizations. Moreover, some
of their features are specific to the application domain. Covering the topic in depth would require
a completely separate book (consider that usually more than 250 pages of a typical STM32
datasheet are dedicated to timers). This chapter, which is undoubtedly the longest in the book, tries
to shape the most relevant concepts regarding basic and general purpose timers in STM32 MCUs,
looking at the related CubeHAL module used to program them.
• They can be used as time base generator (which is the feature common to all STM32 timers).
• They can be used to measure the frequency of an external event (input capture mode).
• They can be used to control an output waveform, or to indicate when a period of time has
elapsed (output compare mode).
¹This is not entirely true, but it is ok to consider it true here.
– One pulse mode (OPM) is a particular case of the input capture mode and the output
compare mode. It allows the counter to be started in response to a stimulus and to
generate a pulse with a programmable length after a programmable delay.
• They can be used to generate PWM signals in edge-aligned mode or center-aligned mode
independently on each channel (PWM mode).
– In some STM32 MCUs (notably from STM32F3 and recent STM32L4 series), some timers
can generate center-aligned PWM signals with a programmable delay and phase shift.
Depending on the timer type, a timer can generate interrupts or DMA requests when the following
events occur:
• Update events
– Counter overflow/underflow
– Counter initialized
– Others
• Trigger
– Counter start/stop
– Counter Initialize
– Others
• Input capture/Output compare
• Basic timers: timers from this category are the simplest form of timers in STM32 MCUs. They
are 16-bit timers used as time base generator, and they do not have output/input pins. Basic
timers are also used to feed the DAC peripheral, since their update event can trigger DMA
requests for the DAC (for this reason they are usually available in STM32 MCUs providing at
least a DAC). Basic timers can be also used as “masters” for other timers.
• General purpose timers: they are 16/32-bit timers (depending on the STM32-series) providing
the classical features that a timer of a modern embedded microcontroller is expected to
implement. They are used in any application for output compare (timing and delay generation),
One-Pulse Mode, input capture (for external signal frequency measurement), sensor
interface (encoder, hall sensor), etc. Obviously, a general purpose timer can be used as time
base generator, like a basic timer. Timers from this category provide four programmable
input/output channels.
– 1-channel/2-channels: they are two subgroups of general purpose timers providing only
one/two input/output channels.
– 1-channel/2-channels with one complementary output: same as the previous type, but
having a dead time generator on one channel. This allows having complementary signals
with a time base independent from the advanced timers.
• Advanced timers: these timers are the most complete ones in an STM32 MCU. In addition to
the features found in a general purpose timer, they include several features related to motor
control and digital power conversion applications: three complementary signals with dead
time insertion, emergency shut-down input.
• High resolution timer: The high-resolution timer (HRTIM1) is a special timer provided by
some microcontrollers from the STM32F3 series (which is the series dedicated to motor control
and power conversion). It allows generating digital signals with high-accuracy timings, such
as PWM or phase-shifted pulses. It consists of 6 sub-timers, 1 master and 5 slaves, totaling 10
high-resolution outputs, which can be coupled by pairs for dead time insertion. It also features
5 fault inputs for protection purposes and 10 inputs to handle external events such as current
limitation, zero voltage or zero current switching.
HRTIM1 timer is made of a digital kernel clocked at 144 MHz followed by delay lines. Delay
lines with closed loop control guarantee a 217ps resolution whatever the voltage, temperature
or chip-to-chip manufacturing process deviation. The high-resolution is available on the 10
outputs in all operating modes: variable duty cycle, variable frequency, and constant ON time.
This book will not cover HRTIM1 timer.
• Low-power timers: timers from this group are especially designed for low-power applica-
tions. Thanks to their diversity of clock sources, these timers are able to keep running in all
power modes (except for Standby mode). Given this capability to run even with no internal
clock source, Low-power timers can be used as a “pulse counter” which can be useful in some
applications. They also have the capability to wake up the system from low-power modes.
Low-power timers will be explored in a subsequent chapter.
Table 1² summarizes the most relevant features to keep on hand for each timer category.
²The table is adapted from the one found in the AN4013(http://bit.ly/1WAewd6) from ST, an application note dedicated to
STM32 timers.
• Given a specific timer (e.g. TIM1, TIM8, etc.), its implementation (features, number and types
of registers, generated interrupts, DMA requests, peripheral interconnection³, etc.) is the same⁴
in all STM32 MCUs. This guarantees that firmware written to use a specific timer is
portable to other MCUs or STM32-series having the same timer.
• The actual presence of a timer in an MCU belonging to a given family depends on the sales
type and the package used (packages with more pins may provide all timers implemented by
that family).
• The table was extracted, expanded and rearranged from the AN4013⁵. I have carefully checked
the values reported in that table, and found some outdated entries. However, I am not
totally sure that it faithfully adheres to the actual implementation⁶ of the whole STM32 portfolio
(I would have to check more than 600 microcontrollers to be sure of those values). For this
reason, I have left cells empty, so you can add values if you discover a mistake⁷.
Table 3 reports the list of all timers implemented by the MCUs equipping the sixteen Nucleo boards
we are considering in this book. It is important to underline some things reported in Table 3:
When dealing with timers, it is important to have a pragmatic approach. Otherwise, it is really easy
to get lost in their settings and in the corresponding HAL routines (the HAL_TIM and HAL_TIM_EX
modules are among the most articulated in the CubeHAL).
For this reason, we will start studying how to use basic timers, whose functionalities are also
common to more advanced STM32 timers.
³With the term peripheral interconnection we indicate the ability of some peripherals to “trigger” other peripherals, or to fire
some of their DMA requests (for example, the TIM6 update event can trigger the DAC1 conversion). More about this topic in a
subsequent chapter.
⁴As said at the beginning of this chapter, STM32 timers are the only peripherals that share the same implementation among
all STM32 families. This is almost true, except for TIM2 and TIM5 timers, which have a 32-bit resolution in the majority of STM32
MCUs and 16-bit resolution in some early STM32 MCUs. Moreover, some really specific features may have a slightly different
implementation between some STM32 series (especially between older STM32F1 microcontrollers and more recent STM32F4
ones). Always consult the datasheet for your MCU before you plan to use a really dedicated feature provided by some timers.
⁵http://bit.ly/1WAewd6
⁶The table was arranged in February 2016. STM32 MCUs evolve almost day-by-day, so some things may have changed by the time
you read this chapter (for example, I suspect that ST is going to release an STM32L1 MCU with at least one low-power timer soon).
⁷And eventually send me an email so that I can correct the table in next releases of the book :-)
Table 3: which timers are implemented in each STM32 MCU equipping sixteen Nucleo boards
typedef struct {
TIM_TypeDef *Instance; /* Pointer to timer descriptor */
TIM_Base_InitTypeDef Init; /* TIM Time Base required parameters */
HAL_TIM_ActiveChannel Channel; /* Active channel */
DMA_HandleTypeDef *hdma[7]; /* DMA Handlers array */
HAL_LockTypeDef Lock; /* Locking object */
__IO HAL_TIM_StateTypeDef State; /* TIM operation state */
} TIM_HandleTypeDef;
Let us see more in depth the most important fields of this struct.
• Instance: is the pointer to the TIM descriptor we are going to use. For example, TIM6 is one
of the basic timers available in the majority of STM32 microcontrollers.
• Init: is an instance of the C struct TIM_Base_InitTypeDef, which is used to configure the
base timer functionalities. We will study it more in depth in a while.
• Channel: it indicates the number of active channels in timers that provide one or more
input/output channels (this is not the case of basic timers). It can assume one or more values
from the enum HAL_TIM_ActiveChannel, and we will study its usage in a next paragraph.
• *hdma[7]: this is an array containing the pointers to DMA_HandleTypeDef descriptors for DMA
requests associated to the timer. As we will see later, a timer can generate up to seven DMA
requests used to drive its features.
• State: this is used internally by the HAL to keep track of the timer state.
All the timer configuration activities are performed by using an instance of the C struct
TIM_Base_InitTypeDef, which is defined in the following way:
typedef struct {
uint32_t Prescaler; /* Specifies the prescaler value used to divide the TIM clock. */
uint32_t CounterMode; /* Specifies the counter mode. */
uint32_t Period; /* Specifies the period value to be loaded into the active
Auto-Reload Register at the next update event. */
uint32_t ClockDivision; /* Specifies the clock division. */
uint32_t RepetitionCounter; /* Specifies the repetition counter value. */
} TIM_Base_InitTypeDef;
• Prescaler: it divides the timer clock by a factor ranging from 1 up to 65536 (the prescaler
register has a 16-bit resolution, and the division factor is equal to the register value plus one).
For example, if the bus where the timer is connected runs at 48MHz, then a prescaler value
equal to 47 lowers the counting frequency to 1MHz.
• CounterMode: it defines the counting direction of the timer, and it can assume one of the
values from Table 4. Some counting modes are available only in general purpose and advanced
timers. For basic timers, only the TIM_COUNTERMODE_UP is defined.
• Period: sets the maximum value for the timer counter before it restarts counting again. This
can assume a value from 0x1 to 0xFFFF (65535) for 16-bit timers, and from 0x1 to 0xFFFF FFFF
for TIM2 and TIM5 timers in those MCUs that implement them as 32-bit timers. If Period is
set to 0x0 the timer does not start.
• ClockDivision: this bit-field indicates the division ratio between the internal timer clock
frequency and the sampling clock used by the digital filters on ETRx and TIx pins. It can assume
one value from Table 5, and it is available only in general purpose and advanced timers. We
will study digital filters on input pins of a timer later in this chapter. This field is also used by
the dead time generator (a feature not described in this book).
• RepetitionCounter: every timer has a specific update register that keeps track of the timer
overflow/underflow condition. This can also generate a specific IRQ, as we will see next. The
RepetitionCounter says how many times the timer overflows/underflows before the update
register is set, and the corresponding event is raised (if enabled). RepetitionCounter is only
available in advanced timers.
Table 5: available ClockDivision modes for general purpose and advanced timers
• is a free-running counter, which counts from 0 up to the value specified in the Period⁹ field
in the TIM_Base_InitTypeDef initialization structure, which can assume the maximum value
of 0xFFFF (0xFFFF FFFF for 32-bit timers);
• the counting frequency depends on the speed of the bus where the timer is connected, and
it can be lowered up to 65536 times by setting the Prescaler register in the initialization
structure;
• when the timer reaches the Period value, it overflows and the Update Event (UEV) flag is
set¹⁰; the timer automatically restarts counting again from the initial value (which is always
zero for basic timers)¹¹.
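The behaviour just described can be imitated with a tiny software model, useful only to reason about the number of ticks between two update events. The struct and the function below are hypothetical names introduced for this sketch, and each call represents one tick of the already prescaled clock.

```c
#include <stdint.h>

/* Hypothetical software model of an up-counting basic timer. */
typedef struct {
    uint32_t period;   /* value of the Period field */
    uint32_t counter;  /* current counter value */
    uint32_t updates;  /* number of update events generated so far */
} timer_model_t;

/* Advance the model by one tick of the (already prescaled) clock. */
void timer_model_tick(timer_model_t *t) {
    if (t->counter >= t->period) {
        t->counter = 0U;  /* overflow: restart from zero... */
        t->updates++;     /* ...and signal an update event */
    } else {
        t->counter++;
    }
}
```

With Period equal to 499 the model generates one update event every 500 ticks, because the counter goes through the 500 values from 0 to 499 before overflowing. This is where the “+ 1” terms in the formulas of this chapter come from.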
⁹The Period is used to fill the Auto-reload register (ARR) of the timer. I do not know why ST engineers have decided to name
it in this way, since ARR is the register name used in all ST datasheets. This can lead to a lot of confusion, especially when you are
new to the CubeHAL, but unfortunately there is nothing we can do.
¹⁰The Update Event (UEV) is latched to the prescaler clock, and it is automatically cleared on the next clock edge. Don’t confuse
the UEV with the Update Interrupt Flag (UIF), which must be cleared manually like every other IRQ. UIF is set only when the
corresponding interrupt is enabled. As we will discover in a following chapter, the UEV event, like all event flags set for other
peripherals, allows the MCU to wake up when it has entered a low-power mode using the WFE instruction.
¹¹This is an important distinction with other microcontroller architectures (especially 8-bit ones) where timers need to be
“rearmed” manually before they can start counting again.
The Period and Prescaler registers determine the timer frequency, that is, how long it takes to
overflow (or, if you prefer, how often an Update Event is generated), according to this simple formula:
$$UpdateEvent = \frac{Timer_{clock}}{(Prescaler + 1)(Period + 1)} \qquad [1]$$
For example, assume a timer connected to the APB1 bus in an STM32F030 MCU, with the HCLK set
to 48MHz, a Prescaler value equal to 47999 and a Period value equal to 499. The update event
frequency is then:
$$UpdateEvent = \frac{48{,}000{,}000}{(47999 + 1)(499 + 1)} = 2\,\text{Hz}$$
that is, the timer overflows every 1/2 s = 0.5s.
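Formula [1] can also be used the other way round, to look for a Prescaler/Period pair that gives a wanted update frequency. The following host-side sketch shows one possible search. The function name find_timer_settings() is hypothetical, the search assumes a 16-bit timer, and it only accepts exact solutions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: searches 16-bit Prescaler/Period values such that
 * timer_hz / ((Prescaler + 1) * (Period + 1)) is exactly update_hz.
 * Returns false when no exact pair exists. */
bool find_timer_settings(uint32_t timer_hz, uint32_t update_hz,
                         uint16_t *prescaler, uint16_t *period) {
    if (update_hz == 0U || timer_hz % update_hz != 0U) {
        return false;
    }
    uint32_t total = timer_hz / update_hz;  /* (Prescaler + 1) * (Period + 1) */
    for (uint32_t div = 1U; div <= 65536U; div++) {
        if (total % div != 0U) {
            continue;
        }
        uint32_t count = total / div;       /* candidate for Period + 1 */
        /* Period must be at least 1: with Period = 0 the timer does not start */
        if (count >= 2U && count <= 65536U) {
            *prescaler = (uint16_t)(div - 1U);
            *period = (uint16_t)(count - 1U);
            return true;
        }
    }
    return false;
}
```

The solution is not unique: the pair used in the text (47999 and 499) and the pair found by this search for 48MHz and 2Hz both satisfy (Prescaler + 1)(Period + 1) = 24,000,000. Choosing among them is a trade-off between counter resolution and the range of the Period register.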
The following code, designed to run on a Nucleo-F030R8, shows a complete example using the
TIM6¹². The example is nothing more than the classical blinking LED, but this time we use a basic
timer to compute delays.
Filename: src/main-ex1.c
7 TIM_HandleTypeDef htim6;
8
9 int main(void) {
10   HAL_Init();
11
12   Nucleo_BSP_Init();
13
14   htim6.Instance = TIM6;
15   htim6.Init.Prescaler = 47999; //48MHz/48000 = 1000Hz
16   htim6.Init.Period = 499; //1000Hz / 500 = 2Hz = 0.5s
17
18   __HAL_RCC_TIM6_CLK_ENABLE(); //Enable the TIM6 peripheral
19
20   HAL_NVIC_SetPriority(TIM6_IRQn, 0, 0); //Enable the peripheral IRQ
21   HAL_NVIC_EnableIRQ(TIM6_IRQn);
22
23   HAL_TIM_Base_Init(&htim6); //Configure the timer
24   HAL_TIM_Base_Start_IT(&htim6); //Start the timer
25
26   while (1);
27 }
28
29 void TIM6_IRQHandler(void) {
30   // Pass the control to HAL, which processes the IRQ
31   HAL_TIM_IRQHandler(&htim6);
32 }
33
34 void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim) {
35   // This callback is automatically called by the HAL on the UEV event
36   if(htim->Instance == TIM6)
37     HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
38 }
¹²Owners of the Nucleo boards equipping F411, F401 and F103 STM32 MCUs will find a slightly different example using a general
purpose timer. However, concepts remain the same.
Lines [14:16] configure TIM6 using the Prescaler and Period values computed before. The timer
peripheral clock is then enabled by the macro at line 18, and its IRQ is enabled at lines [20:21]. The
timer is configured at line 23 and started in interrupt mode at line 24 using the HAL_TIM_Base_Start_IT() function¹³.
The rest of the code is very similar to what we have seen so far.
The TIM6_IRQHandler() ISR fires when the timer overflows, and HAL_TIM_IRQHandler() is then
called. The HAL automatically handles all the operations needed to properly manage
the update event, and it calls the HAL_TIM_PeriodElapsedCallback() routine to signal that
the timer has overflowed.
So far we have seen that all the basic features of a timer are configured through an instance of
the TIM_Base_InitTypeDef struct. This struct contains a field named RepetitionCounter, used to
further increase the period between two consecutive update events: the timer will count a given
number of times before setting the event and raising the corresponding interrupt. RepetitionCounter
is only available in advanced timers, and when it is used the formula to compute the frequency of
update events becomes:
$$UpdateEvent = \frac{Timer_{clock}}{(Prescaler + 1)(Period + 1)(RepetitionCounter + 1)}$$
Leaving the RepetitionCounter equal to zero (default behaviour), we obtain the same working
mode of a basic timer.
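As a worked example (the RepetitionCounter value of 1 is chosen here purely for illustration and does not come from the book examples), consider an advanced timer clocked at 48MHz with a Prescaler of 47999 and a Period of 499:

$$UpdateEvent = \frac{48\,000\,000}{(47999 + 1)(499 + 1)(1 + 1)} = 1\,\text{Hz}$$

The update event is therefore generated once per second instead of once every 0.5s.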
...
while (1) {
if(__HAL_TIM_GET_COUNTER(&tim) == value)
...
This way of polling a timer is wrong, even if it appears to work in some examples.
Why?
Timers run independently of the Cortex-M core. A timer can count really fast, up to the same
clock frequency as the CPU core. But checking a timer counter for equality (that is, checking whether it
is equal to a given value) requires several ARM assembly instructions, which in turn need several
clock cycles. There is no guarantee that the CPU reads the counter register at exactly the
moment it holds the configured value (this happens only if the timer runs really slowly). A better way
is to check whether the current counter value is equal to or greater than the given value, or to check
the UIF flag status¹⁵: in the worst case we get a shift in the time measurement, but we do not
lose the event altogether (unless the timer runs really fast and we lose the subsequent events because the
interrupt is masked, that is, the UIF flag is still set before it is cleared manually by us or automatically
by the HAL).
¹⁵However this requires that the timer is enabled in interrupt mode, using the HAL_TIM_Base_Start_IT() function.
...
while (1) {
  if(__HAL_TIM_GET_FLAG(&tim, TIM_FLAG_UPDATE) != RESET) {
    //Clear the update flag, otherwise we lose other events
    __HAL_TIM_CLEAR_IT(&tim, TIM_IT_UPDATE);
...
However, timers are asynchronous peripherals, and the correct way to manage the overflow/underflow
event is by using interrupts. There is no reason not to use a timer in interrupt mode, unless
the timer runs really fast and generating an interrupt every few microseconds (or even nanoseconds)
would completely flood the MCU, preventing it from processing other instructions¹⁶.
Table 6: DMA requests (most of them are available only in general purpose and advanced timers)
The following example is another variation of the blinking LED application, but this time we use a
timer in DMA mode to turn the LED ON/OFF. Here we are going to use the TIM6 timer programmed
to overflow every 500ms: when this happens, the timer generates the TIM6_UP request (which in
an STM32F030 MCU is bound to the third channel of DMA1) and the next element of a buffer is
transferred to the GPIOA->ODR register in DMA circular mode, which makes the LD2 LED blink
indefinitely.
¹⁶Remember that even if the exception handling in a Cortex-M MCU has a deterministic latency (Cortex-M3/4/7 cores serve
an interrupt in 12 CPU cycles, while Cortex-M0 does it in 16 cycles and Cortex-M0+ in 15 cycles), it has a non-negligible cost,
which amounts to several hundred nanoseconds in “low-speed” MCUs (for example, for an STM32F030 MCU running at 48MHz, an interrupt
is serviced in about 300ns). This cost has to be added to the overhead introduced by the HAL during the interrupt management, as
seen before.
Read carefully
In STM32F2/F4/F7/L1/L4 families, only the DMA2 has full access to the Bus Matrix. This
means that only timers whose requests are bound to this DMA controller can be used to
perform transfers involving other peripherals (except for the internal and external volatile
memories). For this reason, this example for Nucleo boards based on F2/F4/L1/L4 MCUs
uses TIM1 as base generator.
In STM32F103R8, STM32F302RB, STM32F334R8, STM32L053R8 and STM32L073RZ
MCUs the TIMx_UP request cannot trigger a transfer between memory and the
GPIO peripheral. So this example is not available for the corresponding Nucleo
boards.
Filename: src/main-ex2.c
13 int main(void) {
14 uint8_t data[] = {0xFF, 0x0};
15
16 HAL_Init();
17 Nucleo_BSP_Init();
18 MX_DMA_Init();
19
20 htim6.Instance = TIM6;
21 htim6.Init.Prescaler = 47999; //48MHz/48000 = 1000Hz
22 htim6.Init.Period = 499; //1000HZ / 500 = 2Hz = 0.5s
23 htim6.Init.CounterMode = TIM_COUNTERMODE_UP;
24 __HAL_RCC_TIM6_CLK_ENABLE();
25
26 HAL_TIM_Base_Init(&htim6);
27 HAL_TIM_Base_Start(&htim6);
28
29 hdma_tim6_up.Instance = DMA1_Channel3;
30 hdma_tim6_up.Init.Direction = DMA_MEMORY_TO_PERIPH;
31 hdma_tim6_up.Init.PeriphInc = DMA_PINC_DISABLE;
32 hdma_tim6_up.Init.MemInc = DMA_MINC_ENABLE;
33 hdma_tim6_up.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
34 hdma_tim6_up.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
35 hdma_tim6_up.Init.Mode = DMA_CIRCULAR;
36 hdma_tim6_up.Init.Priority = DMA_PRIORITY_LOW;
37 HAL_DMA_Init(&hdma_tim6_up);
38
39 HAL_DMA_Start(&hdma_tim6_up, (uint32_t)data, (uint32_t)&GPIOA->ODR, 2);
40 __HAL_TIM_ENABLE_DMA(&htim6, TIM_DMA_UPDATE);
41
42 while (1);
43 }
Lines [29:37] configure the DMA_HandleTypeDef for the DMA1_Channel3 in circular mode. Then line
39 starts the DMA transfer and line 40 enables the update DMA request of TIM6, so that the content of
the data buffer is transferred to the GPIOA->ODR register every time a TIM6_UP request is generated,
that is, every time the timer overflows. This makes the LD2 LED blink. Take note that we are not using
the HAL_TIM_Base_Start_DMA() function here. Why not?
Looking at the implementation of the HAL_TIM_Base_Start_DMA() routine, you can see that ST
engineers have defined it so that the DMA transfer is performed from the memory buffer to the
TIM6->ARR register, which corresponds to the Period.
Basically, we can use HAL_TIM_Base_Start_DMA() only to change the timer Period every time
it overflows. So we need to configure the DMA ourselves in order to perform this transfer.
Configuration view. The timer configuration view lets you set the values for the Prescaler
and Period registers, as shown in Figure 2. CubeMX will generate all the necessary initialization
code inside the MX_TIMx_Init() function. Moreover, in the same configuration dialog, it is
possible to enable timer-related IRQs and DMA requests.
Figure 2: CubeMX makes it easy to generate the code needed to configure a timer
(PSC), which in turn determines how fast the Counter Register (CNT) is increased/decreased. This
one is compared with the content of the auto-reload register (which is filled with the value of
the TIM_Base_InitTypeDef.Period field). When they match, the UEV event is generated, and the
corresponding IRQ is fired, if enabled.
Looking at Figure 3, we can see that a timer can receive “stimuli” from other sources. These can
be divided into two main groups:
• Clock sources, which are used to clock the timer. They can come from external sources
connected to the MCU pins or from other timers connected internally to the MCU. Keep in
mind that a timer cannot work without a clock source, because this is used to increment the
counter register.
• Trigger sources, which are used to synchronize the timer with external sources connected
to the MCU pins or with other timers connected internally. For example, a timer can be
configured to start counting when an external event triggers it. In this case the timer is
clocked by another clock source (which can be either the APBx bus or an external clock source
connected to the ETR2 pin), and it is controlled (that is, when it starts counting, etc.) by another
device.
Depending on the timer type and its actual implementation, a timer can be clocked from:
Let us study these ways to clock/trigger a timer from an external source by analyzing practical
examples.
General purpose timers can be clocked from external sources by setting them in one of two
distinct modes: External Clock Source Mode 1 and 2. The first one is available when the timer is
configured in slave mode. We will study this mode in the next paragraph.
The second mode is, instead, activated simply by using an external clock source. This allows us to use
more accurate and dedicated sources and, if needed, to further reduce the counting frequency. In
fact, when the External Clock Source Mode 2 is selected, the formula to compute the frequency of
update events becomes:
$$UpdateEvent = \frac{EXT_{clock}}{(EXT_{clock}Prescaler)(Prescaler + 1)(Period + 1)(RepetitionCounter + 1)} \qquad [2]$$
where $EXT_{clock}$ is the frequency of the external source and $EXT_{clock}Prescaler$ is a source
frequency divider that can assume the values 1, 2, 4 and 8.
The clock source of a general purpose timer can be selected by using the function
HAL_TIM_ConfigClockSource() and an instance of the struct TIM_ClockConfigTypeDef, which is defined in the
following way:
typedef struct {
uint32_t ClockSource; /* TIM clock sources */
uint32_t ClockPolarity; /* TIM clock polarity */
uint32_t ClockPrescaler; /* TIM clock prescaler */
uint32_t ClockFilter; /* TIM clock filter */
} TIM_ClockConfigTypeDef;
• ClockSource: specifies the source of the clock signal used to clock the timer. It can assume a
value from Table 7. By default, the TIM_CLOCKSOURCE_INTERNAL mode is selected.
• ClockPolarity: indicates the polarity of the clock signal used to clock the timer. It can assume
a value from Table 8. By default, the TIM_CLOCKPOLARITY_RISING mode is selected.
• ClockPrescaler: specifies the prescaler for the external clock source. It can assume a value
from Table 9. By default, the TIM_CLOCKPRESCALER_DIV1 value is selected.
• ClockFilter: this 4-bit field defines the frequency used to sample the external clock signal
and the length of the digital filter applied to it. The digital filter is made of an event counter
in which N consecutive events are needed to validate a transition on the output. Refer to the
datasheet of your MCU about how $f_{DTS}$ (the dead-time and sampling clock) is computed. By default, the
filter is disabled.
Table 7: available clock source modes for general purpose and advanced timers
¹⁸In the ST documentation these modes are also called External Trigger mode 1 and 2 (ETR1 and ETR2).
Table 8: available external clock polarity modes for general purpose and advanced timers
Table 9: available external clock prescaler modes for general purpose and advanced timers
Let us build an example that shows how to use an external clock source for the TIM3 timer. The
example consists in routing the Master Clock Output (MCO) pin to the TIM3_ETR2 pin, which
corresponds to the PD2 pin for all Nucleo boards providing this timer. This can easily be done by using
the Morpho connectors, as shown in Figure 4 for the Nucleo-F030R8 (for your Nucleo, use the CubeMX
tool to identify the MCO pin and the corresponding pinout diagram from Appendix C).
Figure 4: how to route the MCO pin to the TIM3_ETR pin in a Nucleo-F030R8 board
The MCO pin is enabled and connected to the LSE clock source, which runs at 32.768kHz¹⁹. The
following code shows the most relevant parts of the example.
Filename: src/main-ex3.c
23 void MX_TIM3_Init(void) {
24 TIM_ClockConfigTypeDef sClockSourceConfig;
25
26 htim3.Instance = TIM3;
27 htim3.Init.Prescaler = 0;
28 htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
29 htim3.Init.Period = 16383;
30 htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
31 htim3.Init.RepetitionCounter = 0;
32 HAL_TIM_Base_Init(&htim3);
33
34 sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_ETRMODE2;
35 sClockSourceConfig.ClockPolarity = TIM_CLOCKPOLARITY_NONINVERTED;
36 sClockSourceConfig.ClockPrescaler = TIM_CLOCKPRESCALER_DIV1;
37 sClockSourceConfig.ClockFilter = 0;
38 HAL_TIM_ConfigClockSource(&htim3, &sClockSourceConfig);
39
40 HAL_NVIC_SetPriority(TIM3_IRQn, 0, 0);
41 HAL_NVIC_EnableIRQ(TIM3_IRQn);
42 }
43
44 void HAL_TIM_Base_MspInit(TIM_HandleTypeDef* htim_base) {
45 GPIO_InitTypeDef GPIO_InitStruct;
46 if(htim_base->Instance==TIM3) {
47 /* Peripheral clock enable */
48 __HAL_RCC_TIM3_CLK_ENABLE();
49 __HAL_RCC_GPIOD_CLK_ENABLE();
50
51 /**TIM3 GPIO Configuration
52 PD2 ------> TIM3_ETR
53 */
54 GPIO_InitStruct.Pin = GPIO_PIN_2;
55 GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
56 GPIO_InitStruct.Pull = GPIO_NOPULL;
57 GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
58 HAL_GPIO_Init(GPIOD, &GPIO_InitStruct);
59 }
60 }
¹⁹Unfortunately, early releases of the Nucleo boards do not provide an external low speed clock source. If this is your case,
rearrange the examples so that the LSI oscillator is used. Moreover, neither the LSI nor the LSE can be routed to the MCO pin in
an STM32F103R8 MCU. For this reason, this example on the Nucleo-F103R8 uses the HSI as MCO source.
Lines [26:32] configure the TIM3 timer, setting its period to 16383. Lines [34:38] configure the
external clock source for TIM3. Since the LSE oscillator runs at 32.768kHz, using equation [2]
we can compute the UEV frequency, which is equal to:
$$UpdateEvent = \frac{32\,768}{(1)(0 + 1)(16383 + 1)(0 + 1)} = 2\,\text{Hz} \quad\Rightarrow\quad 0.5\,\text{s}$$
Finally, lines [48:58] enable TIM3 and configure the PD2 pin (which corresponds to the
TIM3_ETR2 pin) as input source.
Read carefully
It is important to underline that the GPIO port D must be enabled, by using the
__HAL_RCC_GPIOD_CLK_ENABLE() macro, before we can use it as clock source for TIM3. The same applies
to TIM3, which is enabled by using the __HAL_RCC_TIM3_CLK_ENABLE() macro: this is required because the
external clocks do not feed the prescaler directly, but are first synchronized
with the APBx clock through dedicated logic blocks.
STM32 general purpose and advanced timers can be configured to work in master or slave mode²⁰.
When configured to act as a slave, a timer can be fed by internal ITR0, ITR1, ITR2 and ITR3 lines,
an external clock connected to the ETR1 pin or from other clock sources connected to TI1FP1 and
TI2FP2 sources, which correspond to Channel 1 and 2 input pins. This working mode is called
External Clock Mode 1.
The External Clock Mode 1 and 2 are rather confusing for all novices of the STM32 platform.
Both modes are a way to clock a timer using an external clock source, but the first one is
achieved by configuring the timer in slave mode (it is indeed a form of “triggering”), while
the second one is obtained by simply selecting a different clock source. I do not know the
origin of this nomenclature, and what are the practical effects of this distinction. However,
it is important to remark here that the ways to configure a timer in ETR1 or ETR2 mode
are completely different, as we will see in the next example.
Looking at Figure 16 we can see that the TI1FP1 and TI2FP2 inputs are nothing more than
the TI1 and TI2 input channels of a timer after the input filter has been applied.
typedef struct {
uint32_t SlaveMode; /* Slave mode selection */
uint32_t InputTrigger; /* Input Trigger source */
uint32_t TriggerPolarity; /* Input Trigger polarity */
uint32_t TriggerPrescaler; /* Input trigger prescaler */
uint32_t TriggerFilter; /* Input trigger filter */
} TIM_SlaveConfigTypeDef;
Table 10: available slave modes for general purpose and advanced timers
Table 11: available trigger/clock sources for a timer working in slave mode
Table 12: available trigger/clock polarity modes for a timer working in slave mode
Table 13: available trigger/clock prescaler modes for a timer working in slave mode
When the External Clock Source Mode 1 is selected, the formula to compute the frequency of update
events becomes:
$$UpdateEvent = \frac{TRGI_{clock}}{(Prescaler + 1)(Period + 1)(RepetitionCounter + 1)} \qquad [3]$$
where $TRGI_{clock}$ is the frequency of the clock source connected to the ETR1 pin, the frequency of
the internal/external trigger clock source connected to the internal lines ITR0..ITR3, or the frequency of
the signal connected to the external channels TI1FP1 and TI2FP2.
So, let us recap what we have seen so far:
• a timer can be clocked by an external source when working only in master mode²² by
connecting this source to the ETR2 pin;
• if the timer is working in slave mode, then it can be clocked by a signal connected to the ETR1
pin, by any trigger source connected to the internal lines ITR0…ITR3 (hence, the clock source
can be only another timer) or by an input signal connected to the timer channels TI1 and TI2,
which become TI1FP1 and TI2FP2 if the input filtering stage is activated.
Let us build another example that shows how to use an external clock source for the TIM3 timer.
The example consists in routing the Master Clock Output (MCO) pin to the TI2FP2 pin (that is, the
second channel of the TIM3 timer), which in a Nucleo-F030R8 corresponds to the PA7 pin. This can easily
be done by using the Morpho connectors, as shown in Figure 5 (for your Nucleo, use the CubeMX tool to
identify both MCO and TI2FP2 pins).
Figure 5: how to route the MCO pin to the TI2FP2 pin in a Nucleo-F030R8 board
The MCO pin is enabled and connected to the LSE clock source, as seen in the previous example.
The following code shows the most relevant parts of the example.
²²As we will discover later, the master/slave modes of a timer are not mutually exclusive: a timer can be configured to work as a master
and a slave at the same time.
Filename: src/main-ex4.c
24 void MX_TIM3_Init(void) {
25 TIM_SlaveConfigTypeDef sSlaveConfig;
26
27 htim3.Instance = TIM3;
28 htim3.Init.Prescaler = 0;
29 htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
30 htim3.Init.Period = 16383;
31 htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
32 HAL_TIM_Base_Init(&htim3);
33
34 sSlaveConfig.SlaveMode = TIM_SLAVEMODE_EXTERNAL1;
35 sSlaveConfig.InputTrigger = TIM_TS_TI2FP2;
36 sSlaveConfig.TriggerPolarity = TIM_TRIGGERPOLARITY_RISING;
37 sSlaveConfig.TriggerFilter = 0;
38 HAL_TIM_SlaveConfigSynchronization(&htim3, &sSlaveConfig);
39
40 HAL_NVIC_SetPriority(TIM3_IRQn, 0, 0);
41 HAL_NVIC_EnableIRQ(TIM3_IRQn);
42 }
43
44 void HAL_TIM_Base_MspInit(TIM_HandleTypeDef* htim_base) {
45 GPIO_InitTypeDef GPIO_InitStruct;
46 if(htim_base->Instance==TIM3) {
47 /* Peripheral clock enable */
48 __HAL_RCC_TIM3_CLK_ENABLE();
49 __HAL_RCC_GPIOA_CLK_ENABLE();
50
51 /**TIM3 GPIO Configuration
52 PA7 ------> TIM3_CH2
53 */
54 GPIO_InitStruct.Pin = GPIO_PIN_7;
55 GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
56 GPIO_InitStruct.Pull = GPIO_NOPULL;
57 GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
58 GPIO_InitStruct.Alternate = GPIO_AF1_TIM3;
59 HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
60 }
61 }
Lines [34:38] configure TIM3 in slave mode. The input trigger source is set to TI2FP2, and the timer
is synchronized to the rising edge of the input signal. Finally, lines [54:59] configure the PA7 as input
pin for the second channel of TIM3.
11.3.1.3 Using CubeMX to configure the source clock of a general purpose timer
Configuring the clock source of a general purpose timer can be a nightmare, especially for novices of
the STM32 platform. CubeMX can simplify this process, even if a good understanding of master/slave
modes and ETR1 and ETR2 modes is required.
To configure the timer in External Clock Mode 2 it is sufficient to select ETR2 as clock source from
the Pinout view, as shown in Figure 6.
Once the clock source is selected, it is possible to set the external clock filter, polarity and prescaler
from the timer configuration dialog, as shown in Figure 7.
To configure the timer in External Clock Mode 1, we have to select this mode from the Slave entry
and then select the Trigger Source (which in this case is the clock source for the timer), as shown
in Figure 8.
Figure 8: how to select the ETR1 mode from the IP tree pane
Once the clock source is selected, it is possible to set the other configuration parameters from the
timer configuration dialog (not shown here).
Figure 9: the TIM1 can feed the TIM2 timer through the ITR0 line
A timer configured as slave can also simultaneously act as master for another timer, which makes it
possible to create complex networks of timers. For example, Figure 10 shows how timers can be connected
in cascade, while Figure 11 shows how timers can form hierarchical structures using combinations
of master/slave modes. Note that TIM1, TIM2 and TIM3 are internally interconnected through the
same ITR0 line. This allows us to synchronize several timers upon the same event (reset, enable, update,
etc.).
²³Some STM32 microcontrollers, notably STM32F3 ones, provide two independent trigger lines, named TRGO1 and TRGO2.
This case is not shown in this book.
Figure 10: the combination of master/slave modes allows to configure timers in cascade
Figure 11: the combination of master/slave modes allows to configure timers in a hierarchical structure
typedef struct {
uint32_t MasterOutputTrigger; /* Trigger output (TRGO) selection */
uint32_t MasterSlaveMode; /* Master/slave mode selection */
} TIM_MasterConfigTypeDef;
• MasterOutputTrigger: specifies the behaviour of the TRGO output and it can assume a value
from Table 14.
• MasterSlaveMode: it is used to enable/disable the master/slave mode of a timer. It can assume
the values TIM_MASTERSLAVEMODE_ENABLE or TIM_MASTERSLAVEMODE_DISABLE.
Table 14: available trigger/clock sources for a timer working in slave mode
Let us see an example that shows how to configure TIM1 and TIM3 in cascade mode, with TIM1 as
master for TIM3 timer. TIM1 is used as clock source for TIM3 through the ITR0 line. Moreover, the
TIM1 is configured so that it starts counting upon an external event on its TI1FP1 line, which in a
Nucleo-F030 corresponds to PA8 pin: TIM1 starts counting when the PA8 pin goes high, and then it
feeds the TIM3 timer through the ITR0 line.
Filename: src/main-ex5.c
12 int main(void) {
13 HAL_Init();
14
15 Nucleo_BSP_Init();
16 MX_TIM1_Init();
17 MX_TIM3_Init();
18
19 HAL_TIM_Base_Start_IT(&htim3);
20
21 while (1);
22 }
23
24 void MX_TIM1_Init(void) {
25 TIM_ClockConfigTypeDef sClockSourceConfig;
26 TIM_MasterConfigTypeDef sMasterConfig;
27 TIM_SlaveConfigTypeDef sSlaveConfig;
28
29 htim1.Instance = TIM1;
30 htim1.Init.Prescaler = 47999;
31 htim1.Init.CounterMode = TIM_COUNTERMODE_UP;
32 htim1.Init.Period = 249;
33 htim1.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
34 htim1.Init.RepetitionCounter = 0;
35 HAL_TIM_Base_Init(&htim1);
36
37 sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
38 HAL_TIM_ConfigClockSource(&htim1, &sClockSourceConfig);
39
40 sSlaveConfig.SlaveMode = TIM_SLAVEMODE_TRIGGER;
41 sSlaveConfig.InputTrigger = TIM_TS_TI1FP1;
42 sSlaveConfig.TriggerPolarity = TIM_TRIGGERPOLARITY_RISING;
43 sSlaveConfig.TriggerFilter = 15;
44 HAL_TIM_SlaveConfigSynchronization(&htim1, &sSlaveConfig);
45
46 sMasterConfig.MasterOutputTrigger = TIM_TRGO_UPDATE;
47 sMasterConfig.MasterSlaveMode = TIM_MASTERSLAVEMODE_ENABLE;
48 HAL_TIMEx_MasterConfigSynchronization(&htim1, &sMasterConfig);
49 }
50
51 void MX_TIM3_Init(void) {
52 TIM_SlaveConfigTypeDef sSlaveConfig;
53
54 htim3.Instance = TIM3;
55 htim3.Init.Prescaler = 0;
56 htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
57 htim3.Init.Period = 1;
58 htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
59 HAL_TIM_Base_Init(&htim3);
60
61 sSlaveConfig.SlaveMode = TIM_SLAVEMODE_EXTERNAL1;
62 sSlaveConfig.InputTrigger = TIM_TS_ITR0;
63 HAL_TIM_SlaveConfigSynchronization(&htim3, &sSlaveConfig);
64
65 HAL_NVIC_SetPriority(TIM3_IRQn, 0, 0);
66 HAL_NVIC_EnableIRQ(TIM3_IRQn);
67 }
68
69 void HAL_TIM_Base_MspInit(TIM_HandleTypeDef* htim_base) {
70 GPIO_InitTypeDef GPIO_InitStruct;
71 if(htim_base->Instance==TIM3) {
72 __HAL_RCC_TIM3_CLK_ENABLE();
73 }
74
75 if(htim_base->Instance==TIM1) {
76 __HAL_RCC_TIM1_CLK_ENABLE();
77
78 GPIO_InitStruct.Pin = GPIO_PIN_8;
79 GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
80 GPIO_InitStruct.Pull = GPIO_PULLDOWN;
81 GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
82 GPIO_InitStruct.Alternate = GPIO_AF2_TIM1;
83 HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
84 }
85 }
Lines [29:38] configure TIM1 to be clocked from the internal APB1 bus. Lines [40:44] configure TIM1
in slave mode, so that it starts counting when the TI1FP1 line goes high (that is, it is triggered).
The PA8 GPIO is configured accordingly in lines [78:83] (it is configured as GPIO_AF2_TIM1). Take note
that the internal pull-down resistor is activated in line 80: this prevents a floating input from
accidentally triggering the timer. For the same reason, the TriggerFilter is set to the maximum level
at line 43 (if you try to set it to zero, you will notice that it is really easy to trigger the
timer accidentally, even by simply touching the wire connected to the PA8 pin).
Lines [46:48] configure TIM1 to work also in master mode. The timer will trigger its internal line
(which is connected to the ITR0 line of TIM3) every time the update event is generated. Finally, lines
[61:63] configure the TIM3 in External Clock Mode 1, selecting the ITR0 line as source clock.
Note that, in order to have the LD2 LED blinking every 500ms (2Hz), the TIM1 period is set
to 249²⁴, which makes the update frequency of TIM1 equal to 4Hz. This is required because,
applying equation [3], we have:
$$UpdateEvent = \frac{4\,\text{Hz}}{(0 + 1)(1 + 1)(0 + 1)} = 2\,\text{Hz} \quad\Rightarrow\quad 0.5\,\text{s}$$
To trigger TIM1 you have to connect the PA8 pin to a +3V3 source. Figure 12 shows how to connect
it in a Nucleo-F030.
Finally, note that we do not call the HAL_TIM_Base_Start() function for the TIM1 timer (see the
main() routine), because the timer is started upon the trigger event generated on Channel 1 (that is,
when we tie the PA8 pin to the +3V3 source).
²⁴Clearly, that prescaler value is referred to an STM32F030R8 MCU running at 48MHz. For your Nucleo, check the book examples
for the right prescaler setting.
Figure 12: how to connect the TI2FP2 pin to AVDD pin in a Nucleo-F030R8 board
When a timer works in slave mode, the timer IRQ is raised, if enabled, every time the specified
trigger event occurs. For example, when the master clock triggers due to an update event, the IRQ
of the slave timer is fired and we can be notified of this by defining the callback:
By default, the HAL_TIM_Base_Start_IT() does not enable this type of interrupt. We have to use
the function HAL_TIM_SlaveConfigSynchronization_IT(), instead of the function
HAL_TIM_SlaveConfigSynchronization(). Obviously, the corresponding ISR must be defined, and the function
HAL_TIM_IRQHandler() has to be called from it.
To configure a timer in slave mode from CubeMX, it is sufficient to select the desired trigger mode
(Reset Mode, Gated Mode, Trigger Mode) from the IP Pane tree (Slave mode combo-box), and
then select the Trigger Source, as shown in Figure 13. Remember that a timer configured in slave
mode, and not working in External Clock Mode 1, must be clocked from the internal clock or by the
ETR2 clock source.
Instead, to enable the master mode, we have to select this mode from the timer configuration view,
as shown in Figure 14. Once the master mode is selected, it is possible to select the TRGO source
event.
which accepts the pointer to the timer handle and the event to generate. The EventSource parameter
can assume one value from Table 15.
The TIM_EVENTSOURCE_UPDATE plays two important roles. The first one is related to the way
the Period register (that is the TIMx->ARR register) is updated when the timer is running. By
default, the content of the ARR register is transferred to the internal shadow register when the
TIM_EVENTSOURCE_UPDATE event is generated, unless the timer is differently configured. More about
this later.
The TIM_EVENTSOURCE_UPDATE event is also useful when the TRGO output of a timer configured
as master is set in TIM_TRGO_RESET mode: in this case, the slave timer will be triggered only if the
TIMx->EGR register is used to generate the TIM_EVENTSOURCE_UPDATE event (that is, the UG bit is set).
The following code shows how software event generation works (the example is based on
an STM32F401RE MCU). TIM3 and TIM4 are two timers configured in master and slave mode
respectively. TIM4 is configured to work in ETR1 mode (that is, it is clocked by the master timer).
TIM3 is configured to trigger the TRGO output (which is internally connected to the ITR2 line) when
the UG bit of the TIM3->EGR register is set. Finally, we generate the UEV event manually every 200ms
from the main() routine.
int main(void) {
  ...
  while (1) {
    // Setting the UG bit generates the update event by software
    HAL_TIM_GenerateEvent(&htim3, TIM_EVENTSOURCE_UPDATE);
    HAL_Delay(200);
  }
  ...
}

void MX_TIM3_Init(void) {
  TIM_ClockConfigTypeDef sClockSourceConfig;
  TIM_MasterConfigTypeDef sMasterConfig;

  htim3.Instance = TIM3;
  htim3.Init.Prescaler = 65535;
  htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim3.Init.Period = 120;
  htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  HAL_TIM_Base_Init(&htim3);

  sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
  HAL_TIM_ConfigClockSource(&htim3, &sClockSourceConfig);

  // TRGO follows the UG bit of the TIM3->EGR register
  sMasterConfig.MasterOutputTrigger = TIM_TRGO_RESET;
  sMasterConfig.MasterSlaveMode = TIM_MASTERSLAVEMODE_ENABLE;
  HAL_TIMEx_MasterConfigSynchronization(&htim3, &sMasterConfig);
}

void MX_TIM4_Init(void) {
  TIM_SlaveConfigTypeDef sSlaveConfig;

  htim4.Instance = TIM4;
  htim4.Init.Prescaler = 0;
  htim4.Init.CounterMode = TIM_COUNTERMODE_UP;
  htim4.Init.Period = 1;
  htim4.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
  HAL_TIM_Base_Init(&htim4);

  // TIM4 is clocked by TIM3 through the ITR2 line (External Clock Mode 1)
  sSlaveConfig.SlaveMode = TIM_SLAVEMODE_EXTERNAL1;
  sSlaveConfig.InputTrigger = TIM_TS_ITR2;
  HAL_TIM_SlaveConfigSynchronization_IT(&htim4, &sSlaveConfig);
}
Figure 15: the three major counting modes of a general purpose timer
Figure 16: the structure of the input channel in a general purpose timer
The input capture mode offered by general purpose and advanced timers allows us to compute the
frequency of external signals applied to each of the 4 channels that these timers provide. The
capture is performed independently for each channel.
Figure 17: the capture process of an external signal feeding one of the timer channels
Figure 17 shows how the capture process works. TIMx is a timer configured to work at a
given TIMx_CLK clock frequency²⁶. This means that it increments the TIMx_CNT register up to
the Period value every $\frac{1}{TIMx\_CLK}$ seconds. Supposing that we apply a square wave signal to one
of the timer channels, and supposing that we configure the timer to trigger at every rising edge
of the input signal, the TIMx_CCRx²⁷ register will be updated with the content of
the TIMx_CNT register at every detected transition. When this happens, the timer will generate a
corresponding interrupt or a DMA request, allowing us to keep track of the counter value.
²⁶The timer clock frequency is independent of the way the timer works (in this case, input capture mode). As seen in the
previous paragraphs, the timer clock depends on the bus frequency or the external clock source and on the related prescaler settings.
To get the external signal period, two consecutive captures are needed. The period is calculated by
subtracting these two values, CN T0 (the value 4 in Figure 17) and CN T1 (the value 20 in Figure
17), and using the following formula:
Capture
P eriod = [4]
(T IM x_CLK)(P rescaler + 1)(CHP rescaler )(P olarityIndex )
where:
Capture = CN T1 − CN T0 if CN T0 < CN T1
Capture = (T IM x_P eriod − CN T0 ) + CN T1 if CN T0 > CN T1
$CH_{Prescaler}$ is a further prescaler that can be applied to the input channel, and $Polarity_{Index}$ is equal
to 1 if the channel is configured to trigger on the rising or on the falling edge of the input signal, or
to 2 if both edges are sampled.
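The two Capture expressions can be tried out on a host PC with a few lines of C. This is only a minimal sketch: capture_ticks() and capture_to_hz() are hypothetical helper names, and counter_hz is assumed to be the frequency at which TIMx_CNT is actually incremented, that is, with every prescaler already applied. The overflow branch follows the expression given in the text as it is written.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two consecutive captures, following the two
 * Capture expressions given in the text. */
uint32_t capture_ticks(uint32_t cnt0, uint32_t cnt1, uint32_t period)
{
    assert(cnt0 <= period && cnt1 <= period);
    if (cnt1 >= cnt0) {
        return cnt1 - cnt0;            /* no overflow between the captures */
    }
    return (period - cnt0) + cnt1;     /* the counter wrapped around once */
}

/* Frequency in Hz of the sampled signal. counter_hz is an assumed
 * parameter: the rate at which TIMx_CNT is incremented. */
float capture_to_hz(uint32_t ticks, uint32_t counter_hz)
{
    assert(ticks > 0);
    return (float)counter_hz / (float)ticks;
}
```

With the values of Figure 17, capture_ticks(4, 20, 65535) gives 16 ticks.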
Another relevant condition is that the UEV frequency should be lower than the sampled signal
frequency. The reason why this matters is evident: if the timer runs faster than the sampled signal,
then it will overflow (that is, it runs past the Period value) before it can sample the signal edges
(see Figure 18). For this reason, it is usually convenient to set the Period value to the maximum, and
to increase the Prescaler factor to lower the counting frequency.
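As a rough aid for choosing the Prescaler, the condition above can be turned into a small calculation. The sketch below is not taken from the example sources: min_prescaler() is a hypothetical helper, and it assumes that one counting cycle lasts Period + 1 ticks of the prescaled clock.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest Prescaler register value for which one full counting cycle
 * (Period + 1 ticks) lasts at least one period of the sampled signal,
 * so that the UEV frequency does not exceed the signal frequency. */
uint32_t min_prescaler(uint32_t timer_clk_hz, uint32_t signal_hz, uint32_t period)
{
    assert(signal_hz > 0);
    uint64_t denom = (uint64_t)signal_hz * ((uint64_t)period + 1u);
    /* Ceiling division: (Prescaler + 1) >= timer_clk / (signal * ticks) */
    uint64_t divider = (timer_clk_hz + denom - 1u) / denom;
    if (divider == 0u) {
        divider = 1u;
    }
    return (uint32_t)(divider - 1u);
}
```

For instance, with a 48MHz timer clock and Period set to 65535, a 50Hz signal needs a Prescaler of at least 14, while a 50KHz signal can be sampled with a Prescaler of 0.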
Figure 18: if the timer runs faster than the sampled signal, then it overflows before the two rising edges are detected
To configure the input channels we use the function HAL_TIM_IC_ConfigChannel() and an instance
of the C struct TIM_IC_InitTypeDef, which is defined in the following way:
²⁷CCR is the acronym for Capture Compare Register, and the x is the channel number.
typedef struct {
uint32_t ICPolarity; /* Specifies the active edge of the input signal. */
uint32_t ICSelection; /* Specifies the input. */
uint32_t ICPrescaler; /* Specifies the Input Capture Prescaler. */
uint32_t ICFilter; /* Specifies the input capture filter. */
} TIM_IC_InitTypeDef;
• ICPolarity: specifies the polarity of the input signal, and it can assume a value from Table
16.
• ICSelection: specifies the used input of the timer. It can assume a value from Table 17. It is
possible to selectively remap input channels to different input sources, that is (IC1,IC2) are
mapped to (TI2,TI1) and (IC3,IC4) are mapped to (TI4,TI3). Usually this is used to differentiate
rising-edge from falling-edge captures for signals where $T_{ON}$ is different from $T_{OFF}$. It is
also possible to capture from the same internal channel, named TRC, connected to ITR0..ITR3
sources.
• ICPrescaler: configures the prescaler stage of a given input. It can assume a value from Table
18.
• ICFilter: this 4-bit field defines the frequency used to sample the external clock signal
connected to TIMx_CHx pin and the length of the digital filter applied to it. It is useful to
debounce the input signal. Refer to the datasheet of your MCU for more information.
Now it is the right time to see a practical example. We are going to rearrange Example 2 of
this chapter so that we sample the switching frequency of the PA5 pin (the one connected to the LD2 LED)
through Channel 1 of the TIM3 timer (in an STM32F030 MCU this channel corresponds to the PA6 pin).
We therefore configure Channel 1 as an input capture pin, and we configure it in DMA mode so that it
triggers the TIM3_CH1 request to automatically fill a temporary buffer that stores the value of the
TIM3_CNT register when the rising edge of the input signal is detected.
Before we analyze the main() function, it is best to take a look at the TIM3 initialization routines.
Filename: src/main-ex6.c
85 */
86 GPIO_InitStruct.Pin = GPIO_PIN_6;
87 GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
88 GPIO_InitStruct.Pull = GPIO_NOPULL;
89 GPIO_InitStruct.Speed = GPIO_SPEED_LOW;
90 GPIO_InitStruct.Alternate = GPIO_AF1_TIM3;
91 HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
92
93 /* Peripheral DMA init*/
94 hdma_tim3_ch1_trig.Instance = DMA1_Channel4;
95 hdma_tim3_ch1_trig.Init.Direction = DMA_PERIPH_TO_MEMORY;
96 hdma_tim3_ch1_trig.Init.PeriphInc = DMA_PINC_DISABLE;
97 hdma_tim3_ch1_trig.Init.MemInc = DMA_MINC_ENABLE;
98 hdma_tim3_ch1_trig.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
99 hdma_tim3_ch1_trig.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
100 hdma_tim3_ch1_trig.Init.Mode = DMA_NORMAL;
101 hdma_tim3_ch1_trig.Init.Priority = DMA_PRIORITY_LOW;
102 HAL_DMA_Init(&hdma_tim3_ch1_trig);
103
104 /* Several peripheral DMA handle pointers point to the same DMA handle.
105 Be aware that there is only one channel to perform all the requested DMAs. */
106 __HAL_LINKDMA(htim_ic, hdma[TIM_DMA_ID_CC1], hdma_tim3_ch1_trig);
107 }
108 }
The MX_TIM3_Init() configures the TIM3 timer so that it runs at a frequency equal to ∼732Hz. The
first channel is then configured to trigger the capture event (TIM3_CH1) at every rising edge of the
input signal. The HAL_TIM_IC_MspInit() then configures the hardware part (the PA6 pin connected
to the TIM3 Channel 1) and the DMA descriptor used to configure the TIM3_CH1 request.
Here we have two things to note. First of all, the DMA is configured so that both the
peripheral and memory data alignments are set to perform a 16-bit transfer, since the timer
counter register is 16 bits wide. In those MCUs where the TIM2 and TIM5 timers have a 32-bit
wide counter register, you need to set up the DMA to perform a word-aligned transfer.
Next, since we are using HAL_TIM_IC_Init() at line 69, the HAL is designed to call
the function HAL_TIM_IC_MspInit() to perform low-level initializations, instead of the
HAL_TIM_Base_MspInit() one.
Filename: src/main-ex6.c
The most relevant part of the application is the main() function. We first initialize TIM6 timer
(which is configured to run at 100KHz - this means that the PA5 pin is set HIGH every 20µs =
50KHz) using the MX_TIM6_Init() function and then we start it in DMA mode, as described so far
in this chapter. Then we start TIM3 and we enable the DMA mode on the first channel, by using the
HAL_TIM_IC_Start_DMA() function (line 40). The captures array is used to store the two consecutive
captures acquired on the channel.
Lines [42:53] are the part where we compute the frequency of the external signal. When the two
captures are performed, the global variable captureDone is set to 1 by the HAL_TIM_IC_CaptureCallback()
callback function (not shown here), which is invoked at the end of the capture process.
When this happens we compute the frequency of the sampled signal using equation [4].
Thanks to CubeMX, it is really easy to configure the input channels of a general purpose timer in
input capture mode. To bind one channel to the corresponding input (that is, IC1 to TI1), you
have to select the Input capture direct mode for the desired channel, as shown in Figure 19.
Instead, to map the other channel of the pair (IC1,IC2) or (IC3,IC4) to the same input (that is
TI1 or TI2 for (IC1,IC2)), it is possible to enable the other channel of the pair in Input capture
indirect mode, as shown in Figure 20. Finally, from the TIMx configuration view (not shown here),
it is possible to configure the other input capture parameters (channel polarity, its filter, and so on).
• Output compare timing²⁹: the comparison between the output compare register (CCRx) and
the counter (CNT) has no effect on the output. This mode is used to generate a timing base.
• Output compare active: set the channel output to active level on match. The channel output is
forced high when the counter (CNT) matches the capture/compare register (CCRx).
• Output compare inactive: set channel to inactive level on match. The channel output is forced
low when the counter (CNT) matches the capture/compare register (CCRx).
• Output compare toggle: the channel output toggles when the counter (CNT) matches the
capture/compare register (CCRx).
• Output compare forced active/inactive: the channel output is forced high (active mode) or low
(inactive mode) independently from counter value.
Each channel of the timer is configured in output compare mode by using the function
HAL_TIM_OC_ConfigChannel() and an instance of the C struct TIM_OC_InitTypeDef, which is defined in the
following way:
typedef struct {
uint32_t OCMode; /* Specifies the TIM mode. */
uint32_t Pulse; /* Specifies the pulse value to be loaded
into the Capture Compare Register. */
uint32_t OCPolarity; /* Specifies the output polarity. */
uint32_t OCNPolarity; /* Specifies the complementary output polarity.*/
uint32_t OCFastMode; /* Specifies the Fast mode state. */
uint32_t OCIdleState; /* Specifies the TIM Output Compare pin state during Idle state.*/
uint32_t OCNIdleState; /* Specifies the complementary TIM Output Compare pin
state during Idle state. */
} TIM_OC_InitTypeDef;
• OCMode: specifies the output compare mode and it can assume a value from Table 19.
• Pulse: the content of this field will be stored inside the CCRx register and it establishes when
to trigger the output.
²⁸The output compare modes are actually eight, but two of them are related to PWM output, and they will be analyzed in the
next paragraph.
²⁹This mode in CubeMX is called Frozen mode.
• OCPolarity: defines the output channel polarity when the CCRx register matches the
CNT one. It can assume a value from Table 20.
• OCNPolarity: defines the complementary output polarity. It is a mode available only in
TIM1 and TIM8 advanced timers, which allow us to generate, on additional dedicated channels,
complementary signals (that is, when CH1 is HIGH, CH1N is LOW and vice versa).
This feature is especially designed for motor control applications, and it is not described in
this book. It can assume a value from Table 21.
• OCFastMode: specifies the fast mode state. This parameter is valid only in PWM1 and PWM2
mode and it can assume the values TIM_OCFAST_DISABLE and TIM_OCFAST_ENABLE.
• OCIdleState: specifies the channel output compare pin state during the timer idle state. It
can assume the values TIM_OCIDLESTATE_SET and TIM_OCIDLESTATE_RESET. This parameter is
available only in TIM1 and TIM8 advanced timers.
• OCNIdleState: specifies the complementary channel output compare pin state during the
timer idle state. It can assume the values TIM_OCNIDLESTATE_SET and TIM_OCNIDLESTATE_RESET.
This parameter is available only in TIM1 and TIM8 advanced timers.
When the CCRx register matches the timer CNT counter, and the channel is configured to
work in output compare mode, a specific interrupt is generated (if enabled). This allows us to control
the switching frequency of each channel independently, and possibly to perform a phase shift between
channels. The channel frequency can be computed using the following formula:

$$CHx\_Update = \frac{TIMx\_CLK}{CCRx} \qquad [5]$$

where TIMx_CLK is the running frequency of the timer and CCRx is the Pulse value of the
TIM_OC_InitTypeDef struct used to configure the channel. This means that we can compute the
Pulse value, given a channel frequency, in the following way:

$$Pulse = \frac{TIMx\_CLK}{CHx\_Update} \qquad [6]$$
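Equation [6] translates into a one-line computation. The function below is only a sketch of that idea, not the computePulse() routine used by the example: pulse_for_frequency() is a hypothetical name, and it assumes that the running frequency of the timer is the timer clock divided by Prescaler + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Pulse value (CCRx) for a desired channel update frequency, equation [6].
 * timer_clk_hz / (prescaler + 1) is assumed to be the running frequency
 * of the timer. */
uint16_t pulse_for_frequency(uint32_t timer_clk_hz, uint32_t prescaler,
                             uint32_t channel_hz)
{
    assert(channel_hz > 0);
    uint32_t running_hz = timer_clk_hz / (prescaler + 1u);
    return (uint16_t)(running_hz / channel_hz);
}
```

With a 48MHz clock and a Prescaler equal to 2 the timer runs at 16MHz, so a 50KHz channel frequency gives a Pulse of 320 and a 100KHz one gives 160.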
The following example shows how to generate two output square wave signals, one running at
50KHz and one at 100KHz. It uses the Channel 1 and 2 (bound to OC1 and OC2) of TIM3 timer and
it is designed to run on a Nucleo-F030R8.
Filename: src/main-ex7.c
31
32 /* TIM3 init function */
33 void MX_TIM3_Init(void) {
34 TIM_OC_InitTypeDef sConfigOC;
35
36 htim3.Instance = TIM3;
37 htim3.Init.Prescaler = 2;
38 htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
39 htim3.Init.Period = 65535;
40 htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
41 HAL_TIM_OC_Init(&htim3);
42
43 CH1_FREQ = computePulse(&htim3, 50000);
44 CH2_FREQ = computePulse(&htim3, 100000);
45
46 sConfigOC.OCMode = TIM_OCMODE_TOGGLE;
47 sConfigOC.Pulse = CH1_FREQ;
48 sConfigOC.OCPolarity = TIM_OCPOLARITY_HIGH;
49 sConfigOC.OCFastMode = TIM_OCFAST_DISABLE;
50 HAL_TIM_OC_ConfigChannel(&htim3, &sConfigOC, TIM_CHANNEL_1);
51
52 sConfigOC.Pulse = CH2_FREQ;
53 HAL_TIM_OC_ConfigChannel(&htim3, &sConfigOC, TIM_CHANNEL_2);
54 }
Lines [48:59] configure Channel 1 and 2 to work as output compare channels. Both are configured
in toggle mode (that is, they invert the state of the GPIO every time the CCRx register matches
with the CNT timer register). The TIM3 is configured to run at 16MHz, and hence the function
computePulse(), which uses the equation [6], will return the values 320 and 160 to have a channel
switching frequency equal to 50KHz and 100KHz respectively. However, the above code is still not
sufficient to drive the GPIO at that frequency. Here we are configuring the channels so that they
will toggle their output every time the CNT counter is equal to 320 for Channel 1 and to 160 for
Channel 2. But this means that the switching frequency is equal to:
$$\frac{16\,000\,000}{65535} \approx 244Hz$$
and we only have a shift of 10µs between the two channels, as shown by Figure 21.
To reach the desired switching frequency³⁰, we need to toggle the output every 320 and every 160
ticks of the TIM3 CNT register, respectively. To do so, we can define the following callback routine:
Filename: src/main-ex7.c
62 uint16_t pulse;
63
64 /* TIM3_CH1 toggling with frequency = 50KHz */
65 if(htim->Channel == HAL_TIM_ACTIVE_CHANNEL_1)
66 {
67 pulse = HAL_TIM_ReadCapturedValue(htim, TIM_CHANNEL_1);
68 /* Set the Capture Compare Register value */
69 __HAL_TIM_SET_COMPARE(htim, TIM_CHANNEL_1, (pulse + CH1_FREQ));
70 }
71
72 /* TIM3_CH2 toggling with frequency = 100KHz */
73 if(htim->Channel == HAL_TIM_ACTIVE_CHANNEL_2)
74 {
75 pulse = HAL_TIM_ReadCapturedValue(htim, TIM_CHANNEL_2);
76 /* Set the Capture Compare Register value */
77 __HAL_TIM_SET_COMPARE(htim, TIM_CHANNEL_2, (pulse + CH2_FREQ));
78 }
79 }
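The arithmetic behind the callback can be checked on a host PC. This is a minimal sketch and not part of the example: next_compare() and ticks_between() are hypothetical helpers, and they assume a 16-bit counter with the Period set to 65535, so that the unsigned 16-bit addition wraps exactly where the counter restarts from zero.

```c
#include <assert.h>
#include <stdint.h>

/* Next compare value: the 16-bit addition wraps at 65536, which matches a
 * counter that restarts from 0 after reaching a Period of 65535. */
uint16_t next_compare(uint16_t current, uint16_t step)
{
    return (uint16_t)(current + step);
}

/* Ticks the counter needs to go from one compare value to the next,
 * again assuming a Period of 65535. */
uint16_t ticks_between(uint16_t from, uint16_t to)
{
    return (uint16_t)(to - from);
}
```

Starting from 65440 with a step of 320, the next compare value is 224, which is still exactly 320 ticks away once the counter has wrapped.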
The same result may be obtained using the DMA mode and a pre-initialized vector, possibly stored
in the FLASH memory by using the const modifier:
The configuration process of the output compare mode in CubeMX is identical to the one for the
input capture mode. The first step is to select the Output compare CHx mode for the desired
channel, as shown in Figure 19. Next, from the TIMx configuration view (not shown here), it is
possible to configure the other output compare parameters (the output mode, channel polarity, and
so on).
$$D = \frac{T_{ON}}{Period} \times 100\% \qquad [8]$$

where D is the duty cycle and $T_{ON}$ is the time the signal is active. Thus, a 50% duty cycle means the
signal is on 50% of the time and off the other 50%. The duty cycle says nothing about how long
the signal stays on. The “on time” for a 50% duty cycle could be a fraction of a second, a day, or even a week,
depending on the length of the period. The pulse width is the duration of $T_{ON}$, given the actual
period. For example, assuming a period of 1s, a duty cycle of 20% generates a pulse width of 200ms.
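The relation between duty cycle, period and pulse width in equation [8] can be expressed with two small functions. They are only an illustration with hypothetical names; both times are assumed to be given in the same unit (milliseconds in the usage note).

```c
#include <assert.h>
#include <stdint.h>

/* Duty cycle in percent, equation [8]; both times in the same unit. */
uint32_t duty_cycle_percent(uint32_t t_on, uint32_t period)
{
    assert(period > 0 && t_on <= period);
    return (uint32_t)(((uint64_t)t_on * 100u) / period);
}

/* Pulse width (T_ON) for a given period and duty cycle in percent. */
uint32_t pulse_width(uint32_t period, uint32_t duty_percent)
{
    assert(duty_percent <= 100u);
    return (uint32_t)(((uint64_t)period * duty_percent) / 100u);
}
```

With a period of 1000ms, pulse_width(1000, 20) returns 200ms, and duty_cycle_percent(200, 1000) gives back 20.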
Figure 23: three different duty cycles - 50%, 20% and 80%
Figure 23 shows three different duty cycles: 50%, 20% and 80%.
Pulse-width modulation (PWM) is a technique used to generate several pulses with different duty
cycles in a given period of time or, if you prefer, at a given frequency. PWM has many applications
in digital electronics, but all of them can be grouped in two main categories:
Those two categories can be expanded in several practical usages of the PWM technique. Focusing
our attention on the control of the output voltage, we can find several applications:
• generation of an output voltage ranging from 0V up to VDD (that is, the maximum allowed
voltage for an I/O, which in an STM32 is 3.3V);
– dimming of LEDs;
– motor control;
– power conversion;
• generation of an output wave running at a given frequency (sine wave, triangle, square, and
so on);
• sound output;
³¹However, keep in mind that the PWM as modulation technique is not limited to digital electronics, but it originates in the
“analog era” when it was used to modulate an audio wave on a carrier frequency.
With adequate output filtering, which usually involves the usage of a low-pass filter, the PWM
can replicate the behaviour of a DAC, even if the MCU does not provide one. By varying the duty
cycle of the output pin it is possible to regulate the output voltage proportionally. An amplifier can
increase/decrease the voltage range as needed, and it is also possible to control high currents and
loads using power transistors.
A timer channel is configured in PWM mode by using the function HAL_TIM_PWM_ConfigChannel()
and an instance of the C struct TIM_OC_InitTypeDef seen in the previous paragraph. The
TIM_OC_InitTypeDef.Pulse field defines the duty cycle, and it ranges from 0 up to the timer Period
field. The longer the Period, the wider the tuning range. This means that we can fine-tune the
output voltage.
The choice of the period, which determines the frequency of the output signal together
with the timer clock (internal, external and so on), is not a detail to be left to chance. It
depends on the specific application field, and it can have a severe impact on the overall
EMI. Moreover, some devices controlled with the PWM technique may emit audible
noise at given frequencies. This is the case of electric motors, which could emit unwanted
buzzing noise when controlled at frequencies in the hearing range. Another example, not
closely related here but with a similar origin, is the noise emitted by power inductors
in switching power supplies, which use the concept underlying the PWM to regulate their
output voltage, and therefore the current. Sometimes, the output noise is unavoidable, and
it is required to use varnishing products to reduce the problem. Other times, the right
frequency comes from “natural limitations”: dimming a LED at a frequency close to 100Hz
is usually sufficient to avoid visible flickering of the light.
There are two PWM modes available: PWM mode 1 and 2. Both of them are configurable through the
field TIM_OC_InitTypeDef.OCMode, using the values TIM_OCMODE_PWM1 and TIM_OCMODE_PWM2. Let us
see the differences.
• PWM mode 1: in upcounting, the channel is active as long as CNT < Pulse, else inactive.
In downcounting, the channel is inactive as long as CNT > Pulse, else active.
• PWM mode 2: in upcounting, the channel is inactive as long as CNT < Pulse, else active.
In downcounting, the channel is active as long as CNT > Pulse, else inactive.
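The difference between the two modes can be made concrete with a tiny host-side simulation of the upcounting case. This is a sketch under stated assumptions: the enum and the function names are hypothetical, and the comparison is assumed to be between the running counter value and Pulse, with the counter going from 0 up to Period.

```c
#include <assert.h>
#include <stdint.h>

enum pwm_mode { PWM_MODE_1 = 1, PWM_MODE_2 = 2 };   /* hypothetical names */

/* Channel state (1 = active) for a given counter value while upcounting. */
int pwm_is_active(enum pwm_mode mode, uint32_t cnt, uint32_t pulse)
{
    int below = (cnt < pulse);
    return (mode == PWM_MODE_1) ? below : !below;
}

/* Number of ticks the channel is active in one counting cycle 0..period. */
uint32_t active_ticks(enum pwm_mode mode, uint32_t period, uint32_t pulse)
{
    assert(period < UINT32_MAX);
    uint32_t n = 0;
    for (uint32_t cnt = 0; cnt <= period; cnt++) {
        n += (uint32_t)pwm_is_active(mode, cnt, pulse);
    }
    return n;
}
```

With Period equal to 999 and Pulse equal to 250, mode 1 keeps the channel active for 250 ticks out of 1000, while mode 2 keeps it active for the remaining 750.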
The following example shows a typical application of the PWM technique: LED dimming. The
example is designed to run on a Nucleo-F401RE and it fades ON/OFF the LD2 LED³².
³²Unfortunately, not all Nucleo boards have the LD2 LED connected to a timer channel (this depends on the fact that the pinout
of LQFP-64 STM32 microcontrollers is not perfectly compatible). Only seven of them have this feature. Owners of other Nucleo
boards have to rearrange the example using an external LED.
Filename: src/main-ex8.c
11 int main(void) {
12 HAL_Init();
13
14 Nucleo_BSP_Init();
15 MX_TIM2_Init();
16
17 HAL_TIM_PWM_Start(&htim2, TIM_CHANNEL_1);
18
19 uint16_t dutyCycle = HAL_TIM_ReadCapturedValue(&htim2, TIM_CHANNEL_1);
20
21 while(1) {
22 while(dutyCycle < __HAL_TIM_GET_AUTORELOAD(&htim2)) {
23 __HAL_TIM_SET_COMPARE(&htim2, TIM_CHANNEL_1, ++dutyCycle);
24 HAL_Delay(1);
25 }
26
27 while(dutyCycle > 0) {
28 __HAL_TIM_SET_COMPARE(&htim2, TIM_CHANNEL_1, --dutyCycle);
29 HAL_Delay(1);
30 }
31 }
32 }
33
34 /* TIM2 init function */
35 void MX_TIM2_Init(void) {
36 TIM_OC_InitTypeDef sConfigOC;
37
38 htim2.Instance = TIM2;
39 htim2.Init.Prescaler = 499;
40 htim2.Init.CounterMode = TIM_COUNTERMODE_UP;
41 htim2.Init.Period = 999;
42 htim2.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
43 HAL_TIM_PWM_Init(&htim2);
44
45 sConfigOC.OCMode = TIM_OCMODE_PWM1;
46 sConfigOC.Pulse = 0;
47 sConfigOC.OCPolarity = TIM_OCPOLARITY_HIGH;
48 sConfigOC.OCFastMode = TIM_OCFAST_DISABLE;
49 HAL_TIM_PWM_ConfigChannel(&htim2, &sConfigOC, TIM_CHANNEL_1);
50 }
Lines [45:49] configure the first channel of timer TIM2 to work in PWM Mode 1. The duty cycle will
range from 0 up to 999, which corresponds to the Period value. This means that we can regulate
the output voltage with steps of ∼0.0033V if the output is well filtered (and the PCB has a good
layout). This is close to the performance of a 10-bit DAC.
Lines [21:32] are where the fading effect takes place. The first loop increments the value of the Pulse
(which corresponds to the Capture Compare Register 1 (CCR1)) up to the Period value (which
corresponds to the Auto Reload Register (ARR)) every 1ms. This means that in about 1s the
LED reaches full brightness. The second loop, in the same way, decrements the Pulse field until it
reaches zero.
An output square wave generated with the PWM technique can be filtered to generate a smoothed
signal, that is an analog signal that has a reduced peak-to-peak voltage (Vpp). A Resistor-Capacitor
(RC) low-pass filter (see Figure 24) is able to cut off all those AC signals having a frequency higher
than a given threshold. The general rule of thumb of RC low-pass filters is that the lower the
cut-off frequency, the lower the Vpp³⁴. An RC low-pass filter uses an important characteristic of
capacitors: the ability to block DC currents while allowing AC ones to pass. Given the RC
time constant formed by the resistor-capacitor network, the filter will short to ground those AC
signals with a frequency higher than the cut-off frequency, letting through the DC component of the signal
and the lower frequency AC voltages.
Figure 24: a typical low-pass filter implemented with a resistor and a capacitor
While this circuit is very simple, choosing the appropriate values for R (the resistance) and C (the
capacitance) involves some design decisions: how much ripple we can tolerate and how fast the
filter needs to respond. These two requirements conflict with each other. In most cases, we would like to
have the perfect filter, one that passes all frequencies below the cut-off frequency, with no voltage
³³The maximum frequency of timers in an STM32F401RE MCU, when clocked from the APB1 bus, is 84MHz.
³⁴When dealing with filters to smooth an output wave it is more convenient to consider the effects on the output voltage than
the response in frequency of the filter. However, the math under the transfer function of a filter is outside the scope of this book.
If interested, this on-line calculator(http://bit.ly/22breq2) allows you to evaluate the Vpp output given a VIN, the PWM frequency and
the R and C values.
ripple. Unfortunately this ideal filter does not exist: to reduce the ripple to zero we have to choose
a very large filter, which means that the output takes a long time to become stable. While
this could be acceptable for a continuous and fixed voltage, it has a severe impact on the quality of
the output signal if we are trying to generate a complex waveform from the PWM signal.
The cut-off frequency ($f_c$) of a first order RC low-pass filter is expressed by the formula:

$$f_c = \frac{1}{2\pi RC} \qquad [9]$$
Figure 25 shows the effect of a low-pass filter on a PWM signal with a frequency of 100Hz. Here we
have chosen a 1K resistor and a 10µF capacitor. This means that the cut-off frequency is equal to:

$$f_c = \frac{1}{2\pi \times 10^3 \times 10^{-5}} \approx 15.9Hz$$
Figure 25: the effect of a low-pass filter with cut-off frequency equal to 15.9Hz
Figure 26 shows the effect of the low-pass filter with a 4.3K resistor and a 10µF capacitor. This
means that the cut-off frequency is equal to:

$$f_c = \frac{1}{2\pi (4.3 \times 10^3) \times 10^{-5}} \approx 3.7Hz$$

As you can see, the second filter gives a Vpp equal to about 160mV, which is an acceptable voltage
difference for a lot of applications.
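The two cut-off frequencies can be recomputed with a short function based on equation [9]. It is only a sketch: rc_cutoff_hz() is a hypothetical name, with the resistance expressed in ohms and the capacitance in farads.

```c
#include <assert.h>

/* Cut-off frequency in Hz of a first order RC low-pass filter,
 * equation [9]. */
double rc_cutoff_hz(double r_ohm, double c_farad)
{
    assert(r_ohm > 0.0 && c_farad > 0.0);
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohm * c_farad);
}
```

Calling rc_cutoff_hz(1000.0, 10e-6) gives about 15.9Hz, while rc_cutoff_hz(4300.0, 10e-6) gives about 3.7Hz.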
Figure 26: the effect of a low-pass filter with cut-off frequency equal to 3.7Hz
By varying the output voltage (which implies that we vary the duty cycle) we can generate an
arbitrary output waveform, whose frequency is a fraction of the PWM frequency. The basic idea here
is to divide the waveform we want, for example a sine wave, into ‘x’ divisions. For each
division we have a single PWM cycle. The $T_{ON}$ time (that is, the duty cycle) directly corresponds to
the amplitude of the waveform in that division, which is calculated using the sin() function.
Figure 27: how a sine wave can be approximated with multiple PWM signals
Consider the diagram shown in Figure 27. Here the sine wave has been divided into 10 steps, so
we need 10 different PWM pulses increasing/decreasing in a sinusoidal manner. A PWM
pulse with 0% duty cycle represents the minimum amplitude (0V), and one with 100% duty cycle
represents the maximum amplitude (3.3V). Since our PWM pulse has a voltage swing between 0V and 3.3V, our
sine wave will swing between 0V and 3.3V too.
It takes 360 degrees for a sine wave to complete one cycle. Hence for 10 divisions we will need to
increase the angle in steps of 36 degrees. This is called the Angle Step Rate or Angle Resolution. We
can increase the number of divisions to get a more accurate waveform. But as the divisions increase we
also need to increase the resolution, which implies that we have to increase the frequency of the
timer used to generate the PWM signal (the faster the timer runs, the shorter the period).
Usually 200 divisions are a good approximation for an output wave. This means that if we want to
generate a 50Hz sine wave, we need to run the timer at 50Hz × 200 = 10KHz. The pulse period will be
equal to 200 (the number of steps; this means that we vary the output voltage by 3.3V/200 = 0.0165V),
and so the Prescaler value will be (assuming an STM32F030 MCU running at 48MHz):

$$Prescaler = \frac{48MHz}{50Hz \times 200_{divisions} \times 200_{Pulse}} = 24$$
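The same computation can be written as a small helper, useful when trying other output frequencies or numbers of divisions. This is a sketch with a hypothetical function name; it returns the clock divider, and the value written in the Prescaler field of the timer is assumed to be that divider minus one.

```c
#include <assert.h>
#include <stdint.h>

/* Divider to apply to the timer clock so that `divisions` PWM cycles of
 * `pwm_period` ticks each fit in one cycle of the output wave.
 * The Prescaler field is assumed to hold this value minus one. */
uint32_t sine_pwm_divider(uint32_t timer_clk_hz, uint32_t wave_hz,
                          uint32_t divisions, uint32_t pwm_period)
{
    uint64_t ticks_per_second = (uint64_t)wave_hz * divisions * pwm_period;
    assert(ticks_per_second > 0 && ticks_per_second <= timer_clk_hz);
    return (uint32_t)(timer_clk_hz / ticks_per_second);
}
```

For a 48MHz clock, a 50Hz wave, 200 divisions and a PWM period of 200 ticks the divider is 24, so the Prescaler field is set to 23.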
The following example shows how to generate a 50Hz pure sine wave in an STM32F030 MCU running
at 48MHz.
Filename: src/main-ex9.c
14 #define PI 3.14159
15 #define ASR 1.8 //360 / 200 = 1.8
16
17 int main(void) {
18 uint16_t IV[200];
19 float angle;
20
21 HAL_Init();
22
23 Nucleo_BSP_Init();
24 MX_TIM3_Init();
25
26 for (uint8_t i = 0; i < 200; i++) {
27 angle = ASR*(float)i;
28 IV[i] = (uint16_t) rint(100 + 99*sinf(angle*(PI/180)));
29 }
30
31 HAL_TIM_PWM_Start_DMA(&htim3, TIM_CHANNEL_1, (uint32_t *)IV, 200);
32
33 while (1);
34 }
35
36 /* TIM3 init function */
37 void MX_TIM3_Init(void) {
38 TIM_OC_InitTypeDef sConfigOC;
39
40 htim3.Instance = TIM3;
41 htim3.Init.Prescaler = 23;
42 htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
43 htim3.Init.Period = 199;
44 htim3.Init.ClockDivision = TIM_CLOCKDIVISION_DIV4;
45 HAL_TIM_PWM_Init(&htim3);
46
47 sConfigOC.OCMode = TIM_OCMODE_PWM1;
48 sConfigOC.Pulse = 0;
49 sConfigOC.OCPolarity = TIM_OCPOLARITY_HIGH;
50 sConfigOC.OCFastMode = TIM_OCFAST_DISABLE;
51 HAL_TIM_PWM_ConfigChannel(&htim3, &sConfigOC, TIM_CHANNEL_1);
52
53 hdma_tim3_ch1_trig.Instance = DMA1_Channel4;
54 hdma_tim3_ch1_trig.Init.Direction = DMA_MEMORY_TO_PERIPH;
55 hdma_tim3_ch1_trig.Init.PeriphInc = DMA_PINC_DISABLE;
56 hdma_tim3_ch1_trig.Init.MemInc = DMA_MINC_ENABLE;
57 hdma_tim3_ch1_trig.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
58 hdma_tim3_ch1_trig.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
59 hdma_tim3_ch1_trig.Init.Mode = DMA_CIRCULAR;
60 hdma_tim3_ch1_trig.Init.Priority = DMA_PRIORITY_LOW;
61 HAL_DMA_Init(&hdma_tim3_ch1_trig);
62
63 /* Several peripheral DMA handle pointers point to the same DMA handle.
64 Be aware that there is only one channel to perform all the requested DMAs. */
65 __HAL_LINKDMA(&htim3, hdma[TIM_DMA_ID_CC1], hdma_tim3_ch1_trig);
66 __HAL_LINKDMA(&htim3, hdma[TIM_DMA_ID_TRIGGER], hdma_tim3_ch1_trig);
67 }
The most relevant part is represented by lines [26:29]. Those lines of code are used to generate the
Initialization Vector (IV), that is the vector containing the Pulse values used to generate the sine
wave (which correspond to the output voltage levels). The C sinf() function returns the sine of the given
angle expressed in radians. So we need to convert the angle expressed in degrees to radians using
the formula:

$$Radians = \frac{\pi}{180°} \times Degrees$$

However, in our case we have divided the sine wave cycle into 200 steps (that is, we have divided the
circumference into 200 steps), so we need to compute the value in radians of each step. But since the sine
is negative for angles between 180° and 360° (see Figure 28) we need to scale it, since PWM
output values cannot be negative.
Figure 28: the values assumed by sine function between 180° and 360°
Once the IV vector is generated, we can start PWM in DMA mode. DMA1_Channel4 is
configured to work in circular mode, so that it automatically sets the value of the TIMx_CCRx
register according to the Pulse values contained in IV. Using a timer in DMA mode is the best way to
generate an arbitrary function without introducing latency and without loading the Cortex-M core. However,
IVs are often hardcoded inside the program, using const arrays automatically stored in the FLASH
memory. You can find several on-line tools to do this, like the one provided here³⁵.
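To give an idea of what a hardcoded IV looks like, here is a deliberately tiny sketch with only 8 steps instead of 200. The table and the function name are hypothetical; the values are 100 + 99·sin(k × 45°) rounded to the nearest integer, that is, the same scaling used in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 8-step table: 100 + 99*sin(k * 45 degrees), rounded.
 * A const array like this is normally placed in FLASH by the toolchain. */
static const uint16_t SINE_IV[] = { 100, 170, 199, 170, 100, 30, 1, 30 };
#define SINE_IV_LEN (sizeof(SINE_IV) / sizeof(SINE_IV[0]))

static_assert(SINE_IV_LEN == 8, "the table must hold one full sine cycle");

/* Pulse value for the n-th PWM cycle. The index wraps around, which is
 * what a DMA channel in circular mode does in hardware. */
uint16_t sine_pulse(size_t n)
{
    return SINE_IV[n % SINE_IV_LEN];
}
```

On the target, a const table of this kind would take the place of the IV array that the example fills at run time.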
Figure 29: how timers allow to approximate a 50Hz sine wave using PWM
Figure 29 shows the output from TIM3 Channel 1: as you can see, using an adequate filtering stage³⁶,
it is really easy to generate a pure 50Hz sine wave.
The configuration process of the PWM mode in CubeMX is straightforward, once the fundamental
concepts of PWM generation have been mastered. The first step is to select the PWM Generation
CHx mode for the desired channel, as shown in Figure 19. Next, from the TIMx configuration view
(not shown here), it is possible to configure the other PWM settings (PWM mode 1 or 2, channel
polarity, and so on).
³⁵http://bit.ly/1QPfm4k
³⁶Here, I have used a 100ohm resistor and a 10µF capacitor, which give a cut-off frequency of ∼159Hz and a Vpp equal to 0.08V.
Both the channels are configured with an instance of the C struct TIM_OnePulse_InitTypeDef,
which is defined in the following way:
typedef struct {
uint32_t Pulse; /* Specifies the pulse value to be loaded into the CCRx register.*/
/* Output channel configuration */
uint32_t OCMode; /* Specifies the TIM mode. */
uint32_t OCPolarity; /* Specifies the output polarity. */
uint32_t OCNPolarity; /* Specifies the complementary output polarity. */
uint32_t OCIdleState; /* Specifies the TIM Output Compare pin state during Idle state.*/
uint32_t OCNIdleState; /* Specifies the TIM Output Compare pin state during Idle state.*/
/* Input channel configuration */
uint32_t ICPolarity; /* Specifies the active edge of the input signal. */
uint32_t ICSelection; /* Specifies the input. */
uint32_t ICFilter; /* Specifies the input capture filter. */
} TIM_OnePulse_InitTypeDef;
The struct is logically divided in two parts: one related to the configuration of the input channel,
and one to the output. We will not go into the details of the struct fields, because they are similar
to what we have seen so far when talking about input capture and output compare modes.
An important aspect to understand is the way the timer computes delay and pulse durations. The
delay is computed according to the following formula:

$$Delay = \frac{Pulse}{\left(\frac{TIMx\_CLK}{Prescaler + 1}\right)} \qquad [10]$$

while the duration (that is, the duty cycle) of the pulse is computed with this one:

$$Duration = \frac{Period - Pulse}{\left(\frac{TIMx\_CLK}{Prescaler + 1}\right)} \qquad [11]$$
This means that, once the input channel detects the trigger event, the timer starts counting and when
the CNT register reaches the CCRx register (Pulse) it generates the output signal, which lasts until
the CNT register reaches the ARR register (Period), that is Period - Pulse.
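Equations [10] and [11] are easy to evaluate for a given configuration. The two functions below are a sketch with hypothetical names; they return the results in microseconds and use 64-bit intermediate values to avoid overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Delay before the pulse, in microseconds (equation [10]). */
uint32_t opm_delay_us(uint32_t timer_clk_hz, uint32_t prescaler, uint32_t pulse)
{
    assert(timer_clk_hz > 0);
    return (uint32_t)(((uint64_t)pulse * (prescaler + 1u) * 1000000u)
                      / timer_clk_hz);
}

/* Duration of the pulse, in microseconds (equation [11]). */
uint32_t opm_duration_us(uint32_t timer_clk_hz, uint32_t prescaler,
                         uint32_t period, uint32_t pulse)
{
    assert(timer_clk_hz > 0 && pulse <= period);
    return (uint32_t)(((uint64_t)(period - pulse) * (prescaler + 1u) * 1000000u)
                      / timer_clk_hz);
}
```

With a 48MHz clock, a Prescaler of 47, a Period of 65535 and a Pulse of 19999, the delay is 19999µs (about 20ms) and the pulse lasts 45536µs (about 45ms).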
The OPM can be set in single shot or in repetitive mode. This is performed by using the
HAL_TIM_OnePulse_Init() function, which accepts the pointer to the timer handle and the symbolic constant
TIM_OPMODE_SINGLE to configure OPM in single shot mode or TIM_OPMODE_REPETITIVE to enable repetitive mode.
The following example shows how to configure TIM3 in OPM mode in an STM32F030 MCU.
Filename: src/main-ex10.c
12 int main(void) {
13   HAL_Init();
14
15   Nucleo_BSP_Init();
16   MX_TIM3_Init();
17
18   HAL_TIM_OnePulse_Start(&htim3, TIM_CHANNEL_1);
19
20   while (1);
21 }
22
23 /* TIM3 init function */
24 void MX_TIM3_Init(void) {
25   TIM_OnePulse_InitTypeDef sConfig;
26
27   htim3.Instance = TIM3;
28   htim3.Init.Prescaler = 47;
29   htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
30   htim3.Init.Period = 65535;
31   HAL_TIM_OnePulse_Init(&htim3, TIM_OPMODE_SINGLE);
32
33   /* Configure the Channel 1 */
34   sConfig.OCMode = TIM_OCMODE_PWM1;
35   sConfig.OCPolarity = TIM_OCPOLARITY_LOW;
36   sConfig.Pulse = 19999;
37
38   /* Configure the Channel 2 */
39   sConfig.ICPolarity = TIM_ICPOLARITY_RISING;
40   sConfig.ICSelection = TIM_ICSELECTION_DIRECTTI;
41   sConfig.ICFilter = 0;
42
43   HAL_TIM_OnePulse_ConfigChannel(&htim3, &sConfig, TIM_CHANNEL_1, TIM_CHANNEL_2);
44 }
Lines [34:36] configure the output channel in PWM Mode 1, while lines [39:41] configure the input
channel. The HAL_TIM_OnePulse_ConfigChannel() call, at line 43, configures the two channels, setting
Channel 1 as the output and Channel 2 as the input. Finally, the HAL_TIM_OnePulse_Start() function
(called at line 18) starts the timer in OPM mode. By biasing the PA7 pin in a Nucleo-F030R8, the
timer will start after a delay of 20ms, and it will generate a pulse of about 45ms, as shown in Figure
30.
The output channel of a timer running in One Pulse mode can also be configured in modes other
than PWM.
To enable the OPM mode using CubeMX, the first step is to configure Channel 1 and Channel 2
independently, and then to select the One Pulse Mode checkbox, as shown in Figure 31. Next, from
the TIMx configuration view (not shown here), it is possible to configure the other channel settings.
It is important to remark that, at the time of writing this chapter, the code generated by
CubeMX is not that good. The code does not use the HAL_TIM_OnePulse_ConfigChannel() function,
and each channel is configured as if it were used independently. This leads to more
verbose and confusing code. However, it could be that when you read this chapter, ST has
already fixed this part.
Figure 32: the square waves emitted by a quadrature encoder on A and B channels
They employ two outputs called A and B, which are called quadrature outputs, as they are 90 degrees
out of phase, as shown in Figure 32. The direction of the motor depends on whether phase A leads
phase B, or phase B leads phase A. An optional third channel, the index pulse, occurs once per
revolution and it is used as a reference to measure an absolute position. There are several ways to
detect the direction and position of a rotary encoder. By connecting the A and B pins to two MCU I/Os
it is possible to detect when the signal goes HIGH and LOW. This can be performed either manually
(using interrupts to capture when the channel changes status) or by using a timer: its channels can be
configured in input capture mode and the captured values are compared to compute the direction and
speed of the encoder.
STM32 general purpose timers provide a convenient way to read rotary encoders: this mode is
called encoder mode, and it greatly simplifies the capture process. When a timer is configured in
encoder mode, the timer counter register (TIMx_CNT) is incremented/decremented on the edges of the
input channels.
Figure 33: how encoder speed and direction are computed by a timer in encoder mode
There are two capturing modes available: X2 and X4. In X2 mode the CNT register is
incremented/decremented on every edge of only one channel (either T1 or T2). In X4 mode the CNT
register is updated on every edge of both the channels: this doubles the capture frequency. The
direction of the movement is automatically derived and made available to the programmer in the DIR
bit of the TIMx_CR1 register, as shown in Figure 33. By sampling the counter register at regular
intervals, it is possible to derive the number of RPM, given the number of pulses the encoder emits
per revolution.
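The RPM computation can be sketched as a pure function. The name encoder_rpm and its parameters are hypothetical. In particular, counts_per_rev is assumed to be the number of counter steps per full revolution, which depends both on the encoder and on the X2/X4 mode selected.

```c
#include <stdint.h>

/* counts:         counter steps accumulated during the sampling interval
 * counts_per_rev: counter steps per full revolution (assumed known)
 * interval_ms:    sampling interval in milliseconds */
uint32_t encoder_rpm(uint32_t counts, uint32_t counts_per_rev, uint32_t interval_ms) {
    if (counts_per_rev == 0U || interval_ms == 0U)
        return 0U; /* invalid parameters: avoid a division by zero */
    /* revolutions in the interval, scaled to one minute (60000 ms) */
    return (uint32_t)(((uint64_t)counts * 60000ULL) /
                      ((uint64_t)counts_per_rev * interval_ms));
}
```

For example, 400 steps sampled over 1000ms with 16 steps per revolution correspond to 25 revolutions per second, that is, 1500 RPM.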
Incremental mechanical encoders usually need to be debounced, due to noisy output. A comparator
is usually used as filtering stage of these devices, especially if they are used to interface motors and
other noisy devices. Under certain conditions, the input filter stage of an STM32 timer can be used
to filter the A and B channels, reducing the number of BOM components.
The encoder mode is available only on TI1 and TI2 channels, and it is activated by using the function
HAL_TIM_Encoder_Init() and an instance of the C struct TIM_Encoder_InitTypeDef, which is
defined in the following way.
typedef struct {
  uint32_t EncoderMode;  /* Specifies the encoder mode (X2 on TI1, X2 on TI2, or X4). */
  /* T1 channel */
  uint32_t IC1Polarity;  /* Specifies the active edge of the input signal. */
  uint32_t IC1Selection; /* Specifies the input. */
  uint32_t IC1Prescaler; /* Specifies the Input capture prescaler. */
  uint32_t IC1Filter;    /* Specifies the input capture filter. */
  /* T2 channel */
  uint32_t IC2Polarity;  /* Specifies the active edge of the input signal. */
  uint32_t IC2Selection; /* Specifies the input. */
  uint32_t IC2Prescaler; /* Specifies the Input capture prescaler. */
  uint32_t IC2Filter;    /* Specifies the input capture filter. */
} TIM_Encoder_InitTypeDef;
We have encountered the majority of the TIM_Encoder_InitTypeDef fields in the previous
paragraphs. The only noteworthy one is EncoderMode, which can assume the values
TIM_ENCODERMODE_TI1 or TIM_ENCODERMODE_TI2 to set the X2 encoder mode on one of the two channels,
and the value TIM_ENCODERMODE_TI12 to set the X4 mode, so that the TIMx_CNT register is updated on
every edge of the TI1 and TI2 channels.
The following example, designed to run on a Nucleo-F030R8, simulates an incremental encoder by
using the TIM1 in output compare mode. TIM1 OC1 and OC2 (PA8, PA9) channels are routed to
TIM3 TI1 and TI2 channels (PA6, PA7) using the morpho connector, and they are configured so that
they generate two square wave signals having the same period but shifted in phase. The TIM3 is
then configured in encoder mode. The SysTick timer is used to generate the timebase: every 1s, the
number of pulses is computed, together with the encoder direction. The number of RPMs is then
derived, assuming an encoder that generates 4 pulses for every revolution. Finally, by pressing the
USER button it is possible to swap the phase shift between phases A and B: this inverts the
direction of rotation reported by the encoder.
Filename: src/main-ex11.c
22 #define PULSES_PER_REVOLUTION 4
23
24 int main(void) {
25   HAL_Init();
26
27   Nucleo_BSP_Init();
28   MX_TIM1_Init();
29   MX_TIM3_Init();
30
31   HAL_TIM_Encoder_Start(&htim3, TIM_CHANNEL_ALL);
32   HAL_TIM_OC_Start(&htim1, TIM_CHANNEL_1);
33   HAL_TIM_OC_Start(&htim1, TIM_CHANNEL_2);
34
35   cnt1 = __HAL_TIM_GET_COUNTER(&htim3);
36   tick = HAL_GetTick();
37
38   while (1) {
39     if (HAL_GetTick() - tick > 1000L) {
40       cnt2 = __HAL_TIM_GET_COUNTER(&htim3);
41       if (__HAL_TIM_IS_TIM_COUNTING_DOWN(&htim3)) {
42         if (cnt2 < cnt1) /* Check for counter underflow */
43           diff = cnt1 - cnt2;
44         else
45           diff = (65535 - cnt2) + cnt1;
46       } else {
47         if (cnt2 > cnt1) /* Check for counter overflow */
48           diff = cnt2 - cnt1;
49         else
50           diff = (65535 - cnt1) + cnt2;
51       }
52
53       sprintf(msg, "Difference: %d\r\n", diff);
54       HAL_UART_Transmit(&huart2, (uint8_t*) msg, strlen(msg), HAL_MAX_DELAY);
55
56       speed = ((diff / PULSES_PER_REVOLUTION) * 60); /* rev/s to RPM */
57
58       /* If the first three bits of SMCR register are set to 0x3
59        * then the timer is set in X4 mode (TIM_ENCODERMODE_TI12)
60        * and we need to divide the pulses counter by two, because
61        * they include the pulses for both the channels */
62       if ((TIM3->SMCR & 0x7) == 0x3)
63         speed /= 2;
64
65       sprintf(msg, "Speed: %d RPM\r\n", speed);
66       HAL_UART_Transmit(&huart2, (uint8_t*) msg, strlen(msg), HAL_MAX_DELAY);
67
68       dir = __HAL_TIM_IS_TIM_COUNTING_DOWN(&htim3);
69       sprintf(msg, "Direction: %d\r\n", dir);
70       HAL_UART_Transmit(&huart2, (uint8_t*) msg, strlen(msg), HAL_MAX_DELAY);
71
72       tick = HAL_GetTick();
73       cnt1 = __HAL_TIM_GET_COUNTER(&htim3);
74     }
75
76     if (HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_RESET) {
77       /* Invert rotation by swapping CH1 and CH2 CCR value */
78       tim1_ch1_pulse = __HAL_TIM_GET_COMPARE(&htim1, TIM_CHANNEL_1);
79       tim1_ch2_pulse = __HAL_TIM_GET_COMPARE(&htim1, TIM_CHANNEL_2);
80
81       __HAL_TIM_SET_COMPARE(&htim1, TIM_CHANNEL_1, tim1_ch2_pulse);
82       __HAL_TIM_SET_COMPARE(&htim1, TIM_CHANNEL_2, tim1_ch1_pulse);
83     }
84   }
85 }
86
87 /* TIM1 init function */
88 void MX_TIM1_Init(void) {
89   TIM_OC_InitTypeDef sConfigOC;
90
91   htim1.Instance = TIM1;
92   htim1.Init.Prescaler = 9;
93   htim1.Init.CounterMode = TIM_COUNTERMODE_UP;
94   htim1.Init.Period = 999;
95   HAL_TIM_Base_Init(&htim1);
96
97   sConfigOC.OCMode = TIM_OCMODE_TOGGLE;
98   sConfigOC.Pulse = 499;
99   sConfigOC.OCPolarity = TIM_OCPOLARITY_HIGH;
100   sConfigOC.OCFastMode = TIM_OCFAST_DISABLE;
101   sConfigOC.OCIdleState = TIM_OCIDLESTATE_RESET;
102   sConfigOC.OCNPolarity = TIM_OCNPOLARITY_HIGH;
103   sConfigOC.OCNIdleState = TIM_OCNIDLESTATE_RESET;
104   HAL_TIM_OC_ConfigChannel(&htim1, &sConfigOC, TIM_CHANNEL_1);
105
106   sConfigOC.Pulse = 999; /* Phase B is shifted by 90° */
107   HAL_TIM_OC_ConfigChannel(&htim1, &sConfigOC, TIM_CHANNEL_2);
108 }
109
110 /* TIM3 init function */
111 void MX_TIM3_Init(void) {
112   TIM_Encoder_InitTypeDef sEncoderConfig;
113
114   htim3.Instance = TIM3;
115   htim3.Init.Prescaler = 0;
116   htim3.Init.CounterMode = TIM_COUNTERMODE_UP;
117   htim3.Init.Period = 65535;
118
119   sEncoderConfig.EncoderMode = TIM_ENCODERMODE_TI12;
120
121   sEncoderConfig.IC1Polarity = TIM_ICPOLARITY_RISING;
122   sEncoderConfig.IC1Selection = TIM_ICSELECTION_DIRECTTI;
123   sEncoderConfig.IC1Prescaler = TIM_ICPSC_DIV1;
124   sEncoderConfig.IC1Filter = 0;
125
126   sEncoderConfig.IC2Polarity = TIM_ICPOLARITY_RISING;
127   sEncoderConfig.IC2Selection = TIM_ICSELECTION_DIRECTTI;
128   sEncoderConfig.IC2Prescaler = TIM_ICPSC_DIV1;
129   sEncoderConfig.IC2Filter = 0;
130
131   HAL_TIM_Encoder_Init(&htim3, &sEncoderConfig);
132 }
Function MX_TIM1_Init() configures the TIM1 timer so that its OC1 and OC2 channels work in
output compare toggle mode, toggling their outputs every ∼208μs (assuming that TIM1 is clocked at
48MHz: with a Prescaler equal to 9 and a Period equal to 999, the counter wraps every 10000 clock
cycles). The two outputs are shifted in phase by setting two different Pulse values (lines 98 and
106). The MX_TIM3_Init() function configures the TIM3 in encoder X4 mode (TIM_ENCODERMODE_TI12).
The main() function is designed so that every 1000 ticks of the SysTick timer (which is configured to
generate a tick every 1ms) the current content of the counter register (cnt2) is compared with a
saved value (cnt1): according to the encoder direction (up or down), the difference is computed, and
the speed is calculated. The code also needs to detect a possible overflow/underflow of the counter,
and compute the difference accordingly. Note also that, since we are performing a comparison
every second, TIM1 must be configured so that the sum of the pulses generated by channels A and B
is less than 65535 per second. For this reason, we slow down TIM1 by setting a Prescaler equal
to 9. Finally, lines [76:83] invert the phase shift between A and B (that is, OC1 and OC2 channels of
TIM1 timer) when the Nucleo user button is pressed.
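An alternative way to handle the counter overflow/underflow, shown here only as a sketch, is to let modular arithmetic do the work. The function name encoder_delta is hypothetical, and the approach assumes a 16-bit counter with the Period set to 65535 and fewer than 32768 counter steps between two samples (a stricter limit than the one discussed above).

```c
#include <stdint.h>

/* Signed number of counter steps between two samples of a 16-bit counter:
 * positive when the counter went up, negative when it went down. */
int32_t encoder_delta(uint16_t prev, uint16_t now) {
    uint16_t raw = (uint16_t)(now - prev); /* difference modulo 65536 */
    /* Values above 32767 represent steps taken in the down direction. */
    return (raw > 32767U) ? (int32_t)raw - 65536 : (int32_t)raw;
}
```

With this formulation the sign of the result also gives the direction, so the two branches on the counting direction are no longer needed.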
To enable the encoder mode using CubeMX, the first step is to enable this mode from the Combined
Channels combo box, as shown in Figure 34. Next, from the TIMx configuration view (not shown
here), it is possible to configure the other channels settings.
In a brushed DC motor, brushes control the commutation by physically connecting the coils at the
correct moment. In Brush-Less DC (BLDC) motors the commutation is controlled by electronics,
using PWM. The electronics can either have position sensor inputs, which provide information about
when to commutate, or use the Back Electromotive Force (BEF) generated in the coils. Position
sensors are most often used in applications where the starting torque varies greatly or where a high
initial torque is required. Position sensors are also often used in applications where the motor is used
for positioning.
Hall-effect sensors, or simply Hall sensors, are mainly used to compute the position of three-phase
BLDC motors (one sensor for each phase). STM32 general purpose timers can be programmed to
work in Hall sensor mode. By setting the first three inputs in XOR mode, it is possible to automatically
detect the position of the rotor.
This is done using the advanced-control timers (TIM1) to generate PWM signals to drive the motor
and another timer (e.g. TIM3) referred to as “interfacing timer”. This interfacing timer captures the
three timer input pins (CC1, CC2, CC3) connected through a XOR to the TI1 input channel (see
Figure 16). TIM3 is in slave mode, configured in reset mode; the slave input is TI1F_ED³⁷. Thus,
each time one of the 3 inputs toggles, the counter restarts counting from 0. This creates a time base
triggered by any change on the Hall inputs.
On the “interfacing timer” (TIM3), capture/compare channel 1 is configured in capture mode, capture
signal is TRC (See Figure 16 - TRC is highlighted in red). The captured value, which corresponds to
the time elapsed between 2 changes on the inputs, gives information about motor speed.
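How the captured value translates into a speed can be sketched as follows. The function name hall_rpm is hypothetical, and the sketch assumes three Hall sensors producing six transitions per electrical revolution, and a motor with pole_pairs electrical revolutions per mechanical revolution.

```c
#include <stdint.h>

/* Mechanical speed in RPM from the number of timer ticks captured between
 * two consecutive Hall transitions.
 * tim_clk_hz / (prescaler + 1) is the counter clock of the interfacing timer. */
uint32_t hall_rpm(uint32_t tim_clk_hz, uint32_t prescaler,
                  uint32_t captured, uint32_t pole_pairs) {
    if (captured == 0U || pole_pairs == 0U)
        return 0U; /* no valid capture yet, or invalid motor parameter */
    uint64_t tick_hz = tim_clk_hz / (prescaler + 1U);
    /* 6 transitions per electrical revolution, 60 seconds per minute */
    return (uint32_t)((tick_hz * 60ULL) / (6ULL * captured * pole_pairs));
}
```

For instance, with a 1MHz counter clock, 1000 ticks between two transitions and 2 pole pairs, the function returns 5000 RPM.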
The “interfacing timer” can be used in output mode to generate a pulse which changes the
configuration of the channels of the advanced-control timer (TIM1) (by triggering a COM event).
The TIM1 timer is used to generate PWM signals to drive the motor. To do this, the interfacing
timer channel must be programmed so that a positive pulse is generated after a programmed delay
(in output compare or PWM mode). This pulse is sent to the advanced timer (TIM1) through the
TRGO output.
³⁷ED is the acronym of Edge Detector, and it is an internal filtered timer input enabled when only
one of the three inputs in XOR is HIGH.
The STM32F3 family is the one dedicated to advanced power conversion and motor control. Some
STM32F3 MCUs, notably STM32F30x and STM32F3x8, provide the ability to generate one to three
center-aligned PWM signals with a single programmable signal ANDed in the middle of the pulses.
Moreover, they can generate up to three complementary outputs with insertion of dead time. These
features, in addition to the Hall sensor mode seen before, make it possible to build electronic devices
suitable for motor control. For more information about this, refer to the AN4013³⁸ from ST.
The break input is an emergency input in the motor control application. The break function protects
power switches driven by PWM signals generated with the advanced timers. The break input is
usually connected to fault outputs of power stages and 3-phase inverters. When activated, the break
circuitry shuts down the TIM outputs and forces them to a predefined safe state.
Moreover, advanced timers offer a gradual protection of their registers, by programming the LOCK
bits in the BDTR register. There are three locking levels available, which selectively lock up to all
the timer registers. For more information refer to the reference manual for your MCU.
We have left uncommented one thing from Figure 16. The ARR register is graphically represented
with a shadow. This happens because it is preloaded: writing to or reading from the ARR
register accesses the preload register. The content of the preload register is transferred to the shadow
register (that is, the register internal to the timer that effectively contains the counter value to match)
immediately if the auto-reload preload enable bit (ARPE) in the TIMx->CR1 register is cleared, or at
each UEV event if it is set. In the latter case, a UEV event can also be generated by software, by
setting the UG bit in the TIMx->EGR register: this causes the content of the preload register to be
transferred into the shadow one, and the new value is taken into account by the timer. Obviously, if
you stop the timer, you can change the content of the ARR register freely.
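The relationship between the preload and the shadow register can be illustrated with a small software model. This is not HAL code: the type timer_model_t and the two functions are hypothetical, and they only mimic the behaviour described in the reference manual.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model of the ARR preload mechanism. */
typedef struct {
    bool     arpe;        /* models the ARPE bit of TIMx->CR1 */
    uint32_t arr_preload; /* what the firmware reads and writes as TIMx->ARR */
    uint32_t arr_shadow;  /* the value the counter is actually compared with */
} timer_model_t;

/* Models a write to TIMx->ARR. */
void model_write_arr(timer_model_t *t, uint32_t value) {
    t->arr_preload = value;
    if (!t->arpe)
        t->arr_shadow = value; /* not buffered: the new period is used at once */
}

/* Models an update event (counter overflow/underflow or UG bit set). */
void model_update_event(timer_model_t *t) {
    t->arr_shadow = t->arr_preload;
}
```

In the model, a write performed while arpe is true leaves arr_shadow untouched until model_update_event() is called, which is exactly the guarantee that makes it safe to change the period while the timer is running.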
This is an important aspect to clarify. When a timer is stopped, we can configure the ARR register
using the Period field of the TIM_Base_InitTypeDef structure: the content of the Period field is
transferred to the TIMx->ARR register by the HAL_TIM_Base_Init() function. This causes a UEV
event to be generated and, if enabled, the corresponding IRQ to be raised. It is important to remark
that this happens even when the timer is configured for the first time since the peripheral was reset.
Let us consider this code:
³⁸http://bit.ly/1WAewd6
htim6.Instance = TIM6;
htim6.Init.Prescaler = 47999; //48MHz/48000 = 1KHz
htim6.Init.Period = 4999; //1KHz / 5000 = 5s
htim6.Init.CounterMode = TIM_COUNTERMODE_UP;
__TIM6_CLK_ENABLE();
HAL_NVIC_SetPriority(TIM6_IRQn, 0, 0);
HAL_NVIC_EnableIRQ(TIM6_IRQn);
HAL_TIM_Base_Init(&htim6);
HAL_TIM_Base_Start_IT(&htim6);
The above code configures the TIM6 timer so that it expires after 5 seconds. However, if you rearrange
that code in a complete example, you can see that the IRQ fires almost immediately after the
HAL_TIM_Base_Start_IT() function is called. This is due to the fact that the HAL_TIM_Base_Init()
routine generates a UEV event to transfer the content of the TIM6->ARR register into the internal
shadow register. This causes the UIF flag to be set, so the IRQ fires as soon as
HAL_TIM_Base_Start_IT() enables it.
We can bypass this behaviour by setting the URS bit inside the TIMx->CR1 register: in this way the
UIF flag is set (and the IRQ raised) only when the counter reaches the overflow/underflow, and not
when the UEV event is generated by software.
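The effect of the URS bit can be pictured with another small software model (again hypothetical, not HAL code). On the real hardware the bit corresponds to the TIM_CR1_URS mask of the TIMx->CR1 register.

```c
#include <stdbool.h>

/* Hypothetical model of how the URS bit gates the update interrupt flag. */
typedef struct {
    bool urs; /* models the URS bit of TIMx->CR1 */
    bool uif; /* models the UIF flag of TIMx->SR */
} update_model_t;

/* Models a software-generated update (UG bit), as done during initialization. */
void model_software_update(update_model_t *t) {
    if (!t->urs)
        t->uif = true; /* URS = 0: any update source sets UIF */
}

/* Models a counter overflow/underflow: UIF is set regardless of URS. */
void model_counter_overflow(update_model_t *t) {
    t->uif = true;
}
```

With urs set, the software update performed at initialization leaves uif cleared, so no interrupt is pending when the timer is started.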
It is possible to configure the timer so that the ARR register is buffered, by setting the TIM_CR1_ARPE
bit in the TIMx->CR1 control register. In this way the content of the shadow register is updated
only at the next UEV event. Unfortunately, the HAL does not seem to offer an explicit macro to do
that, and we need to access the timer register at low level (TIMx->CR1 |= TIM_CR1_ARPE).
Preloading is especially useful when we use a timer in output compare mode with multiple output
channels enabled, each one with its own compare value, and we have to be sure that any change
to the CCRx registers takes place at the same time. This is especially true if we use a timer for motor
control or power conversion. Enabling the preload feature guarantees that the new setting of
the CCRx register takes effect on the next overflow/underflow of the timer counter.
working mode of a timer. Additionally, the outputs of the timers having complementary outputs are
disabled and forced to an inactive state. This feature is extremely useful for applications where the
timers are controlling power switches or electrical motors. It prevents the power stages from being
damaged by excessive current, or the motors from being left in an uncontrolled state when hitting
a breakpoint.
The macro __HAL_DBGMCU_UNFREEZE_TIMx() restores the default behaviour (that is, the timer does
not stop during a breakpoint).
To configure the SysTick timer so that it generates an update event every 1ms, and assuming that it
is clocked at the same speed as the AHB bus, it is sufficient to invoke HAL_SYSTICK_Config() in
the following way:
HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq()/1000);
The HAL_SYSTICK_Config() routine is also responsible for enabling the timer and its SysTick_IRQn
exception³⁹. The priority of the exception can be configured at compile time by setting the
TICK_INT_PRIORITY symbolic constant in the include/stm32XXxx_hal_conf.h file, or by calling
HAL_NVIC_SetPriority() on the SysTick_IRQn exception, as seen in Chapter 7.
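The value passed to HAL_SYSTICK_Config() is simply the number of clock cycles between two ticks. The following host-side sketch (the function names are hypothetical) computes it and checks that it fits in the SysTick reload register, which is 24 bits wide and is programmed with the number of cycles minus one.

```c
#include <stdint.h>
#include <stdbool.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFUL /* the reload register is 24 bits wide */

/* Clock cycles between two ticks, e.g. HCLK / 1000 for a 1ms tick. */
uint32_t systick_ticks(uint32_t hclk_hz, uint32_t tick_rate_hz) {
    return hclk_hz / tick_rate_hz;
}

/* True if the value can be programmed: the reload register holds ticks - 1. */
bool systick_ticks_valid(uint32_t ticks) {
    return ticks >= 1U && (ticks - 1U) <= SYSTICK_MAX_RELOAD;
}
```

With a 48MHz HCLK and a 1kHz tick rate the result is 48000 cycles, well inside the 24-bit range. Much slower tick rates on fast cores may exceed it.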
When the SysTick timer reaches zero, the SysTick_IRQn exception is raised, and the corresponding
handler is called. CubeMX already provides for us the right function body, which is defined in the
following way:
³⁹Remember that the SysTick_IRQn is an exception and not an interrupt, even if it is common to refer to it as interrupt. This
means that we cannot use the HAL_NVIC_EnableIRQ() function to enable it.
void SysTick_Handler(void) {
  HAL_IncTick();
  HAL_SYSTICK_IRQHandler();
}
The HAL_IncTick() function increments the global SysTick counter, while HAL_SYSTICK_IRQHandler()
contains nothing more than a call to the HAL_SYSTICK_Callback() routine, which is
a callback that we can optionally implement to be notified when the timer underflows.
Read carefully
Avoid slow code inside the HAL_SYSTICK_Callback() routine, otherwise the timebase
generation could be affected. This may lead to unpredictable behaviour of some HAL
modules, which rely on the exact 1ms timebase generation.
Moreover, care must be taken when using HAL_Delay(). This function provides an accurate
delay (in milliseconds) based on the SysTick counter. This implies that if HAL_Delay() is
called from a peripheral ISR, then the SysTick interrupt must have a higher priority
(numerically lower) than the peripheral interrupt. Otherwise the caller ISR will be
blocked forever (because the global tick counter is never incremented).
To suspend the system timebase generation, it is possible to use the HAL_SuspendTick() routine,
and HAL_ResumeTick() to resume it.
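When the global tick counter is used to measure elapsed time, the comparison should survive the wrap-around of the 32-bit counter. A minimal sketch, with the hypothetical name tick_elapsed, where start and now are assumed to be values returned by HAL_GetTick():

```c
#include <stdint.h>
#include <stdbool.h>

/* True when at least timeout_ms milliseconds separate start from now.
 * The unsigned subtraction stays correct across the 32-bit wrap-around
 * of the tick counter (about 49.7 days at one tick per millisecond). */
bool tick_elapsed(uint32_t start, uint32_t now, uint32_t timeout_ms) {
    return (uint32_t)(now - start) >= timeout_ms;
}
```

Comparing absolute values (for example now >= start + timeout_ms) would instead fail whenever the sum overflows.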
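void delay1US() {
  #define CLOCK_CYCLES_PER_INSTRUCTION X
  #define CLOCK_FREQ Y //IN MHZ (e.g. 16 for 16 MHZ)
  /* Number of loop iterations that should take about 1µs */
  volatile uint8_t cycleCount = CLOCK_FREQ / CLOCK_CYCLES_PER_INSTRUCTION;
  while (cycleCount--);
}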
But how can we establish how many clock cycles are required to compute one step of the
while(cycleCount--) instruction? Unfortunately, it is not simple to give an answer. Let us assume
that cycleCount is equal to 1. Doing some tests (I will explain later how I have done them), with
compiler optimizations disabled (option -O0 to GCC), we can see that in this case the whole C
instruction requires 24 cycles to execute. How is that possible? You have to consider that our C
statement is translated into several assembly instructions, as we can see if we disassemble the
firmware binary file:
...
while(counter--);
800183e: f89d 3003 ldrb.w r3, [sp, #3]
8001842: b2db uxtb r3, r3
8001844: 1e5a subs r2, r3, #1
8001846: b2d2 uxtb r2, r2
8001848: f88d 2003 strb.w r2, [sp, #3]
800184c: 2b00 cmp r3, #0
800184e: d1f6 bne.n 800183e <delay1US+0x3e>
Moreover, another source of latency is related to the fetch of instructions from the internal MCU
flash (which differs a lot between “low-cost” STM32 MCUs and more powerful ones, like the STM32F4
and STM32F7 with the ART accelerator, which is designed to zero the FLASH access latency). So that
instruction has a “basic cost” of 24 cycles. How many cycles are required if cycleCount is equal to
2? In this case the MCU requires 33 cycles, that is 9 additional cycles. This means that if we want to
spin for 84 cycles (1µs at 84MHz), cycleCount has to be equal to (84-24)/9, which is about 7. So, we
can write our delay function in a more general way:
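A minimal sketch of such a routine is the following. The names delay_loop_count and delayUS_sketch are hypothetical, and the constants are the ones measured at -O0 on the STM32F401RE running at 84MHz: a fixed cost of 24 cycles and 9 cycles for each additional iteration. They have to be measured again for any other setup.

```c
#include <stdint.h>

#define CPU_MHZ         84U /* core clock in MHz (assumption: full speed) */
#define BASE_CYCLES     24U /* measured fixed cost of the loop */
#define CYCLES_PER_ITER  9U /* measured cost of each additional iteration */

/* Number of loop iterations needed to spin for about `us` microseconds. */
uint32_t delay_loop_count(uint32_t us) {
    uint32_t cycles = us * CPU_MHZ;
    if (cycles <= BASE_CYCLES)
        return 0U; /* the fixed cost alone already covers the request */
    /* Round to the nearest whole iteration. */
    return (cycles - BASE_CYCLES + CYCLES_PER_ITER / 2U) / CYCLES_PER_ITER;
}

/* Busy-waits for roughly `us` microseconds. */
void delayUS_sketch(uint32_t us) {
    volatile uint32_t cycleCount = delay_loop_count(us);
    while (cycleCount--);
}
```

For a delay of 1µs the computed count is 7, in line with the estimate made by hand.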
Calling the function in a loop like the following one:
while(1) {
  delayUS(1);
  GPIOA->ODR = 0x0;
  delayUS(1);
  GPIOA->ODR = 0x20;
}
we can check, using an oscilloscope attached to the PA5 pin, that we obtain the delay we are looking
for.
Is this way to delay 1µs reliable? Unfortunately, the answer is no. First of all, it works well only
when this specific MCU (STM32F401RE) works at full speed (84MHz). If we decide to use a different
clock speed, we need to tune it again by redoing the tests. Second, it is subject to compiler
optimizations, as we are going to see soon, and to the CPU internal caches on the D-Bus and I-Bus
available in some STM32 microcontrollers (these caches can be disabled, if needed, by setting the
PREFETCH_ENABLE, INSTRUCTION_CACHE_ENABLE and DATA_CACHE_ENABLE macros in the
include/stm32XXxx_hal_conf.h file).
Let us enable GCC optimizations for “size” (-Os). What results do we obtain? In this case the
delayUS() function costs only 72 CPU cycles, that is ∼850ns, as the oscilloscope confirms.
And what happens if we enable the maximum optimization for speed (-O3)? In this case we have
only 64 CPU cycles, that is, our delayUS() lasts only ∼750ns. However, this issue can be addressed
using specific GCC pragma directives, which fix the optimization level of a single function
regardless of the global compiler options.
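A sketch of how these pragmas can be used follows. The function spin_unoptimized is hypothetical. It returns the number of iterations executed only to make the example easy to check, and that extra bookkeeping changes the cycle cost of the loop, so the constants would need to be measured again.

```c
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize ("O0") /* compile the next function without optimizations */

/* Spins for `count` iterations and returns how many were executed. */
uint32_t spin_unoptimized(uint32_t count) {
    volatile uint32_t remaining = count;
    uint32_t executed = 0U;
    while (remaining--)
        executed++;
    return executed;
}

#pragma GCC pop_options /* restore the optimization level given on the command line */
```

The function between the two pragmas is compiled as with -O0 even when the rest of the firmware is built with -Os or -O3, so its timing no longer depends on the project settings.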
However, if we want to use a lower CPU frequency or we want to port our code to a different STM32
MCU, we still need to redo the tests and derive the number of cycles empirically.
However, take into account that the lower the CPU frequency is, the more difficult it is to delay
for 1µs precisely, because the number of cycles is fixed for a given instruction, but there
are fewer cycles in the same unit of time.
So, how can we obtain a precise 1µs delay without redoing the tests whenever we change the
hardware setup?
One answer may be to set up a timer that overflows every 1µs (just setting its Period
to the peripheral bus speed in MHz minus one - for example, for an STM32F401RE we need to set the
Period to (84 - 1)), and to increment a global variable that keeps track of elapsed microseconds. This
is the same way the SysTick timer is used for the timebase generation of the HAL.
However, this approach is impractical, especially for low-speed STM32 MCUs. Generating an
interrupt every 1µs (which in an STM32F0 MCU running at full speed would mean every 48 CPU
cycles) would congest the MCU, leaving little CPU time for the rest of the firmware. Moreover, the
interrupt management has a non-negligible cost (from 12 up to 16 cycles), which would affect the
1µs timebase generation.
In the same way, polling the timer for the value of its counter is also impractical: a lot of time
would be spent checking the counter against a starting value, and the handling of the timer
overflow/underflow would impact on the timebase generation.
A more robust solution comes from the previous tests. How did I measure the CPU cycles? Cortex-
M3/4/7 processors can have an optional debug unit, named Data Watchpoint and Trace (DWT),
that provides watchpoints, data tracing, and system profiling for the processor. One register of this
unit is CYCCNT, which counts the number of cycles performed by the CPU. So, we can use this special
unit to count the number of cycles performed by the MCU during instruction execution.
uint32_t cycles = 0;
Using DWT we can build a more generic delayUS() routine in this way:
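A minimal sketch of the idea, not necessarily identical to the routine measured in the rest of this section, splits the logic into two pure functions that can be tested on a host. The names us_to_cycles and delay_pending are hypothetical. The comment shows how they would be combined on the target, assuming that the cycle counter has already been enabled and that the CMSIS variable SystemCoreClock holds the current core frequency.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPU cycles corresponding to `us` microseconds at the given core clock. */
uint32_t us_to_cycles(uint32_t us, uint32_t core_clock_hz) {
    return us * (core_clock_hz / 1000000U);
}

/* True while the delay is still running. The unsigned subtraction stays
 * correct when the 32-bit cycle counter wraps around. */
bool delay_pending(uint32_t start_cycles, uint32_t now_cycles, uint32_t delay_cycles) {
    return (uint32_t)(now_cycles - start_cycles) < delay_cycles;
}

/* On a Cortex-M3/4/7 target the busy-wait would look like this:
 *
 *   uint32_t start = DWT->CYCCNT;
 *   uint32_t n = us_to_cycles(us, SystemCoreClock);
 *   while (delay_pending(start, DWT->CYCCNT, n));
 */
```

Since the number of cycles is derived from the core frequency at run time, no constant has to be measured again when the clock configuration changes.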
How precise is this function? If you need the best resolution at 1µs, this function
will not help you, as shown by the scope.
The best performance is achieved when the highest compiler optimization level is set. As you can
see, for a wanted delay of 1µs, the function gives a delay of about 1.22µs (22% slower). However, if
we need to spin for 10µs, we obtain a real delay of 10.5µs (5% slower), which is closer to what we
want.
branching. However, the advantage of this function is that it automatically detects CPU speed, and
it works out of the box especially if we are working on faster processors.
If you want full control over compiler optimizations, the best 1µs delay can be reached using this
macro fully written in assembler:
#define delayUS_ASM(us) do { \
asm volatile ("MOV R0,%[loops]\n \
1: \n \
SUB R0, #1\n \
CMP R0, #0\n \
BNE 1b \t" \
: : [loops] "r" (16*(us)) : "r0", "cc", "memory" \
); \
} while(0)
This is the most optimized way to write the while(counter--) loop. Doing tests with the scope, I
found that a 1µs delay can be obtained when the MCU executes this loop 16 times at 84MHz. However,
this macro has to be adjusted if your processor speed is lower, and keep in mind that, being a macro,
it is “expanded” every time you use it, increasing the firmware size.
III Advanced topics
12. Power management
Energy efficiency is one of the trending topics in the electronics industry. Even if you are not
designing a battery-powered device, you probably have to address power-related requirements anyway.
A well-designed device, from the power point of view, not only consumes less energy, but it also makes
it possible to simplify and minimize its power section, reducing the overall dimension of the PCB, the
BOM and the power dissipation.
Often we think that the power management of an electronic board is all related to its powering stage.
In the last two decades, power conversion has been the hot topic. The research and development
made by IC vendors has produced a lot of integrated devices able to boost the overall power efficiency
in a lot of application fields, ranging from low-power solutions to high-load power conversion units
able to supply thousands of amperes. However, as embedded developers, we also have a great
responsibility in ensuring that our firmware minimizes the energy consumption of the devices we make.
Modern microcontrollers provide developers with a lot of tools to minimize the energy used. Cortex-
M cores are no exception, and they provide an “abstract” power management model that is
rearranged by silicon manufacturers to create their own power management scheme. This is exactly
the case of STM32 MCUs: even if power management is addressed in all STM32-series, it reaches
a very sophisticated implementation in STM32L families, which provide developers with a scalable
power model to precisely tune the energy needed. This makes it possible to design electronic devices
able to run even for years while powered by a coin-cell battery.
In this chapter we will take a quick look at the way power management is implemented in STM32
MCUs, analyzing the STM32F-series and the STM32L-series separately. We will start by examining
which features are provided by the Cortex-M core and then we will discover how ST engineers have
specialized them to provide up to eleven different power modes in the recent STM32L4-series.
Power management 387
is maintained active at all conditions¹, including sleep, shutdown and VBAT modes, the current
consumption of the LSE becomes more critical in overall system-level application design.
Focusing our attention exclusively on the MCU, the first aspect that affects the power consumption
is its running frequency: the faster the CPU runs, the more power it consumes. This is a law written
in stone that all firmware developers must know: even if the MCU we are using is able to run up
to 200MHz, if we do not need all that speed then we can save a lot of energy by simply reducing the
clock frequency. And this is one of the main reasons why STM32 microcontrollers have a complex
clock distribution tree.
Another implication of this aspect is that the more peripherals are actively running, the more power
the MCU consumes. This means that well-designed firmware always disables a peripheral as soon as
it becomes unnecessary. For example, if we need an I2C EEPROM only during the bootstrap
process (because it stores some configuration parameters that we retain in RAM during the firmware
life-cycle), then we have to disable the I2C peripheral once finished². This is the reason why STM32
MCUs offer the ability to selectively disable every peripheral, gating its clock source, by calling the
__HAL_RCC_<PPP>_CLK_DISABLE() macro, where <PPP> is the specific peripheral (for example,
__HAL_RCC_DMA1_CLK_DISABLE() gates the clock of the DMA1, while __HAL_RCC_DMA1_CLK_ENABLE()
enables it).
When talking about microcontrollers, it is best to talk about energy efficiency instead of just their
power consumption. While the power consumption of a device tells us just how many mA or
µA it uses, the energy efficiency measures “how much work” it can do with a limited amount of
energy, for example, in the form of DMIPS/mW or CoreMark/mW. We can thus discover that for an
STM32L4 MCU the best energy compromise is reached when it runs in Low-Power RUN (LPRUN)
mode, as shown in Figure 8.
Finally, the design itself of the MCU and its peripherals impacts the overall power consumption.
This is the reason why STM32L microcontrollers are expressly designed to provide best-in-class
power consumption while providing the best performance according to the specific sub-family. For
example, some of the communication peripherals in an STM32L4 MCU (the LPUART is one of these)
allow exchanging data in DMA mode while the MCU is in STOP2 mode³.
frequency and the number of active peripherals. Here it is important to remark that the FLASH
and the SRAM memories are also “peripherals” external to the Cortex-M core. Moreover, the adoption
of advanced FLASH prefetch technologies, like the ART™ Accelerator, impacts the overall power
consumption too.
In this mode the developer can change the way the MCU consumes energy by regulating the clock
speed and by disabling the unneeded peripherals. This may seem obvious, but it is important to
remark that this is the best power optimization we can do in a lot of real situations. As we will see
later in this chapter, STM32L MCUs structure the run mode in several sub-modes, offering more
control over the power consumption while guaranteeing the majority of functionalities and the best
CPU performance.
If we know that we do not need to process anything for a given period of time, then Cortex-M cores
allow us to put them in sleep mode without doing busy-waits. In this mode the core is halted and it
can be woken up only by “external events” coming from the EXTI controller (for example, a push-
button connected to a GPIO). Again, STM32L MCUs expand this mode offering up to eight different
sub-modes, as we will see next.
It is important to underline that the Cortex-M core enters sleep mode on “a voluntary basis”: two
distinct ARM instructions, which we are going to see in a while, halt the CPU while leaving some of
its event lines active. By triggering these lines, the CPU resumes the execution after a given wake-up
time, which depends on the effective sleep level and the Cortex-M core type (M0, M3, and so on).
The wake-up latency can be expressed in terms of CPU cycles for “lightweight” sleep modes, and in
µs for deep sleep modes. This means that the deeper the sleep mode, the longer the wake-up
time. Developers need to decide which sleep mode should be used for their specific applications: the
energy and time consumed entering and then exiting a deep low-power state may outweigh any
potential power saving gains. In a wearable device energy efficiency is the most important factor,
while in some industrial control applications the wake-up latency can be really critical.
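The trade-off between a light and a deep sleep mode can be reduced to a break-even calculation. The sketch below is a simplified model with hypothetical parameter names: it assumes that each mode has a constant power draw and that entering plus leaving the deep mode costs a fixed extra amount of energy.

```c
#include <assert.h>

/* Minimum idle time (in ms) above which the deep mode costs less energy than
 * the light one. Powers are in mW and transition_uj is the extra energy (in uJ)
 * spent entering and leaving the deep mode, so uJ / mW yields ms. */
double break_even_idle_ms(double p_light_mw, double p_deep_mw, double transition_uj) {
    assert(p_light_mw > p_deep_mw);
    return transition_uj / (p_light_mw - p_deep_mw);
}

/* Returns 1 when the deep mode is the better choice for the expected idle time. */
int deep_sleep_pays_off(double idle_ms, double p_light_mw, double p_deep_mw,
                        double transition_uj) {
    return idle_ms > break_even_idle_ms(p_light_mw, p_deep_mw, transition_uj);
}
```

For example, if the light mode draws 3 mW, the deep mode 1 mW and the transition costs 10 uJ, the deep mode only pays off when the MCU is expected to stay idle for more than 5 ms. Real figures have to come from the datasheet of the specific MCU or from measurements.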
There are also different approaches to designing low power systems. Nowadays a lot of embedded
systems are designed to be interrupt driven. This means that the system stays in sleep mode when
there are no requests to be processed. When an interrupt request arrives, the processor wakes up
and processes it, and goes back into sleep mode when the work is done. Alternatively, if the data
processing request is periodic and has a constant duration, and if the data processing latency is not
an issue, you could run the system at the slowest possible clock speed to reduce the power. There is
no clear answer to which approach is better, as the choice will be dependent on the data processing
requirements of the application, the microcontroller being used, and other factors like the type of
power source available.
Power management 389
Figure 1: how a firmware could potentially manage clock speed and power modes during its activity
Figure 1 shows a possible strategy for minimizing power consumption. During the
microcontroller booting process, the MCU runs at its maximum speed to allow a fast completion of
all initialization activities. When all peripherals are configured, the clock speed is lowered and the
MCU enters sleep mode. In this period, the MCU is woken up by interrupts that can be processed
at lower CPU speeds. When CPU-intensive operations need to be carried out, the clock speed can
be increased up to the maximum, and then decreased again once finished.
So, when to go into sleep mode? As said before, it is up to us to decide the right time to place the
MCU in one of the possible sleep modes. If we know that the MCU is waiting for asynchronous
events notified with interrupts, then it could be the right time to go into sleep mode instead of doing
busy-wait. Let us consider the classical blinking LED application we have seen several times in this
book.
...
while(1) {
  HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
  HAL_Delay(500);
}
This apparently innocent code has a dramatic impact on the power consumption of our device. Even
if we do not have much to do during those 500ms, we waste a lot of power checking the value
of the global SysTick tick count to see if that time has elapsed. Instead, we can rearrange that
code to stay in sleep mode for most of the time, and we can set up a timer that wakes up the
MCU after 500ms.
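A rough way to quantify the gain is to compute the average current over one blink period. The helper below is only a back-of-the-envelope model with a hypothetical name; it assumes a constant current in each of the two phases and ignores wake-up transients.

```c
#include <assert.h>

/* Average current over one period made of an active phase and a sleep phase.
 * Both times share one unit (e.g. ms) and both currents share one unit (e.g. mA). */
double average_current(double t_active, double i_active,
                       double t_sleep, double i_sleep) {
    assert(t_active >= 0.0 && t_sleep >= 0.0 && (t_active + t_sleep) > 0.0);
    return (t_active * i_active + t_sleep * i_sleep) / (t_active + t_sleep);
}
```

With placeholder figures of 10 mA when running and 1 mA when sleeping, the busy-waiting loop (no sleep phase at all) averages 10 mA, whereas being awake for 1 ms out of every 500 ms averages about 1.02 mA.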
Letting other software components decide when to place the MCU in sleep mode could represent
another approach. As we will discover in a following chapter, a Real-Time Operating System may
be programmed to automatically put the MCU in sleep mode when there is nothing to do⁵.
⁵We will discover in that chapter that one possible strategy consists in placing the MCU in sleep mode when the idle thread is
scheduled. The idle thread is that thread executed by an RTOS when all other threads are “un-runnable”. This clearly means that
the MCU has nothing relevant to do, and it can be placed in sleep mode safely.
The Wait For Event (WFE) is the other instruction that can place the MCU in sleep mode. It
differs from the WFI in that it checks the status of a particular event register⁷ before
it halts the core: if this register is set, the WFE clears it and does not halt the CPU, continuing the
program execution (this allows us to manage the pending event, if needed). Otherwise, it halts the
MCU until this event register is set again.
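The check-and-clear behaviour of WFE described above can be captured in a toy model that runs on a PC. This is not code for the target: `core_model_t` and `model_wfe()` are hypothetical names, and the model only covers what happens at the moment the instruction is executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the only piece of core state that WFE looks at. */
typedef struct {
    bool event_register; /* one-bit latch internal to the core */
} core_model_t;

/* Models the execution of WFE.
 * Returns true if the core halts, false if execution simply continues. */
bool model_wfe(core_model_t *core) {
    assert(core != NULL);
    if (core->event_register) {
        core->event_register = false; /* consume the pending event */
        return false;                 /* the core is not halted */
    }
    return true; /* halt until something sets the event register */
}
```

Calling `model_wfe()` twice in a row on a core whose event register is set returns `false` and then `true`: the first call consumes the pending event, the second one halts the core.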
But what exactly is the difference between an event and an interrupt? Events are a source of
confusion in the STM32 world (and in the Cortex-M world in general). They appear like something
intangible, compared to the interrupts that we have learned to handle in Chapter 7. Before we clarify
what events are, we need to better explain the role of the EXTI controller in an STM32 MCU. The
Extended Interrupts and Events Controller (EXTI) is the hardware component internal to the MCU
that manages the external and internal asynchronous interrupts/events and generates the event
request to the CPU/NVIC controller and a wake-up request to the Power Controller (see Figure
2). The EXTI allows the management of several event lines, which can wake up the MCU from some
sleep modes (not all events can wake up the MCU). The lines are either configurable or direct, and
hence hardwired inside the MCU:
• The lines are configurable: the active edge can be chosen independently, and a status flag
indicates the source of the interrupt. The configurable lines are used by the I/Os external
interrupts, and by a few peripherals (more about this soon).
• The lines are direct and hardwired: they are used by some peripherals to generate a wakeup
from stop event or interrupt. The status flag is provided by the peripheral itself. For example,
the RTC can be used to generate an event to wake up the MCU.
⁶Clearly, we are talking about the power consumption of the MCU core and all integrated peripherals. The power consumption
of the overall board is determined by other things that we will not address here.
⁷This register is internal to the core and not accessible to the user.
This controller also allows events or interrupts to be emulated by software, multiplexed with the
corresponding hardware event line, by writing to the dedicated register.
Another important aspect to clarify about the EXTI and NVIC controllers is that each line can be
masked independently for interrupt or event generation. For example, in Chapter 6 we have seen that
a GPIO can be configured to work in GPIO_MODE_EVT_* mode, which is different from the
GPIO_MODE_IT_* mode: in the first case, when an I/O is triggered it will not generate an IRQ request,
but it will set the event flag. This will cause the MCU to wake up if it has entered a low-power mode
using the WFE instruction.
So the WFE instruction checks that no event is pending, and for this reason it is also called the
conditional sleep instruction. This event register can be set by:
In Chapter 7 we have seen that in Cortex-M3/4/7 cores we can temporarily mask the execution of
those interrupts having a priority lower than a value set in the BASEPRI register. However, these
interrupts are still enabled and are marked as pending if they fire. We can configure the MCU to set
the event register in case of pending interrupts, by setting the SEVONPEND bit of the SCB->SCR
register. As the name suggests, this bit causes the core to “set the event register if interrupts are
pending”. This means that,
if the processor was placed in sleep mode by the WFE instruction, the CPU is immediately awakened
and we can process the pending interrupts, if needed. With the WFI instruction, instead, the core
would not wake up. The Cube HAL provides two convenient functions, HAL_PWR_EnableSEVOnPend() and
HAL_PWR_DisableSEVOnPend(), to perform this setting.
If, instead, interrupts are masked by setting the PRIMASK register, a pending interrupt can wake up
the processor regardless of the sleep instruction used (WFI or WFE): this characteristic allows some
parts of the MCU to be turned OFF by software by gating their clocks, and the software can turn them
back on after waking up, before executing the ISR.
So, to recap, the WFI and WFE instructions share the following behaviour:
• wake up on interrupt/exception requests that are enabled and with higher priority than current
level⁸;
• can be woken up by debug events;
• can be used to produce both sleep and deep sleep modes (more about this soon).
Instead, the WFI and WFE instructions differ in the following ways:
• execution of WFE does not enter sleep mode if the internal event register is set, while the
execution of WFI always results in sleep;
• new pending of a disabled or masked interrupt can wake up the processor from WFE if
SEVONPEND is set;
• WFE can be woken up by an external event;
• WFI can be woken up by an enabled interrupt when PRIMASK is set.
12.2.1.1 Sleep-On-Exit
The Sleep-On-Exit feature is useful for interrupt-driven applications where all operations (apart from
the initialization stage) are performed inside interrupt handlers. This is a programmable feature,
and it can be enabled or disabled by setting or clearing the SLEEPONEXIT bit of the SCB->SCR
register. When enabled, the Cortex-M core automatically enters sleep mode (with the same behavior
as the WFI instruction) when exiting from an exception/interrupt handler. The Sleep-On-Exit feature
should be enabled at the end of the initialization stage. Otherwise, if an interrupt event happens
during the initialization stage while the Sleep-On-Exit feature is already enabled, the processor
will enter sleep mode even if the initialization stage has not yet completed.
The CubeHAL provides two convenient routines to enable/disable this mode:
HAL_PWR_EnableSleepOnExit() and HAL_PWR_DisableSleepOnExit().
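At register level, enabling and disabling Sleep-On-Exit is a plain bit operation on the System Control Register. The sketch below uses the bit positions defined by the Cortex-M architecture, but the helper names are hypothetical and the functions take a pointer to the register, so that they can be exercised on a PC with an ordinary variable. On the target one would pass `&SCB->SCR`, or more simply call the HAL routines named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the Cortex-M System Control Register (SCB->SCR). */
#define SCR_SLEEPONEXIT ((uint32_t)1u << 1) /* sleep when returning to thread mode */
#define SCR_SLEEPDEEP   ((uint32_t)1u << 2) /* select deep sleep instead of sleep */
#define SCR_SEVONPEND   ((uint32_t)1u << 4) /* pending interrupts set the event register */

/* Set SLEEPONEXIT, leaving all the other bits untouched. */
void scr_enable_sleep_on_exit(volatile uint32_t *scr) {
    assert(scr != NULL);
    *scr |= SCR_SLEEPONEXIT;
}

/* Clear SLEEPONEXIT, leaving all the other bits untouched. */
void scr_disable_sleep_on_exit(volatile uint32_t *scr) {
    assert(scr != NULL);
    *scr &= (uint32_t)~SCR_SLEEPONEXIT;
}
```

Both helpers use a read-modify-write, so a previously configured SLEEPDEEP or SEVONPEND bit is preserved.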
has an internal power distribution network that defines several voltage domains used to power those
peripherals that share the same powering characteristics. For example, the VDDA domain includes
those analog peripherals that need a separate (better filtered) power source, fed through the VDDA
pins.
The VDD and VDD18 domains are the most relevant ones. The VDD domain is supplied by an external
power source, while the VDD18 domain is supplied by a voltage regulator internal to the MCU. This
regulator can be configured to work in low-power mode, as we will see next. To retain the content
of the backup registers¹¹ and supply the RTC function when VDD is turned OFF, the VBAT pin can be
connected to an optional standby voltage supplied by a battery or by another source. The VBAT pin
powers the RTC unit, the LSE oscillator and one or two pins used to wake up the MCU from deep
sleep modes, allowing the RTC to operate even when the main power supply is turned OFF. For this
reason, the VBAT power source is said to power the RTC domain. The switch to the VBAT supply
is controlled by the Power Down Reset (PDR) embedded in the reset block.
¹¹Backup registers are a dedicated memory area, with a typical size of 4Kb, that is powered by a different power source usually
connected to a battery or a super-capacitor. It is used to store data that must remain valid even when the MCU is powered
OFF, whether the whole device is turned OFF or the MCU is placed in standby mode.
By default, and after power-on or a system reset, STM32F MCUs are placed in run mode, which is
a fully active mode that consumes a lot of power even when performing minor tasks. The consumption
of both the run and the sleep modes depends on the operating frequency¹².
Figure 4¹³ shows the power consumption levels of some of the most recent STM32F4 MCUs.
In run mode, the main regulator supplies full power to the 1.8-1.2V domain (CPU core, memories
and digital peripherals). In this mode, the regulator output voltage (around 1.8-1.2V depending on
the particular STM32F MCU) can be scaled by software to different voltage values (more about this
soon). Some recent STM32F4 MCUs provide two run modes:
• Normal mode: the CPU and core logic operate at maximum frequency at a given voltage
scaling (scale 1, scale 2 or scale 3).
• Over-drive mode: this mode allows the CPU and the core logic to operate at a higher
frequency than the normal mode for the voltage scaling scale 1 and scale 2. More about this
mode later.
¹²Don’t forget that in sleep mode only the CPU clock is turned OFF, while other peripherals remain active. So, the speed of the
HCLK clock source still continues to affect the overall power consumption.
¹³The figure is taken from the ST AN4365 (http://bit.ly/1XzmF2o) application note.
The power used by a DC circuit is given by the current drawn and the voltage of the circuit. This
means that we can reduce the power needed by the circuit by reducing the voltage. STM32F4/F7
MCUs provide a smart power management technology named Dynamic Voltage Scaling (DVS), which is
also distinctive of the STM32L-series. The idea behind DVS is that many embedded systems do not
always require the system’s full processing capabilities, because not all subsystems are always
active. When this is the case, the system can remain in the active mode without the processor
running at its maximum operating frequency. The voltage supplied to the processor can then be
decreased when a lower frequency is sufficient. With such power management, we reduce the power
drawn from the battery by adjusting the processor input voltage in response to the system’s
performance requirements. In practice, this consists in scaling the STM32F4 regulator output voltage
that supplies the 1.2V domain (core, memories and digital peripherals) when we lower the clock
frequency based on processing needs.
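Why scaling the voltage together with the frequency is so effective can be sketched with the usual first-order model of dynamic power in CMOS logic. This model is an assumption added here for illustration and is not a formula taken from the ST documentation:

```latex
P = V \cdot I, \qquad P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f
```

Here α is the switching activity factor, C the switched capacitance, V the core supply voltage and f the clock frequency. Under this model, lowering the frequency alone reduces the dynamic power linearly, while lowering the core voltage as well adds a quadratic saving: for example, going from 1.2 V to 1.0 V at the same frequency gives (1.0/1.2)² ≈ 0.69, that is, roughly 31% less dynamic power.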
STM32F4/F7 MCUs offer three voltage scales (scale 1, scale 2 and scale 3). The maximum achievable
core frequency for a given voltage scale is determined by the specific STM32 MCU. For example, the
STM32F401 provides only two voltage scales, scale 2 and scale 3, which allow the core to run at up to
84MHz and 60MHz respectively. To control the voltage scaling, the CubeHAL provides the function:
• Set the HSI or HSE as system clock source using the HAL_RCC_ClockConfig().
• Call the HAL_RCC_OscConfig() to configure the PLL.
• Call the HAL_PWREx_ConfigVoltageScaling() API to adjust the voltage scale.
• Set the new system clock frequency using the HAL_RCC_ClockConfig().
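Before walking through the steps above, the firmware has to decide which voltage scale matches the target frequency. A minimal sketch of that decision follows, using only the STM32F401 limits quoted in the text (scale 2 up to 84MHz, scale 3 up to 60MHz). The enum and function names are hypothetical; the value returned would then have to be translated into the corresponding HAL constant for the MCU in use.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    VSCALE_NONE = 0, /* requested frequency is out of range */
    VSCALE_2    = 2,
    VSCALE_3    = 3
} vscale_t;

/* Picks the lowest-power voltage scale that still supports the requested
 * core frequency on an STM32F401 (limits as quoted in the text). */
vscale_t f401_scale_for_hclk(uint32_t hclk_hz) {
    assert(hclk_hz > 0u);
    if (hclk_hz <= 60000000u) {
        return VSCALE_3;
    }
    if (hclk_hz <= 84000000u) {
        return VSCALE_2;
    }
    return VSCALE_NONE;
}
```

For instance, a 48MHz target can use scale 3, while 84MHz requires scale 2; other STM32F4/F7 MCUs have different thresholds, which must be taken from their datasheets.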
Some MCUs from the STM32F4 family and all STM32F7 ones provide additional run sub-modes,
called over-drive and under-drive. The first one consists in increasing the core frequency with
a sort of “overclocking”. It is recommended to enter over-drive mode when the application is not
running critical tasks and when the system clock source is either HSI or HSE. These features are
useful when we want to temporarily increase/decrease the MCU clock speed without reconfiguring
the clock tree, which usually introduces a non-negligible overhead. The HAL provides two convenient
functions, HAL_PWREx_EnableOverDrive() and HAL_PWREx_DisableOverDrive(), to perform this
operation.
The under-drive mode is the opposite of the over-drive one and consists in lowering the CPU
frequency and disabling some peripherals. In this mode it is possible to place the internal voltage
regulator in low-power mode. In some STM32F4/F7 MCUs the under-drive mode is available even
in stop mode.
The sleep mode is entered by executing the WFI or WFE instruction. In the sleep mode, all I/O pins keep
the same state as in the run mode. However, we do not need to deal with assembly instructions,
since the CubeHAL provides the function:
void HAL_PWR_EnterSLEEPMode(uint32_t Regulator, uint8_t SLEEPEntry);
The first parameter, Regulator, is meaningless in sleep mode for all STM32F-series, and it is left
for compatibility with STM32L-series. The second parameter, SLEEPEntry, can assume the values
PWR_SLEEPENTRY_WFI or PWR_SLEEPENTRY_WFE: as the names suggest, the former performs a WFI
instruction and the latter a WFE.
If you take a look at the HAL_PWR_EnterSLEEPMode() function you will discover that, if we pass
the parameter PWR_SLEEPENTRY_WFE, it executes two WFE instructions consecutively. As a
result, HAL_PWR_EnterSLEEPMode() enters sleep mode in the same way as if it
were called with the parameter PWR_SLEEPENTRY_WFI (when WFE is called twice, if
the event register is set, then it is cleared by the first WFE instruction, and the second one
places the MCU in sleep mode). I do not know why ST has adopted this approach. If you
want full control over the way the MCU is placed in low-power modes, then you have to
rearrange the content of that function according to your needs. Clearly, the MCU will exit from
sleep mode following the exit condition of the WFE instruction.
¹⁴http://bit.ly/1XzmF2o
If the WFI instruction is used to enter sleep mode, any peripheral interrupt acknowledged by the
nested vectored interrupt controller (NVIC) can wake up the device from sleep mode.
If the WFE instruction is used to enter sleep mode, the MCU exits sleep mode as soon as an event
occurs. The wakeup event can be generated either by:
• enabling an interrupt in the peripheral control register but not in the NVIC, and enabling
the SEVONPEND bit in the System Control Register. When the MCU resumes from WFE, the
peripheral interrupt pending bit and the peripheral NVIC IRQ channel pending bit (in the
NVIC interrupt clear pending register) have to be cleared;
• or configuring an external or internal EXTI line in event mode. When the CPU resumes from
WFE, it is not necessary to clear the peripheral interrupt pending bit or the NVIC IRQ channel
pending bit, as the pending bit corresponding to the event line is not set.
This mode offers the lowest wakeup time as no time is wasted in interrupt entry/exit.
The stop mode is based on the Cortex-M deep sleep mode combined with peripheral clock gating.
In stop mode all clocks in the 1.8V domain are stopped, and the PLL, the HSI and the HSE oscillators
are disabled. SRAM and register contents are preserved. In the stop mode, all I/O pins keep the same
state as in the run mode. The voltage regulator can be configured either in normal or low-power
mode. To place the MCU in stop mode the HAL provides the function:
void HAL_PWR_EnterSTOPMode(uint32_t Regulator, uint8_t STOPEntry);
where the Regulator parameter accepts the value PWR_MAINREGULATOR_ON to leave the internal
voltage regulator ON, or the value PWR_LOWPOWERREGULATOR_ON to place it in low-power mode. The
parameter STOPEntry can assume the values PWR_STOPENTRY_WFI or PWR_STOPENTRY_WFE.
To enter stop mode, all EXTI-line pending bits, all peripheral interrupt pending bits and the RTC
Alarm flag must be reset. Otherwise, the stop mode entry procedure is ignored and program execution
continues. If the application needs to disable the external high-speed oscillator (HSE) before entering
stop mode, the system clock source must first be switched to HSI and then the HSEON bit must be
cleared. Otherwise, if the HSEON bit is kept at 1 before entering stop mode, the clock security system
(CSS) feature must be enabled to detect any external oscillator (external clock) failure and avoid a
malfunction when entering stop mode.
Any EXTI-line configured in interrupt or event mode forces the CPU to exit from stop mode,
depending on whether it entered low-power mode using the WFI or the WFE instruction. Since both
HSE and PLL are disabled before entering stop mode, when exiting from this low-power mode the MCU
clock source is set to the HSI. This means that our code has to reconfigure the clock tree according
to the wanted SYSCLK speed.
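The resulting sequence (enter stop mode, then rebuild the clock tree before doing anything else) can be sketched as follows. The HAL calls are the ones named in this chapter; `SystemClock_Config()` is assumed to be the project's own clock setup routine (the name CubeMX usually generates), and `enter_stop_and_restore_clock()` is a hypothetical helper. The `#else` branch contains host-side stand-ins, with made-up constant values, that only record the call order so that the sketch can be compiled and checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_HAL_DRIVER
#include "stm32f0xx_hal.h"    /* on target: the HAL header of the MCU in use */
void SystemClock_Config(void); /* assumed to be provided by the project */
#else
/* Host stand-ins (hypothetical values): they only log which call was made. */
#define PWR_LOWPOWERREGULATOR_ON 1u
#define PWR_STOPENTRY_WFI        1u
char call_log[8];
int  call_count;
void log_call(char c) { assert(call_count < 8); call_log[call_count++] = c; }
void HAL_SuspendTick(void) { log_call('s'); }
void HAL_ResumeTick(void)  { log_call('r'); }
void HAL_PWR_EnterSTOPMode(uint32_t Regulator, uint8_t STOPEntry) {
    (void)Regulator; (void)STOPEntry;
    log_call('S');
}
void SystemClock_Config(void) { log_call('c'); }
#endif

void enter_stop_and_restore_clock(void) {
    /* Keep the SysTick interrupt from interfering with low-power entry. */
    HAL_SuspendTick();
    /* Regulator in low-power mode, entry through WFI. */
    HAL_PWR_EnterSTOPMode(PWR_LOWPOWERREGULATOR_ON, PWR_STOPENTRY_WFI);
    /* Execution resumes here running from the HSI: rebuild the clock tree. */
    SystemClock_Config();
    HAL_ResumeTick();
}
```

Any peripheral whose baud rate or timing depends on SYSCLK should be used only after the clock tree has been restored.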
The standby mode achieves the lowest power consumption. It is based on the Cortex-M
deep sleep mode, with the voltage regulator disabled. The 1.8-1.2V domain is consequently powered
OFF. The PLL multiplexer and the HSI and HSE oscillators are also switched OFF. SRAM and register
contents are lost, except for registers in the standby circuitry. To place the MCU in standby mode
the HAL provides the function:
void HAL_PWR_EnterSTANDBYMode(void);
The microcontroller exits the standby mode when an external reset (NRST pin), an IWDG reset,
a rising edge on one of the enabled WKUPx pins or an RTC event occurs. All registers are reset
after wakeup from standby except for Power Control/Status Register (PWR->CSR). After waking up
from standby mode, program execution restarts in the same way as after a reset (boot pin sampling,
option bytes loading, reset vector is fetched, etc.). Using the macro:
__HAL_PWR_GET_FLAG(PWR_FLAG_SB);
we can check if the MCU is resetting due to an exit from standby mode. Since both HSE and PLL
are disabled before entering standby mode, when exiting from this low-power mode the MCU clock
source is set to the HSI. This means that our code has to reconfigure the clock tree according to the
wanted SYSCLK speed.
Read carefully
Some STM32 MCUs have a hardware bug that prevents entering or exiting from standby
mode. Particular conditions must be met before we enter this mode. Consult the errata
sheet for your MCU for more about this (if applicable).
The following example, designed to run on a Nucleo-F030R8¹⁵, shows the way low-power modes
work.
Filename: src/main-ex1.c
14 int main(void) {
15   char msg[32];
16
17   HAL_Init();
18   Nucleo_BSP_Init();
19
20   /* Before we can access any register of the PWR peripheral we must enable it */
21   __HAL_RCC_PWR_CLK_ENABLE();
22
23   while (1) {
24     if(__HAL_PWR_GET_FLAG(PWR_FLAG_SB)) {
25       /* If standby flag set in PWR->CSR, then the reset is generated from
26        * the exit of the standby mode */
27       sprintf(msg, "RESET after STANDBY mode\r\n");
28       HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
29       /* We have to explicitly clear the flag */
30       __HAL_PWR_CLEAR_FLAG(PWR_FLAG_WU|PWR_FLAG_SB);
31     }
32
33     sprintf(msg, "MCU in run mode\r\n");
34     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
35     while(HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_SET) {
36       HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
37       HAL_Delay(100);
38     }
39
40     HAL_Delay(200);
41
42     sprintf(msg, "Entering in SLEEP mode\r\n");
43     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
44
45     SleepMode();
46
47     sprintf(msg, "Exiting from SLEEP mode\r\n");
48     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
49
50     while(HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_SET);
51     HAL_Delay(200);
52
53     sprintf(msg, "Entering in STOP mode\r\n");
54     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
55
56     StopMode();
57
58     sprintf(msg, "Exiting from STOP mode\r\n");
59     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
60
61     while(HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13) == GPIO_PIN_SET);
62     HAL_Delay(200);
63
64     sprintf(msg, "Entering in STANDBY mode\r\n");
65     HAL_UART_Transmit(&huart2, (uint8_t*)msg, strlen(msg), HAL_MAX_DELAY);
66
67     StandbyMode();
68
69     while(1); //Never arrives here, since MCU is reset when exiting from STANDBY
70   }
71 }
72
73
74 void SleepMode(void)
75 {
76   GPIO_InitTypeDef GPIO_InitStruct;
77
78   /* Disable all GPIOs to reduce power */
79   MX_GPIO_Deinit();
80
81   /* Configure User push-button as external interrupt generator */
82   __HAL_RCC_GPIOC_CLK_ENABLE();
83   GPIO_InitStruct.Pin = B1_Pin;
84   GPIO_InitStruct.Mode = GPIO_MODE_IT_RISING;
85   GPIO_InitStruct.Pull = GPIO_NOPULL;
86   HAL_GPIO_Init(B1_GPIO_Port, &GPIO_InitStruct);
87
88   HAL_UART_DeInit(&huart2);
89
90   /* Suspend Tick increment to prevent wakeup by Systick interrupt.
91      Otherwise the Systick interrupt will wake up the device within 1ms (HAL time base) */
92   HAL_SuspendTick();
93
94   __HAL_RCC_PWR_CLK_ENABLE();
95   /* Request to enter SLEEP mode */
96   HAL_PWR_EnterSLEEPMode(0, PWR_SLEEPENTRY_WFI);
97
98   /* Resume Tick interrupt if disabled prior to sleep mode entry */
99   HAL_ResumeTick();
100
101   /* Reinitialize GPIOs */
102   MX_GPIO_Init();
103
104   /* Reinitialize UART2 */
The macro __HAL_RCC_PWR_CLK_ENABLE() at line 21 enables the PWR peripheral: before we can
perform any operation related to power management, we need to enable the PWR peripheral, even
if we are simply checking whether the standby flag is set inside the PWR->CSR register. This is a
source of a lot of headaches for novice users struggling with power management.
Lines [24:31] check if the standby flag is set: if so, it means that the MCU was reset after exiting
from standby mode. Lines [33:38] represent the run mode: the LD2 LED blinks until we press the
Nucleo USER button connected to the PC13 pin. The remaining lines of code in the main() just cycle
through the three low-power modes at every press of the USER button.
Lines [74:106] define the SleepMode() function, used to place the MCU in sleep mode. All GPIOs
are configured as analog, to reduce current consumption on unused I/Os (especially those pins
that may be a source of leakage). The corresponding peripheral clocks are turned OFF, except for the
GPIOC peripheral: the PC13 GPIO is used to resume from low-power modes. The same applies to the
UART2 interface and the SysTick timer, which is halted to prevent the MCU from exiting low-power
mode after 1ms. The call to the HAL_PWR_EnterSLEEPMode() function at line 96 places the MCU in
sleep mode, until it wakes up when the USER button is pressed (the MCU wakes up because we
configure the corresponding IRQ, which causes the WFI instruction to exit from the low-power mode).
The StopMode() function, not shown here, is almost identical to the SleepMode() one, except for the
fact that it calls the function HAL_PWR_EnterSTOPMode() to place the MCU in stop mode.
Filename: src/main-ex1.c
Finally, lines [144:160] define the StandbyMode() function. Here we follow the procedure described
in the STM32F030 errata sheet, since that MCU is affected by a hardware bug that prevents the CPU
from entering standby mode: we first have to disable the PWR_WAKEUP_PIN1 pin, then clear the
wake-up flag in the PWR->CSR register and finally re-enable the wake-up pin, which in an STM32F030
MCU coincides with the PA0 pin.
STM32 MCUs usually have two wake-up pins, named PWR_WAKEUP_PIN1 and
PWR_WAKEUP_PIN2. For a lot of STM32 MCUs with LQFP64 package the second wake-up pin
coincides with PC13, which is connected to the USER button in all Nucleo boards (except
for the Nucleo-F302 where it is connected to the PB13 pin). However, we cannot use the
PWR_WAKEUP_PIN2 in our example, because that pin is pulled high by a resistor on the PCB.
When we configure wake-up pins in conjunction with the standby mode, we are not using
the corresponding GPIO peripheral, which would allow us to configure the pin input mode,
because it is powered down before entering standby mode: the wake-up pins are directly
handled by the PWR peripheral, which resets the MCU if one of the two pins goes high.
So, in the example we use the PWR_WAKEUP_PIN1 pin, which corresponds to the PA0 pin in
an STM32F030 MCU.
Nucleo boards make it possible to measure the current consumption of the MCU using the IDD pin
header. Before you start measuring, you should connect the board as shown in Figure 5, by
removing the IDD jumper and connecting the ammeter cables in its place. Ensure that the ammeter is
set to the mA scale. In this way you can see the power consumption for every power mode.
¹⁶http://bit.ly/1rVwDBf
The change simply consists in adding two memory barrier instructions, one before and one after the
WFI instruction, as shown at lines X and Y.
I have asked a question regarding this issue on the official ST forum¹⁷, but I have not yet received
an answer at the time of writing this chapter, and I suspect that I will not receive one.
Even in these families, the VDD domain is the most relevant one. It is used to supply other voltage
domains, like the VDDIO1 domain, which is used to power most of the MCU pins, and the internal
voltage regulator used to supply the VCORE domain. This can be programmed by software to two
different power ranges (scale 1, scale 2 and so on) in order to optimize the consumption depending
on the maximum operating frequency of the system (thanks to the voltage scaling technology seen
before). It is interesting to remark that for those MCUs providing the GPIOG peripheral (that is,
those MCUs coming in packages with a high pin count), the VDDIO2 domain is used to supply the
GPIOG peripheral independently. This domain, together with the USB domain, can be selectively
enabled/disabled by dedicated functions provided by the HAL (HAL_PWREx_EnableVddIO2(),
HAL_PWREx_EnableVddUSB(), etc.).
To retain the content of the backup registers and supply the RTC function when VDD is turned
OFF, the VBAT pin can be connected to an optional standby voltage supplied by a battery or by
another source. The VBAT pin powers the RTC unit, the LSE oscillator and one or two pins used to
wake up the MCU from deep sleep modes, allowing the RTC to operate even when the main power
supply is turned OFF. For this reason, the VBAT power source is said to power the RTC domain. The
switch to the VBAT supply is controlled by the PDR. The VLCD pin is provided to control the contrast
of the LCD.
By default, and after power-on or a system reset, STM32L MCUs are placed in run mode. The default
clock source is set to the MSI, a power-optimized clock source that we have encountered in Chapter
10. STM32L microcontrollers offer developers finer tuning capabilities, which allow the power
consumption in this mode to be reduced. If we do not need too much computing power, then we can
leave the MSI as the main clock source, avoiding the power consumption introduced by the PLL
multiplexer. By reducing the clock speed down to 24-26MHz, we can configure the Dynamic Voltage
Scaling (DVS) scale 2 that decreases the VCORE domain down to 1.0V in more recent STM32L4
¹⁸The power consumption values reported in Figure 7 refer to the STM32L476 series.
MCUs. This mode is also called run range 2, and the overall power consumption can be further
decreased by disabling the FLASH memory.
As said before, the FLASH in STM32L MCUs can be disabled even in the run mode. The CubeHAL
function HAL_FLASHEx_EnableRunPowerDown() automatically performs this operation
for us, while the HAL_FLASHEx_DisableRunPowerDown() routine enables the FLASH
memory again. The only required condition is that this function, and all the other routines
used when the FLASH is OFF (interrupt vector included), are placed in SRAM, otherwise
a Bus Fault occurs as soon as the FLASH is powered down. This can be easily achieved by
creating a custom linker script, as we will see in a following chapter.
To further reduce the energy consumption when the system is in run mode, the internal voltage
regulator can be configured in low-power mode. In this mode, the system frequency should
not exceed 2 MHz. The HAL_PWREx_EnableLowPowerRunMode() function performs this operation
automatically for us. In this mode we can also disable the FLASH memory, if needed, to further
reduce the overall power consumption.
The low-power run mode represents the best compromise in STM32L MCUs from the energy
efficiency point of view, as shown in Figure 8¹⁹. As you can see, enabling the ART accelerator
increases performance but also reduces the dynamic consumption. Best consumption is most often
reached when the Instruction Cache is ON, Data Cache is ON and Prefetch Buffer is OFF, as this
configuration reduces the number of FLASH memory accesses.
The low dynamic consumption of the FLASH keeps the cost of each FLASH access performed by the
firmware small. The consumption of SRAM1 and SRAM2 is quite similar, but SRAM2
is much more power efficient than SRAM1 when not remapped at address 0, thanks to its 0-wait
state access.
Sleep modes allow all peripherals to be used while providing the fastest wakeup time. In
these modes, the CPU is stopped and each peripheral clock can be configured by software to be gated
ON or OFF during the sleep and low-power sleep modes. These modes are entered by executing the
assembler instructions WFI or WFE. To place the MCU in one of the two sleep modes, the CubeHAL
provides the function:
void HAL_PWR_EnterSLEEPMode(uint32_t Regulator, uint8_t SLEEPEntry);
The first parameter, Regulator, can accept the values PWR_MAINREGULATOR_ON and
PWR_LOWPOWERREGULATOR_ON: the former places the MCU in sleep mode, the latter in low-power sleep
mode. The second parameter, SLEEPEntry, can assume the values PWR_SLEEPENTRY_WFI or
PWR_SLEEPENTRY_WFE: as the names suggest, the first one performs a WFI instruction and the second
one a WFE.
¹⁹Figure 8 is taken from this official ST document (http://bit.ly/1WcHv8W). ST also provides a really useful application note,
the AN4746 (http://bit.ly/1Nkp8NI), about power consumption optimization in STM32L4 MCUs.
Read carefully
Please note that for STM32L MCUs the system frequency should not exceed the MSI range 1
value in this power mode. Refer to the product datasheet for more details on the voltage
regulator and peripheral operating conditions.
If the WFI instruction is used to enter sleep mode, any peripheral interrupt acknowledged by the
NVIC can wake up the device from sleep mode.
If the WFE instruction is used to enter sleep mode, the MCU exits sleep mode as soon as an event
occurs. The wakeup event can be generated either by:
• enabling an interrupt in the peripheral control register but not in the NVIC, and enabling
the SEVONPEND bit in the System Control Register - When the MCU resumes from WFE, the
peripheral interrupt pending bit and the peripheral NVIC IRQ channel pending bit (in the
NVIC interrupt clear pending register) have to be cleared;
• or configuring an external or internal EXTI line in event mode - When the CPU resumes from
WFE, it is not necessary to clear the peripheral interrupt pending bit or the NVIC IRQ channel
pending bit as the pending bit corresponding to the event line is not set.
After exiting the low-power sleep mode, the MCU is automatically placed in low-power run mode.
Batch Acquisition Mode (BAM) is an implicit and optimized mode for transferring data. Only the
needed communication peripheral (e.g. the I2C one), one DMA and the SRAM are configured with
clock enable in sleep mode. FLASH memory is put in power-down mode and the FLASH memory
clock is gated OFF during sleep mode.
The MCU can enter either sleep or low-power sleep mode. Take note that the I2C clock can be set
to 16 MHz even in low-power sleep mode, allowing support for 1 MHz Fast-mode Plus. The USART
and LPUART clocks can also be based on the HSI oscillator. Typical applications of BAM are sensor
hubs.
STM32L MCUs can provide up to 2 different stop modes, named stop1 and stop2. Stop modes are
based on the Cortex-M deep sleep mode combined with the peripheral clock gating. The voltage
regulator can be configured either in normal²⁰ or low-power mode. In stop1 mode, all clocks in the
VCORE domain are stopped; the PLL, the MSI, the HSI16 and the HSE oscillators are disabled. Some
peripherals with the wakeup capability (I2C, USART and LPUART) can switch ON the HSI16 to
receive a frame, and switch OFF the HSI16 after receiving the frame if it is not a wakeup frame. In this
²⁰The HAL calls this mode stop0, and it is entered by calling the HAL_PWREx_EnterSTOP0Mode() function.
case, the HSI16 clock is propagated only to the peripheral requesting it. SRAM1, SRAM2 and register
contents are preserved. Several peripherals can be functional in stop1 mode: PVD, LCD controller,
digital to analog converters, operational amplifiers, comparators, independent watchdog, LPTIM
timers (if available), I2C, UART and LPUART. The stop2 mode differs from the stop1 mode in
that only the following peripherals are available: PVD, LCD controller, comparators, independent
watchdog, LPTIM1, I2C3, and the LPUART.
The BOR is always available in both stop1 and stop2 modes. The consumption is increased when
thresholds higher than VBOR0 are used.
To place the MCU in stop mode the HAL provides the function:
void HAL_PWREx_EnterSTOPxMode(uint8_t STOPEntry);
where the ‘x’ is 0, 1 or 2 depending on the stop mode. The parameter STOPEntry can assume
the values PWR_STOPENTRY_WFI or PWR_STOPENTRY_WFE. For compatibility with the other HALs, the
HAL_PWR_EnterSTOPMode() is also available.
To enter stop mode, all EXTI-line pending bits, all peripherals interrupt pending bits and RTC Alarm
flag must be reset. Otherwise, the stop mode entry procedure is ignored and program execution
continues. Stop1 mode can be entered from run mode and low-power run mode, while it is not
possible to enter stop2 mode from the low-power run mode.
Any EXTI line configured in interrupt or event mode forces the CPU to exit from stop mode,
depending on whether the mode was entered using the WFI or the WFE instruction. Since both HSE and
PLL are disabled before entering stop mode, when exiting from this low-power mode the MCU
clock source is set to the HSI. This means that our code must reconfigure the clock tree according
to the wanted SYSCLK speed.
STM32L MCUs provide two standby modes, which are based on the Cortex-M deep sleep mode. The
standby mode is the lowest power mode in which 32 Kbytes of SRAM2 can be retained, the automatic
switch from VDD to VBAT is supported and the I/Os level can be configured by independent pull-up
and pull-down circuitry.
By default, the voltage regulators are in power down mode and the SRAMs and the peripherals
registers are lost. The 128-byte backup registers are always retained. The ultra-low-power BOR is
always ON to ensure a safe reset regardless of the VDD slope.
To place the MCU in standby mode the HAL provides the function:
void HAL_PWR_EnterSTANDBYMode(void);
To retain the content of SRAM2 while in standby mode, the following function has to be called
before entering it:
void HAL_PWREx_EnableSRAM2ContentRetention(void);
By using the macro:
__HAL_PWR_GET_FLAG(PWR_FLAG_SB);
we can check if the MCU is resetting due to an exit from standby mode. Since both HSE and PLL
are disabled before entering standby mode, when exiting from this low-power mode the MCU clock
source is set to the HSI. This means that our code must reconfigure the clock tree according to
the wanted SYSCLK speed.
The shutdown mode is the lowest power mode with only 30 nA at 1.8 V in STM32L4 MCUs. This
mode is similar to the standby one but without any power monitoring: the BOR is disabled and
the switch to VBAT is not supported in this mode. The LSI is not available, and consequently the
independent watchdog is also not available. A Brown-Out Reset is generated when the device exits
shutdown mode: all registers are reset except those in the backup domain, and a reset signal is
generated on the pad. The 128-byte backup registers are retained in shutdown mode. When exiting
shutdown mode, the wakeup clock is MSI at 4 MHz.
To enter shutdown mode the HAL provides the function:
void HAL_PWREx_EnterSHUTDOWNMode(void);
The microcontroller exits the shutdown mode when an external reset (NRST pin), a rising edge on
one of the enabled WKUPx pins or an RTC event occurs. All registers are reset after wakeup from
standby, including the Power Control/Status Register (PWR->CSR). After waking up from shutdown
mode, program execution restarts in the same way as after a reset (boot pin sampling, option bytes
loading, reset vector is fetched, etc.).
12.4.4.1 LPUART
The Low-Power UART (LPUART) is a UART that allows bidirectional communications with
limited power consumption. Only a 32.768 kHz LSE clock is required to allow UART communications
up to 9600 baud. Higher baud rates can be reached when the LPUART is clocked by a clock source
other than the LSE.
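To see where the 9600 baud limit comes from, here is a small host-side sketch of the baud rate divider computation. It assumes the relations given in the STM32L4 reference manual as we recall them (baud = 256 × fck / LPUARTDIV, a divider not below 0x300, and fck between 3 and 4096 times the baud rate), so check them against the manual of your MCU. The function name lpuart_brr() is hypothetical and is not part of the CubeHAL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the divider to program for the given LPUART
 * kernel clock and baud rate, or 0 if the pair violates the constraints
 * assumed in the text above (they are assumptions to verify, not HAL code). */
uint32_t lpuart_brr(uint32_t fck_hz, uint32_t baud)
{
    if (baud == 0u) {
        return 0u;
    }

    /* 64-bit math: 256 * fck and 4096 * baud do not always fit in 32 bits */
    uint64_t fck = fck_hz;
    if (fck < 3ull * baud || fck > 4096ull * baud) {
        return 0u;                      /* clock too slow or too fast */
    }

    uint64_t div = (256ull * fck + baud / 2u) / baud;   /* round to nearest */
    if (div < 0x300u || div > 0xFFFFFu) {
        return 0u;                      /* outside the assumed 20-bit range */
    }
    return (uint32_t)div;
}
```

With a 32.768 kHz clock, the condition fck ≥ 3 × baud caps the rate at about 10.9 kbaud, which is why 9600 is the fastest standard rate reachable from the LSE in this sketch.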
Even when the microcontroller is in stop mode, the LPUART can wait for an incoming UART frame
while having an extremely low-energy consumption. The LPUART includes all necessary hardware
support to make asynchronous serial communications possible with minimum power consumption.
It supports half-duplex single wire communications and modem operations (CTS/RTS). It also
supports multiprocessor communications. DMA can be used for data transmission/reception even
in stop 2 mode.
To program the LPUART peripheral we use the same functions from the HAL_UART module.
12.4.4.2 LPTIM
The Low-Power Timer (LPTIM) is a 16-bit timer that benefits from the latest developments in
power consumption reduction. Thanks to its variety of clock sources, the LPTIM is able to keep
running whatever the selected power mode, unlike standard STM32 timers, which do not run
during stop modes. Given its capability to run even with no internal clock source, the LPTIM can be
used as a pulse counter, which can be useful in some applications. Moreover, the LPTIM capability
to wake up the system from low-power modes makes it suitable for implementing timeout functions with
extremely low power consumption. In a following chapter about FreeRTOS, we will use the LPTIM
timer as the timebase source for the tickless idle mode. The LPTIM introduces a flexible clock scheme
that provides the needed functionality and performance, while minimizing the power consumption.
These are the relevant features of a LPTIM peripheral:
• 16 bit upcounter
• 3-bit prescaler with 8 possible division factors (1,2,4,8,16,32,64,128)
• Selectable clock source
– Internal clock sources: LSE, LSI, HSI16 or APB clock
– External clock source over ULPTIM input (working with no LP oscillator running, used
by pulse counter application)
• 16 bit period register
• 16 bit compare register
• Continuous/one shot mode
• Selectable software/hardware input trigger
• Configurable output: Pulse, PWM
• Configurable I/O polarity
• Encoder mode
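The features listed above fix the range of timeouts an LPTIM can cover: a 16-bit counter wraps after 65536 ticks, and each tick lasts prescaler / clock seconds. The following host-side sketch works this out; the function names are hypothetical and it performs plain arithmetic only, without touching any LPTIM register.

```c
#include <assert.h>
#include <stdint.h>

/* Longest interval, in milliseconds, that a 16-bit counter can measure
 * before wrapping, for a given input clock and prescaler. */
uint64_t lptim_max_period_ms(uint32_t clk_hz, uint32_t prescaler)
{
    assert(clk_hz > 0u);
    /* 65536 counter steps, each lasting prescaler / clk_hz seconds */
    return (65536ull * prescaler * 1000ull) / clk_hz;
}

/* Number of counter ticks needed for a timeout, or 0 if the timeout is
 * shorter than one tick or does not fit in a 16-bit counting range. */
uint32_t lptim_ticks_for_ms(uint32_t clk_hz, uint32_t prescaler,
                            uint32_t timeout_ms)
{
    assert(prescaler > 0u);
    uint64_t ticks = ((uint64_t)timeout_ms * clk_hz) / (1000ull * prescaler);
    return (ticks >= 1u && ticks <= 65536u) ? (uint32_t)ticks : 0u;
}
```

For example, with the 32.768 kHz LSE and the largest prescaler (128) the counter ticks at 256 Hz, so a one-second timeout takes 256 ticks and the counter wraps after 256 seconds.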
reaches the specified VBOR threshold. VBOR is configured through device option bytes. By default,
BOR is OFF. The user can select between three to five programmable VBOR threshold levels. For
full details about BOR characteristics, refer to the “Electrical characteristics” section in the device
datasheet. STM32 devices that do not provide a BOR unit usually have a similar unit named Power
on Reset (POR)/Power Down Reset (PDR), which performs the same operation as the BOR unit but
with a fixed, factory-configured voltage threshold.
The power supply can be actively monitored by the firmware by using the Programmable Voltage
Detector (PVD). The PVD allows us to configure a voltage level to monitor, and if VDD is higher or
lower than the given level, a corresponding bit in the Power Control/Status Register (PWR->CSR) is
set. If properly configured, the MCU can generate a dedicated IRQ through the EXTI controller. To
enable/disable the PVD in those MCUs with this feature, the HAL provides the functions
HAL_PWR_EnablePVD()/HAL_PWR_DisablePVD(), while to configure the voltage level it provides the
function HAL_PWR_ConfigPVD(). For more information, refer to the HAL_PWREx module of the CubeHAL.
Figure 10 shows the main PCC view. To use it we first have to select the Vdd Power Supply
source, otherwise the tool does not allow us to create steps in the power sequence. The next optional
step consists in selecting a battery used to power the MCU when the main power is absent. This
is useful to evaluate the battery life. We can choose from a portfolio of well-known batteries, or
add a custom one.
By clicking on the green ‘+’, we can add a step of the sequence. Here we can specify the power mode
(run, sleep, etc), the memories configuration (FLASH enabled/disabled, ART enabled/disabled, and
so on) and the power voltage level. From the same dialog we can also choose the CPU frequency,
the duration of the step and the enabled peripherals.
With this tool we can thus figure out how much power is needed by the microcontroller. In L0, L1
and L4 MCUs it is also possible to enable the Transition Checker, which allows us to identify invalid
transitions (for example, we cannot switch from the run mode to the low-power sleep one
without passing through the low-power run mode). For more information about the PCC view refer to
the UM1718²² from ST.
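The kind of rule enforced by a transition checker can be sketched in a few lines of C. The fragment below encodes only the two forbidden transitions explicitly mentioned in this chapter (run to low-power sleep, and low-power run to stop2). The enum and the function name are hypothetical, and the list is deliberately incomplete: a false result only means that this chapter does not mention the transition as forbidden, not that the hardware allows it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for a subset of the STM32L4 power modes */
typedef enum {
    MODE_RUN,
    MODE_LP_RUN,
    MODE_SLEEP,
    MODE_LP_SLEEP,
    MODE_STOP1,
    MODE_STOP2,
    MODE_COUNT
} power_mode_t;

/* Returns true for the transitions this chapter describes as not possible */
bool transition_forbidden(power_mode_t from, power_mode_t to)
{
    assert(from < MODE_COUNT && to < MODE_COUNT);

    /* low-power sleep must be reached through low-power run */
    if (from == MODE_RUN && to == MODE_LP_SLEEP) {
        return true;
    }
    /* stop2 cannot be entered from low-power run */
    if (from == MODE_LP_RUN && to == MODE_STOP2) {
        return true;
    }
    return false;
}
```

A firmware could call such a check before switching mode in debug builds; the reference manual of the specific MCU remains the authority on the complete set of valid transitions.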
²²http://bit.ly/1WDpa5r
13. Memory layout
Every time we compile our firmware using the GCC ARM tool-chain, a series of non-trivial things
takes place. The compiler translates the C source code into ARM assembly and organizes it
to be flashed on a given STM32 MCU. Every microprocessor architecture defines an execution
model that has to be “matched” with the execution model of the C programming language. This
means that several operations are performed during boot, whose intent is to prepare the execution
environment for our application: the stack and heap creation, the initialization of data memory and
the vector table initialization are just some of the activities performed during startup. Moreover,
some STM32 microcontrollers provide additional memories, or allow us to interface external ones
using the FMC controller, which can be assigned to specific tasks during the firmware lifecycle.
This chapter aims to shed light on those questions that are common to a lot of STM32 developers.
What happens when the MCU resets? Why is providing the main() function mandatory? And
how long does it take to reach it after the MCU resets? How can we store variables in FLASH
instead of SRAM? How can we use the STM32 CCM memory?
The SRAM memory is also organized in several sub-regions. A variable-sized region starting from
the end of SRAM and growing downwards (that is, its base address has the highest SRAM address)
is dedicated to the stack. This happens because Cortex-M cores use a stack memory model called
full-descending stack. The base stack pointer, also called Main Stack Pointer (MSP), is computed at
compile time, and it is stored at 0x0800 0000 FLASH memory location. Once we call a function, a
new stack frame is pushed on the stack. This means that the pointer to current stack frame (SP) is
automatically decremented at every function call (for example, the ARM assembly push instruction
automatically decrements it).
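The behavior of a full-descending stack can be simulated on a PC with a plain array. This is only an illustrative model, with hypothetical names: on the real core the SP register and the push/pop instructions do this work in hardware. "Full" means that SP always points to the last item pushed, and "descending" means that a push moves SP towards lower addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8u

/* Stands in for the last words of SRAM reserved for the stack */
static uint32_t stack_mem[STACK_WORDS];

/* Initial stack pointer: one word past the highest address of the region */
static uint32_t *sp = stack_mem + STACK_WORDS;

/* Full-descending push: decrement SP first, then store the value */
void stack_push(uint32_t value)
{
    assert(sp > stack_mem);                  /* would be a stack overflow */
    *--sp = value;
}

/* Pop: read the value SP points to, then increment SP */
uint32_t stack_pop(void)
{
    assert(sp < stack_mem + STACK_WORDS);    /* stack is empty */
    return *sp++;
}

/* Number of words currently stored on the stack */
uint32_t stack_depth(void)
{
    return (uint32_t)((stack_mem + STACK_WORDS) - sp);
}
```

Pushing two words and popping them back returns them in reverse order, and leaves SP at its initial value.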
The SRAM is also used to store variable data, and this region usually starts at the beginning of
SRAM (0x2000 0000). This region is in turn divided between initialized and un-initialized data.
To understand the difference, let us consider this code fragment:
...
uint8_t var1 = 0xEF;
uint8_t var2;
...
var1 and var2 are two global variables. var1 is an initialized variable (we fix its starting value at
compile time), while var2 is un-initialized: it is up to the run-time to initialize it to zero.
Since the starting value of var1 has to survive a reset, we have two .data sections: one stored in
FLASH (holding the initialization values) and one in RAM, as we will see next.
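The difference can be observed on a PC too, where the operating system loader and the C run-time do the work that the startup code has to do on an STM32. The following sketch reuses var1 and var2 from the fragment above; the section names in the comments reflect the usual GCC behavior and should be read as an assumption, and check_startup_state() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

uint8_t var1 = 0xEF;   /* initialized: its value is stored in the image (.data) */
uint8_t var2;          /* un-initialized: the run-time zeroes it (.bss)         */

/* Verifies what the C language guarantees before main() starts.
 * On bare metal this only holds if the startup code has copied the
 * initialization values and cleared the un-initialized region. */
void check_startup_state(void)
{
    assert(var1 == 0xEF);
    assert(var2 == 0);
}
```

If the startup code skips one of those two steps, the same assertions fail on the target even though the source code is unchanged.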
Finally, the SRAM memory could contain another growing region: the heap. It stores variables that
are allocated dynamically during the execution of the firmware (by using the C malloc() routine or
similar). This area can be in turn organized in several sub-regions, according to the allocator used (in
the next chapter we will see how FreeRTOS provides several allocators to handle dynamic memory
allocation). The heap grows upwards (that is, the base address is the lowest in its region) and it has
a fixed maximum size.
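A heap that grows upwards up to a fixed limit can be modelled with a simple "break" index into a reserved region. The sketch below runs on a PC and uses hypothetical names; it is similar in spirit to the _sbrk() hook that newlib-based tool-chains rely on to grow the heap, but it does not follow the same contract (for example, it returns NULL on failure).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 64u   /* fixed maximum size of the heap, in bytes */

/* Stands in for the SRAM region reserved for the heap */
static uint8_t heap_area[HEAP_SIZE];

/* Offset of the first free byte: the heap grows from low to high addresses */
static size_t heap_used = 0;

/* Grows the heap by incr bytes and returns the start of the new block,
 * or NULL when the request would exceed the fixed maximum size. */
void *heap_grow(size_t incr)
{
    assert(heap_used <= HEAP_SIZE);
    if (incr > HEAP_SIZE - heap_used) {
        return NULL;
    }
    void *block = &heap_area[heap_used];
    heap_used += incr;
    return block;
}
```

Two consecutive requests return increasing addresses, and a request that does not fit in the remaining space is refused without changing the heap.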
From the compiler point of view, these sections are traditionally named in a different way inside
the application binary. For example, the section containing assembly code is named .text, .rodata
is the one containing const variables and strings, while the section for initialized data is named
.data. These names are also common to other computer architectures, like x86 and MIPS. Others
are specific to the “microcontroller world”. For example, the .isr_vector section is the one designated
to store the vector table in Cortex-M based MCUs³.
Since every STM32 MCU has its own quantity of SRAM and FLASH, and since every program has
a variable number of instructions and variables, the dimension and location in memory of these
sections differ. Before we can see how to instruct the compiler to generate the binary file for the
specific MCU, we have to understand all the steps and tools involved during the generation of object
files.
Figure 2: the compilation process from the source file to the final binary image
• Global variables: these can be in turn divided between un-initialized and initialized variables;
a global variable can also be defined as static, that is, its visibility is limited to the current
source file.
• Local variables: these can be divided between simple local (also called automatic) variables
and static local variables (that is those variables whose lifetime extends across the entire run
of the program).
• Const data: these can be in turn divided between const data types (e.g. const int c = 5) and
string constants (e.g. "Hello World!").
• Routines: these constitute the program and they will be translated in assembly instructions.
• External resources: these are both global variables (declared as extern) and routines defined
in other source files. It is the linker's job to “link” the references to these symbols defined in
other source files and to merge the sections coming from the corresponding binary files.
Once a source file is compiled, the above program structures are mapped inside specific sections of
the binary file. Table 1 summarizes the most relevant ones.
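As a quick reference, the following fragment contains one instance of each program structure listed above, annotated with the section where GCC typically places it. The placement shown in the comments is the usual default and is an assumption here: it can change with compiler version, options and linker script. All identifiers are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

uint32_t g_uninit;                     /* un-initialized global: .bss          */
uint32_t g_init = 42;                  /* initialized global: .data            */
static uint32_t file_private = 7;      /* static global: .data, visible only   */
                                       /* inside this source file              */
const uint32_t max_retries = 5;        /* const data: .rodata                  */
const char *greeting = "Hello World!"; /* string constant: .rodata             */
                                       /* (the pointer itself goes in .data)   */

uint32_t next_id(void)                 /* routine: .text                       */
{
    static uint32_t id = 0;            /* static local: .bss, keeps its value  */
                                       /* across calls                         */
    uint32_t step = 1;                 /* automatic local: stack or register   */

    id += step;
    return id;
}

uint32_t private_value(void)           /* routine: .text                       */
{
    return file_private;
}
```

Compiling such a file with debug information and inspecting it with arm-none-eabi-objdump is a good way to verify where each symbol actually ends up.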
For every source file (.c) composing our application, the compiler will generate a corresponding
object file (.o), which contains the sections in Table 1⁴. An object file is a type of binary file that
adheres to a well-known standard. There are a lot of standards for binary files around (PE, COFF,
ELF, etc.). The one used by GCC ARM is ELF32, a really popular open standard thanks to its usage in
Linux-based Operating Systems, and it is widely supported by other tools like OpenOCD and
the ST-LINK Utility. Files ending with .o⁵ are, however, a special type of object file. These are also
known as relocatable files. This name comes from the fact that all the memory addresses contained
in this type of file are relative to the file itself, and start from the 0x0000 0000 address. This means
that the .text section will also start from that address, and we know that this conflicts with the
starting address of FLASH memory (0x0800 0000) in an STM32 MCU⁶.
Starting from a series of relocatable files (plus some other configuration files that we will see in a
while), the linker will assemble their content to form one common object file that will represent
the firmware to flash on the MCU. In this process, called linking, the linker will relocate all relative
addresses to the actual memory addresses. This type of file is also known as an absolute file, because
all addresses are absolute and specific to the given STM32 MCU⁷.
⁴It is important to underline that an object file contains many more sections. Most of them are related to debugging, and
contain relevant information like the original source code, all the symbols contained in the source file (even those that have been
optimized out by the compiler), and so on. However, for the purposes of this discussion, it is better to leave them out.
⁵Keep in mind that, from the compiler point of view, the file extension is just a convention.
⁶Those of you who want to dig deeper into this matter can take a look at the readelf tool provided in the GCC ARM tool-chain.
⁷Here, again, the story is more complex. First of all, the linker could assemble other pieces from several external statically linked
libraries (those ending with .a). These libraries, also known as archive files, are nothing more than a merge of several relocatable files.
During the linking process, only those program structures actually used in our application will be merged into the final firmware.
Another important aspect to remark is that this process is essentially the same for every microprocessor platform (like the x86 and
so on), and it is also called static linking. More powerful architectures use a more advanced linking process, known as dynamic
linking, which postpones the linking until the program is loaded by the OS. This allows us to dramatically reduce the
size of executables, and to update the dependency libraries without recompiling the whole application. In dynamic linking, libraries
are called shared objects (or shared libraries, or DLLs in Windows), and in modern Operating Systems it is possible to share the
same .text section from these libraries among the processes that use them by using mmap() or a similar system call. This also
reduces the RAM occupation of processes (think of the tons of system libraries that would otherwise be “replicated” among the
several processes running on a modern PC).
How does the linker know where to place in memory the sections contained in the absolute file?
It is thanks to linker scripts (those files ending with .ld) that we can arrange the content of the
absolute file according to the actual memory layout. We have already seen a linker script in Chapter
4, when we configured the mem.ld file to specify the right FLASH origin address. CubeMX
also embeds the right linker script for our MCU inside the generated C project (it is contained inside
the sub-folder SW4STM32). However, it is really hard to study the content of those scripts if we have
not mastered several concepts first. So, it is better to start gently by creating a bare-bones STM32
application.
Create now a new file named main.c and place the following code inside it⁸.
⁸This code is designed to work with the Nucleo-F401RE. Refer to the book examples for the other Nucleos.
Filename: src/main-ex1.c
44 delay(200000);
45 }
46 }
47
48 void delay(uint32_t count) {
49 while(count--);
50 }
The first 21 lines contain just macros that define the most common STM32 peripheral addresses.
Some are generic and some are specific to the given MCU. At line 26 we are defining the vector table.
Being “minimal”, it just contains two things: the address in SRAM of the MSP (remember that this
is the first entry of the vector table and it must be placed at the 0x0800 0000 address) and the pointer
to the handler of the Reset exception. What exactly are we doing?
In Chapter 7 we mentioned that when the MCU resets, the NVIC controller generates a Reset
exception after a few cycles. This means that its handler is the real entry point of our application, and
the execution of the firmware starts from there. Here we are going to define the main() function
as the handler of the Reset exception. The GCC keyword __attribute__((section(".isr_vector")))
tells the compiler to place the vector_table array inside the section named .isr_vector, which
in turn will be contained in the object file. Finally, the main() routine contains nothing more than
the classical blinking application.
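A two-entry vector table like the one just described can be sketched as follows. This is not the listing of the example project, only a minimal illustration under some assumptions: the names SRAM_BASE, SRAM_SIZE, SRAM_END, IN_ISR_VECTOR and reset_entry are ours, the SRAM size is the 96 KB of the STM32F401RE, and the section attribute is applied only when compiling for ARM so that the fragment can also be compiled and checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE 0x20000000u
#define SRAM_SIZE (96u * 1024u)            /* STM32F401RE: 96 KB of SRAM      */
#define SRAM_END  (SRAM_BASE + SRAM_SIZE)  /* initial MSP: top of the SRAM    */

/* Compile-time check of the stack top computed above */
static_assert(SRAM_END == 0x20018000u, "unexpected initial stack pointer");

#if defined(__arm__)
#define IN_ISR_VECTOR __attribute__((section(".isr_vector")))
#else
#define IN_ISR_VECTOR                      /* host build: no special section  */
#endif

typedef void (*vector_t)(void);

/* Stand-in for the Reset handler (in the example project this role is
 * played by main() itself). */
void reset_entry(void)
{
    /* application code would start here */
}

/* Minimal vector table: initial MSP value, then the Reset handler */
const vector_t vector_table[2] IN_ISR_VECTOR = {
    (vector_t)(uintptr_t)SRAM_END,         /* entry 0: initial value of MSP   */
    reset_entry                            /* entry 1: Reset exception        */
};
```

The array only has the right effect if the linker script places the .isr_vector section at the very beginning of the FLASH memory.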
Before we can compile the firmware, we need to specify a couple of project settings. Go to Project
settings->C/C++ Build->Settings. In the Target Processor settings select the Cortex-M core that
fits your MCU. Then go to the Cross ARM C Linker->General section, check the entry Do not
use standard start files⁹ and uncheck the entry Remove unused sections, as shown in Figure 4.
If you try to compile the application, you will see the following warning in the Eclipse console:
warning: cannot find entry symbol _start; defaulting to 0000000000008000
What does it mean? GCC (or better, LD) is telling us that it does not know which is the entry
routine of our application (_start() - this entry point name is a convention in GCC) and it does not
know at which absolute memory location to start placing the code. So, how can we address this?
We need a linker script.
⁹Leaving that option unchecked causes the initialization routines from libc to be used. These are usually “less optimized”,
since they need to deal with some advanced features of libc related to the C++ programming language. The startup
routines from ST, instead, are specific to this platform, which saves a lot of FLASH memory and reduces the boot time.
Create a new file named ldscript.ld and place the following content inside it.
Filename: src/ldscript.ld
Let us see the content of this file. Lines [3:7] contain the definition of the FLASH and SRAM
memories. Each region can have several attributes (w=writable, r=readable, x=executable). We also
specify their starting address and length (in the above example they are related to an STM32F401RE
MCU). Line 9 specifies the main() function as the entry point of our application (overriding the
default _start symbol). Lines [12:28] define the content of the .text and .data sections. The .text
section will be composed first of the vector table and then of the program code. With the ALIGN(4)
directive we are saying that the section is word (4 bytes) aligned, while the >FLASH directive specifies
that the .text section will be placed inside the FLASH memory. The KEEP(*(.isr_vector)) directive
tells LD to keep the vector table inside the final absolute file, otherwise the section could be
“stripped” by other tools that perform optimizations on the final file. Finally, the .data section is
also defined (even if it contains nothing in this example), and it is placed inside the SRAM memory.
Before we can compile the firmware we need to instruct Eclipse to include the linker script
during compilation. Go to Project settings->C/C++ Build->Settings. In the Cross ARM C
Linker->General section add the entry "../ldscript.ld" to the Script files (-T) list. Now you can
compile the firmware and flash your Nucleo. Congratulations: it is almost impossible to have a
smaller STM32 application¹⁰.
¹⁰Ok, coding it in assembly will allow you to save additional space, but this book is not for masochists ;-D
¹¹When you run the command, you will see many more sections, all related to debugging. Here you will not see them because the
debug information has been “stripped” from the file using the arm-none-eabi-strip command.
# ~/STM32Toolchain/gcc-arm/bin/arm-none-eabi-objdump -h nucleo-f401RE.elf
nucleo-f401RE.elf:     file format elf32-littlearm
Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .text           00000008  08000000  08000000  00008000  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text.main      00000040  08000008  08000008  00008008  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .text.delay     00000020  08000048  08000048  00008048  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .comment        00000070  00000000  00000000  0000b1d2  2**0
                    CONTENTS, READONLY
  4 .ARM.attributes 00000033  00000000  00000000  0000b242  2**0
                    CONTENTS, READONLY
Looking at the above output we see several things regarding the sections contained in the binary file.
Every section has a size, expressed in bytes. A section also has two addresses: the Virtual Memory
Address (VMA) and the Load Memory Address (LMA). In embedded systems like the STM32 MCUs,
the VMA is the address that the section will have when the firmware starts execution. The LMA is
the address at which the section will be loaded. In most cases the two addresses will be the same.
As we will discover in the next section, they differ for the .data region.
Every section has several attributes that tell the loader (in our case, for example, the loader is
GDB in conjunction with OpenOCD, or the ST-LINK Utility tool) what to do with the given section.
Let us see what they mean:
• CONTENTS: this attribute tells the loader that the section in the binary file contains data to
load at the final LMA address. As we will see next, the .bss section does not have content in
a binary file.
• ALLOC: this says to allocate a corresponding space in the LMA memory (which could be both
the FLASH and the SRAM memory). The dimension of the space allocated is given by the
Size column.
• LOAD: this indicates to load the data from the section contained in the binary file to the final
LMA memory.
• READONLY: this indicates that the content of the section is read-only.
• CODE: this indicates that the content of the section is binary code.
Another interesting thing to remark from the previous output is that the binary file contains a
dedicated section for every callable contained in the source code (.text.main for the main() and
.text.delay for delay()). We have to specify to the linker to merge all the .text sections in a whole
common section, modifying the linker script in this way:
.text : ALIGN(4)
{
    *(.isr_vector)         /* Vector table */
    *(.text)               /* Program code */
    *(.text*)              /* Merge all .text.* sections inside the .text section */
    KEEP(*(.isr_vector))
} >FLASH
As we will see later, the ability to have separate sections for every callable allows us to selectively
place some functions inside different memories (for example, the fast CCM memory in some STM32
MCUs).
Finally, the File off column specifies the offset of the section inside the binary file, while the Algn
column indicates the data alignment in memory, which is 4 bytes (2**2) for the three .text sections.
This time we use a global initialized variable, dataVar, to start the blinking loop. The variable has
been declared volatile just to prevent the compiler from optimizing it away (however, when compiling
this example, disable all optimizations [-O0] in the project settings). Looking at the code, we could
conclude that it does the same thing as the previous example. However, if you try to flash
your Nucleo, you will see that the LD2 LED does not blink. Why not?
To understand what’s happening, we have to review some things from the C programming language.
Consider the following code fragment:
...
uint32_t globalVar = 0x3f;
void foo() {
volatile uint32_t localVar = 0x4f;
while(localVar--);
}
Here we have two variables: one defined at global scope, one locally. The localVar variable is
initialized to the value 0x4f. When exactly does this happen? The initialization is executed when the
foo() routine is invoked, as shown by the following assembly code:
1 void foo() {
2 0: b480 push {r7} ;Save the current FP
3 2: b083 sub sp, #12 ;Allocate 12 bytes on the stack
4 4: af00 add r7, sp, #0 ;Save the new FP
5 volatile uint32_t localVar = 0x4f;
6 6: 234f movs r3, #79 ;Place 0x4f in r3
7 8: 607b str r3, [r7, #4] ;Store r3 (that is 0x4f) in the 4-th byte
8
9 while(localVar--);
10 a: bf00 nop
11 c: 687b ldr r3, [r7, #4]
12 e: 1e5a subs r2, r3, #1
13 10: 607a str r2, [r7, #4]
14 12: 2b00 cmp r3, #0
15 14: d1fa bne.n c <foo+0xc>
16 }
Lines [2:4] are the function prolog. Each routine is responsible for allocating its own stack frame
and saving the CPU internal registers. The way this is performed is part of the calling convention,
which is defined by a specific standard (in case of GCC ARM, it is defined by the Embedded-Application
Binary Interface (EABI) standard). Here the instruction at line 2 saves in the current stack frame the
content of register R7, which is the Frame Pointer (FP) when the core works in Thumb mode¹². The
instruction at line 3 decrements the Stack Pointer (SP) by 12 bytes, that is, we are going to allocate 12
bytes (or 3 words) on the stack. Next, the FP is updated with the content of the SP (since this function
does not use parameters, SP and FP coincide).
The instructions we are interested in are those at lines [6:7]. Here we are storing the value 0x4f
(which is 79 in base 10) inside the general-purpose register R3 and then moving its content inside
the second word in the stack, which corresponds to the localVar variable ¹³.
¹²The Frame Pointer is a special pointer used to separate the function parameters from the local variables. More details about this
register can be found here (http://bit.ly/1ngLrop).
¹³It is important to clarify that the above assembly code is generated with all optimizations disabled.
The remaining part of the assembly code contains the while(localVar--) and the function epilog,
which is responsible for restoring the state before going back to the caller function.
So, the calling convention defines that local variables are automatically initialized upon function
call. What about global variables? Since they are not involved in a calling process, they need to be
initialized by some specific code when the MCU resets (remember that the SRAM is volatile, and
its content is undefined after a reset). This means that we have to provide a specific initialization
function.
The following routine can be used to simply copy the content of the FLASH region containing the
initialization values to the SRAM region dedicated to global initialized variables.
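A minimal sketch of such a copy routine is given here, assuming the three-pointer signature suggested by the __initialize_data() calls that appear later in this chapter. The parameter names are illustrative only; on the target they would receive the addresses of the linker symbols.

```c
#include <stdint.h>

/* Sketch: copy the initialization values from their load address
 * (LMA, in FLASH) to their run-time address (VMA, in SRAM).
 *   flash_begin : where the initial values are stored
 *   data_begin  : first word of the destination region
 *   data_end    : one past the last word of the destination region
 * The parameter names are assumptions made for this example. */
void __initialize_data(const uint32_t *flash_begin,
                       uint32_t *data_begin,
                       uint32_t *data_end) {
    uint32_t *dst = data_begin;
    while (dst < data_end)
        *dst++ = *flash_begin++;   /* one 32-bit word at a time */
}
```

Copying word by word relies on both regions being word aligned, which is what the ALIGN(4) directives in the linker script are meant to guarantee.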
Before we can use this routine, we need to define a few other things. First of all, we need to instruct
LD to store the initialization values for each variable contained in the .data section inside a specific
region of the FLASH memory, which will correspond to the LMA memory address. Second, we need
a way to pass to the __initialize_data() function the start and the end of the .data section in SRAM
(that we are going to call _sdata and _edata respectively) and the starting location (that we are going
to call _sidata) where the initialization values are stored in the FLASH memory (it is important to
stress that when we initialize a variable to a given value we need to store that value somewhere in
the FLASH, and use it to initialize the SRAM location corresponding to the variable). Figure 3
schematizes this process.
Figure 3: the copy process of initialized data from the FLASH to the SRAM memory
Once again, all these operations can be performed using the linker script, which we can modify in
the following way:
The instruction at line 26 defines the variable _sidata, which will contain the LMA address of
the .data section (that is, the starting address of FLASH memory containing initialization values).
The instructions at lines [30:31] use a special operator: the “.” operator. It is named location counter
and it is a counter that keeps track of the memory location reached during the generation of
each section. The location counter tracks memory locations independently for every memory region
(SRAM, FLASH and so on). For example, in the above code, it starts from 0x2000 0000 since the
.data section is the first one loaded in SRAM. When the two instructions *(.data) and *(.data*)
are processed, the location counter is incremented by the size of all .data sections contained in
the input files. With the instruction . = ALIGN(4); we are just forcing the location counter to be word
aligned. So, to recap, _sdata will contain 0x2000 0000 and _edata will be equal to _sdata plus the size
of the .data section (in our example, the .data section contains only one variable - dataVar - and hence
_edata is 0x2000 0004). Finally, the directive >SRAM AT>FLASH tells the link editor that the VMA address of
the .data section is bound to the SRAM address space (so 0x2000 0000), but the LMA address (that
is, where the initialization values are stored) is mapped inside the FLASH memory space.
Thanks to this new memory layout configuration, we can now arrange the main.c file in the
following way:
Filename: src/main-ex2.c

  for(;;);
}

int main() {

  /* enable clock on GPIOA and GPIOC peripherals */
  *RCC_APB1ENR = 0x1 | 0x4;
  *GPIOA_MODER |= 0x400; // Sets MODER[11:10] = 0x1

  while(dataVar == 0x3f) {
    *GPIOA_ODR = 0x20;
    delay(200000);
    *GPIOA_ODR = 0x0;
    delay(200000);
  }
}
The entry point is now the _start() routine, which is used as the handler for the Reset exception. When
the MCU resets, it is automatically called, and in turn it calls the __initialize_data() function,
passing the parameters _sidata, _sdata and _edata computed by the linker during the linking
process. _start() then calls the main() routine, which now works as expected.
Using the objdump tool we can check how the sections are organized in the ELF file.
# ~/STM32Toolchain/gcc-arm/bin/arm-none-eabi-objdump -h nucleo-f401RE.elf
nucleo-f401RE.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000c0 08000000 08000000 00008000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000004 20000000 080000c0 00010000 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .comment 00000070 00000000 00000000 00010004 2**0
CONTENTS, READONLY
3 .ARM.attributes 00000033 00000000 00000000 00010074 2**0
CONTENTS, READONLY
As you can see, the tool confirms that the .data section has a size equal to 4 bytes, a VMA address
equal to 0x2000 0000 and an LMA address equal to 0x0800 00c0, which corresponds to the end of
the .text section.
The same applies to the .bss section, which is reserved to uninitialized variables. According to the
ANSI C standard, the content of this section must be initialized to 0. However, the .bss section does
not have a corresponding FLASH region containing all zeros: it is again up to the startup code to
initialize this region. The following linker script fragment shows the definition of the .bss section¹⁴:
while the following routine, also invoked from _start(), is used to zero the .bss region
in SRAM:
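A minimal sketch of such a zeroing routine follows, assuming the two-pointer signature suggested by the __initialize_bss() calls shown later in this chapter. The parameter names are illustrative only.

```c
#include <stdint.h>

/* Sketch: clear every word between the start and the end of .bss.
 *   bss_begin : first word of the region
 *   bss_end   : one past the last word of the region
 * The parameter names are assumptions made for this example. */
void __initialize_bss(uint32_t *bss_begin, uint32_t *bss_end) {
    uint32_t *p = bss_begin;
    while (p < bss_end)
        *p++ = 0;
}
```

Unlike the .data case, no source region is needed, which is why the .bss section takes no room in FLASH.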
Changing the main() routine in the following way allows us to check that everything works correctly:
Filename: src/main-ex3.c
volatile uint32_t dataVar = 0x3f;
volatile uint32_t bssVar;

int main() {

  /* enable clock on GPIOA and GPIOC peripherals */
  *RCC_APB1ENR = 0x1 | 0x4;
  *GPIOA_MODER |= 0x400; // Sets MODER[11:10] = 0x1

  while(bssVar == 0) {
    *GPIOA_ODR = 0x20;
    delay(200000);
    *GPIOA_ODR = 0x0;
    delay(200000);
  }
}
¹⁴Please note that the order of sections inside a linker script reflects their order in memory. If we have two sections, named
A and B, both loaded in SRAM, and section A is defined before B, then it will be placed in SRAM before B.
Once again, we can see how the .bss section is arranged by invoking the objdump tool on the final
binary file:
# ~/STM32Toolchain/gcc-arm/bin/arm-none-eabi-objdump -h nucleo-f401RE.elf
nucleo-f401RE.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000e8 08000000 08000000 00008000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000004 20000000 080000e8 00010000 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000004 20000004 20000004 00010004 2**2
ALLOC
3 .comment 00000070 00000000 00000000 00010004 2**0
CONTENTS, READONLY
4 .ARM.attributes 00000033 00000000 00000000 00010074 2**0
CONTENTS, READONLY
The above output shows that the section has a size equal to four bytes, but it does not occupy room
in the final binary file since the section has only the ALLOC attribute.
In the previous linker script we have used the special directive *(COMMON) during the definition of
the .bss section. This simply tells LD to merge the content of the common section inside the
.bss section. But what exactly is the common section? To better understand its role, we need to
review some little-known features of the C language. Suppose that we have two source files, and both
of them define a global initialized variable with the same name:
File A.c
File B.c
When we try to generate the final application linking the two relocatable files (.o), we obtain the
following error:
B.o:(.data+0x0): multiple definition of 'globalVar'
A.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
The reason why this happens is evident: we are defining the same global variable in two different
source files. But what if we declare the two symbols as un-initialized global variables?
File A.c
int globalVar[3];
...
File B.c
int globalVar[6];
...
If you try to generate the final binary file you will discover that the linker does not generate errors.
Why does the linker not complain about the two symbol definitions? Because the C Standard says nothing
to prohibit it. But if the language essentially allows a global un-initialized variable to be defined
multiple times, how much memory will be allocated (that is, will globalVar be an array containing 3
or 6 elements)? This aspect is left to the compiler implementation. Recent GCC versions place all
un-initialized global variables (not declared as static) inside a single “common” section, and the
amount of memory for a given symbol will be that of the largest definition (in our case, the
array will have room for six elements of type int - that is, 24 bytes).
So, to recap, static global un-initialized variables are local to a given relocatable, and hence go in its
.bss section; global un-initialized variables are global to the whole application, and go inside the
common section. The previous linker script places both types of global un-initialized variables inside
the .bss section, which will be zeroed at run-time by the __initialize_bss() routine.
This behavior can be overridden by passing the -fno-common option to GCC. In that case, GCC
allocates global un-initialized variables directly inside the .bss section of each relocatable file,
instead of the common section. They are still zeroed at run-time and do not occupy FLASH memory,
but they are no longer merged by the linker: defining the same un-initialized global variable in two
source files, as in the example above, then causes a multiple definition error.
Filename: src/main-ex4.c
we have that both the string msg and the array vals are placed inside the FLASH memory, as shown
by the objdump tool:
# ~/STM32Toolchain/gcc-arm/bin/arm-none-eabi-objdump -h nucleo-f401RE.elf
nucleo-f401RE.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000590 08000000 08000000 00008000 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000024 08000590 08000590 00008590 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 00000070 00000000 00000000 000085b4 2**0
CONTENTS, READONLY
3 .ARM.attributes 00000033 00000000 00000000 00008624 2**0
CONTENTS, READONLY
In the first case we are declaring a pointer to a const array, which implies that a word will
be allocated inside the .data section to store the location in FLASH memory of the string
"Hello World!". In the second case, instead, we are correctly defining an array of chars.
Remember that in C arrays are not pointers.
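The difference between the two declarations can be checked with sizeof. The sketch below is only an illustration: the names msg_ptr and msg_arr and the two helper functions are hypothetical and are not taken from the book's example file.

```c
#include <stddef.h>

/* A pointer variable: the pointer itself is an initialized, modifiable
 * object (so it needs room in .data), while the characters it points to
 * belong to a string literal. */
const char *msg_ptr = "Hello World!";

/* A constant array: the 13 bytes (12 characters plus the terminating
 * '\0') are the object itself, so no separate pointer is allocated. */
const char msg_arr[] = "Hello World!";

/* sizeof sees the pointer in the first case and the whole array in
 * the second one. */
size_t size_of_pointer_object(void) { return sizeof(msg_ptr); }
size_t size_of_array_object(void)   { return sizeof(msg_arr); }
```

On a 32-bit Cortex-M target the first function would be expected to return 4 (one word), while the second returns 13 on any platform.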
Filename: src/main-ex5.c
¹⁵There are other better alternatives, however. We will explore them in the next chapter.
The above code is really simple. heapMsg is a pointer to a memory region dynamically allocated by
the malloc() function. We simply copy the content of the msg string and check if both strings are
equal. If so, the LD2 LED starts blinking.
If you try to compile the above code, you will see the following linking error:
What’s happening? The malloc() function relies on the _sbrk() routine, which is an OS- and
architecture-dependent feature. newlib leaves to the user the responsibility of providing this function.
_sbrk() is a routine that accepts the number of bytes to allocate inside the heap memory and
returns the pointer to the start of this contiguous “chunk” of memory. The algorithm underlying the
_sbrk() function is fairly simple:
1. First, it needs to check that there is sufficient space to allocate the desired amount of memory.
To accomplish this task, we need a way to provide to the _sbrk() routine the maximum heap
size.
2. If the heap has sufficient room to allocate the needed memory, it increments the current heap
size and returns the pointer to the beginning of the new memory block.
3. If the heap does not have sufficient room (heap overflow), then the _sbrk() fails, and it is up
to the user to provide an error feedback.
The following code shows a possible implementation for the _sbrk() routine. Let us analyze its
code.
Filename: src/main-ex5.c
The _end_static and _Heap_Limit symbols are provided by the linker, and they correspond to the end of the .bss
section and the highest memory address of the heap region (that is, _Heap_Limit - _end_static
is the size of the heap). We will see in a while how they are defined inside the linker script. heap_end
is a statically allocated variable, and it is used to keep track of the first free memory location inside
the heap. Since it is a static un-initialized local variable, according to Table 1 it is placed inside the
.bss section, and hence it is zeroed at run-time. So, the first time _sbrk() is called it is equal to
zero, and hence it is initialized to the address of the _end_static symbol. The if at line 97 ensures that
there is sufficient room in the heap memory. If not, the ARM assembly BKPT instruction is executed,
causing the debugger to stop the execution¹⁶. The tricky part is represented by the instructions
at lines [93:96]. The preprocessor macro checks if the ARM architecture is ARMv6-M, that is, the
architecture of Cortex-M0/0+ based processors. Those processors, in fact, do not allow unaligned
memory access. The instruction at line 95 ensures that the allocated memory is always a multiple
of 4 bytes.
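The same logic can be tried on a host with the following sketch. It is not the book's listing: the name sketch_sbrk, the static buffer standing in for the region between _end_static and _Heap_Limit, and the choice of returning (void *)-1 instead of executing BKPT are all assumptions made so that the code runs without linker symbols. It also rounds every request up to a multiple of 4 unconditionally, and it rejects negative increments for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_SIZE 0x190u   /* 400 bytes, as in the linker script */

/* Stand-in for the SRAM region between _end_static and _Heap_Limit. */
static uint32_t sketch_heap[SKETCH_HEAP_SIZE / sizeof(uint32_t)];
static uint8_t *heap_end;         /* zero until the first call, like .bss */

void *sketch_sbrk(int incr) {
    uint8_t *const heap_start = (uint8_t *)sketch_heap;
    uint8_t *const heap_limit = heap_start + SKETCH_HEAP_SIZE;
    uint8_t *prev_heap_end;

    if (heap_end == NULL)         /* first call: the heap is still empty */
        heap_end = heap_start;

    if (incr < 0)                 /* shrinking is not handled in this sketch */
        return (void *)-1;

    incr = (incr + 3) & ~3;       /* keep the break word aligned */

    if (incr > heap_limit - heap_end)
        return (void *)-1;        /* heap overflow: report the failure */

    prev_heap_end = heap_end;     /* the caller gets the old break... */
    heap_end += incr;             /* ...and the break moves forward */
    return prev_heap_end;
}
```

A caller that requests 5 bytes and then 4 bytes receives two pointers that are 8 bytes apart, because the first request is rounded up to the next multiple of 4.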
We still have to analyze the linker script. The part we are interested in starts at line 51.
¹⁶Here, we may use a different way to signal the heap overflow. For example, a global error() function could be called, and
the appropriate actions taken there. However, this is often a matter of programming style, so feel free to arrange that code to your needs.
Filename: src/ldscript5.ld
51 _end_static = _ebss;
52 _Heap_Size = 0x190;
53 _Heap_Limit = _end_static + _Heap_Size;
_end_static is nothing more than an alias to the _ebss memory location, that is the end of .bss
section. _Heap_Size is fixed by us, and it establishes the dimension of the heap (400 bytes). Finally,
_Heap_Limit contains nothing more than the final address of the heap memory.
The way symbols are handled in linker scripts is different from that of C. In C a symbol is
a triple made of the symbol name, its memory location and the value. Symbols in linker scripts are
pairs, made of the symbol name and its memory location. So symbols are containers for memory
locations, as if they were pointers, without any value. So the following instruction:
_Heap_Size, again, contains the heap size as an address (that is 0x00000190), but it is not a
valid STM32 address. This fact can also be analyzed by inspecting the symbol table of the
final binary file, using the objdump tool with the -t command line parameter.
_Min_Stack_Size = 0x200;
With the above code, we are defining a “dummy” section inside the final binary file. Using the
location counter operator (“.”) we increment the size of this section so that it has a dimension equal
to the maximum heap size and the “estimated” minimum stack size. If the sum of .data, .bss, stack
and heap regions is greater than the SRAM size, the linker will emit an error, as shown below:
It is important to underline that this is a static check and it is not related to the activities of
the firmware at run-time. Different strategies are needed to detect a stack overflow, and it is really
hard to have a complete solution for embedded systems. We will analyze this topic in a subsequent
chapter.
1 class MyClass {
2 int i;
3
4 public:
5 MyClass() {
6 i = 100;
7 }
8
9 void increment() {
10 i++;
11 }
12 };
13
14 MyClass instance;
15
16 int main() {
17 instance.increment();
18 for (;;);
19 }
Let us focus our attention on line 14. Here we are defining an instance of the class MyClass.
The instance is defined as a global variable. But defining an instance of a class assumes that the
constructor of that class is automatically called. So, to be clear, after we call the increment()
method at line 17, the instance attribute i will be equal to 101. But who takes care of calling the
instance constructor? When an instance is created locally (that is, from a global function or another
method), it is up to that callable to perform class initialization. But when this happens at global
scope, it is up to other initialization routines. Usually the compiler automatically generates an array
of function pointers that will contain the initialization routines for all globally and statically allocated
objects. These arrays are usually called __init_array and __fini_array (which contains the calls to
object destructors).
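What the startup code does with such an array can be sketched in plain C as follows. All names here (init_fn, sketch_init_array, run_init_array and the two constructor stand-ins) are hypothetical; on a real target the bounds of the array come from symbols defined in the linker script rather than from an ordinary C array.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Hypothetical stand-ins for two compiler-generated constructors. */
static int instance_i;
static int other_ready;
static void construct_instance(void) { instance_i = 100; }
static void construct_other(void)    { other_ready = 1; }

/* On a target the linker collects these pointers; here the array is
 * written by hand so that the sketch can run on a host. */
static const init_fn sketch_init_array[] = {
    construct_instance,
    construct_other
};

/* Call every registered initializer, in order. */
void run_init_array(const init_fn *begin, const init_fn *end) {
    for (const init_fn *fn = begin; fn < end; ++fn)
        (*fn)();
}

/* What a startup routine would do just before calling main(). */
void sketch_startup(void) {
    size_t n = sizeof sketch_init_array / sizeof sketch_init_array[0];
    run_init_array(sketch_init_array, sketch_init_array + n);
}
```

Destructors registered in the second array would be walked in the same way when the program terminates, which rarely happens in firmware.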
Both the linker scripts and startup routines provided by the GNU ARM plugin and by ST in its HAL
contain all the necessary code to handle these and other initialization activities. Explaining them is
outside the scope of this book (this also involves analyzing in depth some libc activities performed
at startup). However, now that we know how to master the content of a linker script, it should not
be too difficult to deal with them.
Figure 5: the direct connection between the Cortex-M core and the CCM SRAM
¹⁷The figure has been arranged from the one contained in the AN4296 from ST(http://bit.ly/1QSctkT).
In all STM32 MCUs with this additional memory, the CCM SRAM is mapped starting from the
0x1000 0000 address²⁰. Once again, to use it we need to define this memory region inside the linker
script, in the following way²¹:
Obviously, the LENGTH attribute has to reflect the size of the CCM memory for the specific STM32
MCU. Once the region is defined, we have to create a specific section inside the linker script:
.ccm : ALIGN(4) {
*(.ccm .ccm*)
} >CCM
To relocate a specific routine inside the CCM memory we can use the GCC keyword __attribute__,
as seen before for the .isr_vector section:
¹⁸Some STM32 MCUs provide two SRAM memories, with one of these allowing 0-wait access. Always consult the datasheet for
your MCU.
¹⁹Keep in mind that to reach a fully parallel access to the SRAM, no other masters (e.g. the DMA) must contend for access to the
SRAM through the BusMatrix.
²⁰The STM32F7 provides a dedicated Tightly Coupled Memory (TCM) interface, with two separate buses that interconnect the
Cortex-M7 core to FLASH and SRAM. The instruction ITCM-RAM is a 16 KB read-only region accessible only by the core and
mapped from the 0x0000 0000 address. The data DTCM-RAM is a 64 KB region mapped at address 0x2000 0000 and accessible by
all AHB masters from the AHB bus matrix, but through a specific AHB slave bus of the CPU. Refer to the STM32F7 Reference Manual
for more information.
²¹The memory configuration refers to a Nucleo-F334 that, together with the Nucleo-F303, provides the CCM memory.
If, instead, we want to store data inside the CCM memory, then we also need to initialize it as
we have seen for the .bss and .data regions in regular SRAM memory. In this case, we need a more
articulated linker script:
*(ccm.bss ccm.bss*)
. = ALIGN(4);
_eccmb = .;
} >CCM
Here we are defining two sections: .ccm.data, which will be used to store global initialized data in
CCM, and .ccm.bss, used to store global un-initialized data. As done for the regular SRAM, we will need
to call the __initialize_data() and __initialize_bss() routines from the _start() routine:
...
__initialize_data(&_siccm, &_sccmd, &_eccmd);
__initialize_bss(&_sccmb, &_eccmb);
...
Then, to place data inside the CCM, we have to instruct the compiler using the attribute keyword:
• define the vector table to place in the CCM RAM using the __attribute__((section("<section>")))
keyword;
• define the exception handlers for the exceptions and ISRs of interest and place them in the
CCM RAM using the __attribute__((section("<section>"))) keyword;
• define a minimal vector table, composed of the MSP pointer and the address of the Reset
exception handler, to place in the FLASH memory starting from the 0x0800 0000 address;
• relocate the vector table from the Reset exception.
Let us start defining the vector table to place in CCM RAM. Here we are defining a file named
ccm_vector.c with the following content:
Filename: src/ccm_vector.c
#include <stm32f3xx_hal.h>

extern const uint32_t _estack;

void SysTick_Handler(void);

uint32_t *ccm_vector_table[] __attribute__((section(".isr_vector_ccm"))) = {
  (uint32_t *)&_estack, // initial stack pointer
  (uint32_t *) 0, // Reset_Handler not relocatable
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) 0,
  (uint32_t *) SysTick_Handler
};

void SysTick_Handler(void) {
  HAL_IncTick();
}
The file contains just the vector table (which is placed inside the .isr_vector_ccm section) and the
handler for the SysTick exception. Next, we need to arrange the linker script in the following way:
Filename: src/ldscript6.ld
75 .ccm : ALIGN(4)
76 {
77 _sccm = .;
78 *(.isr_vector_ccm)
79 *(.ccm)
80 KEEP(*(.isr_vector_ccm))
81
82 *src/ch10/ccm_vector.o(.text* .bss* .data*)
83 . = ALIGN(4);
84 _eccm = .;
85 } >CCM
The most relevant instruction is that at line 82. Here we are telling the linker to place in the
.ccm section the content of sections .text, .bss and .data from the file src/ch10/ccm_vector.o,
which is the relocatable file obtained from the source C file²². In this way all routines, global and
static variables contained in this file will automatically be loaded in CCM RAM. Finally, we need to
relocate the vector table. This can be easily done by assigning the position in CCM memory of the
ccm_vector_table array to the register VTOR in the System Control Block (SCB):
²²In this way, we are leaving out other un-needed sections (most of them related to debug) that would saturate the CCM
memory.
Filename: src/main-ex6.c
Instruction at line 67 relocates the vector table at the beginning of CCM RAM (as specified by the
_sccm variable set by the linker), while the next instruction enables the write protection on the
whole CCM memory (this prevents unwanted modification of CCM RAM, which would cause the
corruption of the program code).
The CCM RAM is subdivided in pages of 1 KB. Every bit in the RCR register of the System
Configuration Controller (SYSCFG) is used to set the write protection on an individual page
basis (bit 0 protects the first page, bit 1 protects the second page and so on).
Here, we are write-protecting the whole CCM memory of an STM32F334 MCU, which has
a CCM memory made of four 1 KB pages.
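The value to write for a given number of pages can be computed with a one-line helper. This is only a sketch: ccm_wp_mask is a hypothetical name, and the code assumes that bit 0 corresponds to the first page, which should be verified against the reference manual of the specific MCU.

```c
#include <stdint.h>

/* Build a mask with one bit set for each of the first `pages` 1 KB
 * pages, assuming bit 0 protects the first page.
 * Valid for 0..31 pages. */
uint32_t ccm_wp_mask(unsigned pages) {
    return (UINT32_C(1) << pages) - 1u;
}
```

For the four pages of the STM32F334 the helper returns 0xF, which is the value that would then be written to the RCR register.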
It is important to remark that, if we disable writing on the whole CCM memory, we cannot
place global or statically allocated variables in it, otherwise a fault will occur. On the other
hand, placing both code and data in CCM memory makes us lose the benefits obtained by
the CCM memory, due to the simultaneous access to the same memory both by the D-Code
and I-Code bus (looking at Figure 5 you can see that the CCM memory is connected to just
one master port of the BusMatrix - the port M3 -; so the access from D-Bus and I-Bus is
disciplined by the BusMatrix).
The vector table relocation is not limited to the CCM memory. As we will see later, this
technique is also used when the MCU boots from different sources than the internal FLASH.
In this case, the vector table is usually placed in SRAM and it has to be relocated.
14. Running FreeRTOS
Taking full advantage of the computing power offered by 32-bit microcontrollers is not easy,
especially for the powerful STM32F2/F4/F7 series. Unless our device needs to perform really simple
tasks, the correct allocation of computing resources requires special care during firmware
development. Moreover, the use of improper synchronization structures and poorly designed interrupt
service routines could lead to the loss of important asynchronous events and to overall unpredictable
behaviour of our device.
Real Time Operating Systems (RTOS) take advantage of the exception system provided by Cortex-M
cores to bring to programmers the notion of thread¹, an independent execution stream which
“contends” for the MCU with other threads involved in concurrent activities. Moreover, they offer
advanced synchronization primitives, which allow both to coordinate the simultaneous access to
physical resources from different threads and to avoid wasting CPU cycles while waiting for slower
and asynchronous events.
The market segment of RTOSes is quite crowded nowadays, with several commercial as well
as free and open source solutions available to programmers. Since the Cortex-M is a standardized
architecture shared by many silicon manufacturers, STM32 developers can choose from a really
wide portfolio of RTOS systems, depending on how much complexity they need to handle and on their
need for dedicated (and maybe commercial) support. ST Microelectronics has adopted one popular free and Open Source OS
as its official tool for the CubeHAL framework: FreeRTOS.
According to some statistics, FreeRTOS is the most widespread RTOS on the market today. Thanks to
its dual license that allows the selling of commercial products without any restriction², FreeRTOS
has become a sort of standard in the electronics industry, and it is also widely adopted by the
¹Some RTOSes, like FreeRTOS, use the term task to indicate an independent execution stream contending for the CPU with
other tasks. However, this author considers this terminology not appropriate. Traditionally, in general purpose Operating Systems,
multitasking is a method by which multiple tasks, also known as processes, share common hardware resources (mainly the CPU
and the RAM). With a multitasking OS, such as Linux, you can simultaneously run multiple applications. Multitasking refers to the
ability of the OS to quickly switch between each computing task to give the impression that different applications are executing
multiple actions simultaneously. A process has one relevant characteristic: its memory space is physically isolated from other
processes, thanks to features offered by the Memory Management Unit (MMU) inside the CPU. Multithreading extends the idea
of multitasking into single processes, so you can subdivide specific operations within a single application into individual threads.
Each thread could run in parallel. The important trait of threads is that they share the same memory address space. True embedded
architectures, like the STM32, do not provide an MMU (only a feature-limited Memory Protection Unit - MPU - is available in
some of them). The absence of this unit does not allow separate address spaces, since it is impossible to map the physical
addresses to logical ones. This means that they can carry out just one single application, which can eventually be split into several
threads sharing the same memory address space. For this reason, we will talk about threads in this book, even if sometimes we will
use the word “task” when talking about some FreeRTOS API or to indicate an activity of the firmware in general terms.
²FreeRTOS is licensed under a modified GPL 2.0 license, which allows companies to sell their devices based on FreeRTOS
without any restriction, provided they do not modify the FreeRTOS code itself. If they do modify it and sell/distribute the derived
firmware, they also need to distribute the modified FreeRTOS source code, while leaving their own source code closed if they want. For more
information about the FreeRTOS licensing model, see this page on the official web site(http://www.freertos.org/a00114.html).
Open Source community. Although it does not represent the only solution available for the STM32
platform, in this book we will focus our attention exclusively on this OS, since it is what ST officially
supports and integrates in the CubeHAL.
This paragraph gives a quick introduction to the main concepts underlying real-time
Operating Systems. Experienced users can safely skip it.
Except for the ISRs and exception handlers, all the examples built so far are designed so that our
applications are composed of just one execution stream. Typically, starting from the main() routine,
a large and infinite while loop carries out the firmware tasks:
...
while(1) {
doOperation1();
doOperation2();
...
doOperationN();
}
The time spent by each doOperationX() is broadly estimated by the developer, who has the
responsibility of ensuring that none of those functions runs for too long, preventing other parts
of the firmware from running correctly. Moreover, the calling order of the functions also schedules
their execution, defining the sequence of operations performed by the firmware. This, indeed, is a
form of cooperative scheduling³, where each function contributes to the execution of the next activity
by voluntarily releasing the control periodically.
In this early form of multiprogramming, there is no guarantee that a function cannot monopolize
the CPU. The application designer needs to carefully ensure that every function is carried
out in the shortest possible time. In this execution model, an “innocent” busy loop can have dramatic
effects. Let us consider the following pseudo-code:
³Experienced users will point out that it is not correct to talk about cooperative scheduling in this context for two fundamental
reasons: the execution order of tasks is fixed (the “schedule” is computed by the programmer during the firmware development)
and each routine isn’t able to save its execution context before leaving, that is, the stack frame of the doOperationX() routine is
destroyed when it returns. As we will see in a while, co-routines are a generalization of subroutines in non-preemptive multitasking
systems.
void blinkTask() {
  while(HAL_GetTick() - timeKeep < 500);
  HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
  timeKeep = HAL_GetTick();
}

uint8_t readUART2Task() {
  if(HAL_UART_Receive(&huart2, &uartData, 20, 1) == HAL_TIMEOUT)
    return 0;
  return 1;
}

while(1) {
  blinkTask();
  readUART2Task();
}
This code is quite common among inexperienced embedded developers and, in some
circumstances, it is also correct. However, that code has a subtle, weird behaviour. The blinkTask()
is designed so that it busy-spins for 500ms before it releases the control. If data arrives on the UART
interface during this period, the readUART2Task() will certainly lose some data⁴. A better way to
write the blinkTask() is the following one:
void blinkTask() {
  if(HAL_GetTick() - timeKeep > 500) {
    HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
    timeKeep = HAL_GetTick();
  }
}
A simple modification to that routine ensures that we will not lose data coming from the UART in
the majority of situations, unless the UART transfers data really quickly.
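The non-blocking version hinges on the subtraction between the current tick and the saved one. The sketch below isolates that comparison in a helper so that it can be tried on a host; period_elapsed is a hypothetical name, and it uses >= where the routine above uses >, which only shifts the threshold by one tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when at least `period` ticks separate `start` from `now`.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * even after the tick counter overflows and restarts from zero. */
bool period_elapsed(uint32_t now, uint32_t start, uint32_t period) {
    return (uint32_t)(now - start) >= period;
}
```

Writing the test as a difference, rather than comparing now against start + period, is what keeps it valid across the wrap-around of a 32-bit millisecond counter.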
As you can see, with cooperative scheduling programmers have a great responsibility in ensuring
their code will not affect the overall activities of the firmware, introducing performance bottlenecks.
The voluntary release of the execution flow is not the only limit of the code seen so far. Let us have
a closer look at the blinkTask() routine. Here we need a global variable⁵, timeKeep, to keep track
of the global tick counter incremented by the CubeHAL every 1ms and to perform a comparison to
⁴With high baudrates, polling the UART is certainly not correct at all, but here we are only interested in the principle.
⁵A local and static variable would have the same effect, however without changing the concept.
check if 500ms have elapsed. This is required because every time a routine exits, its execution context
(that is, the stack frame) is popped from the main stack and it is destroyed. Unless we use
some nasty tricks offered by the language⁶, there is no way to exit from a function without losing
its context.
Continuation routines, abbreviated as co-routines or simply coroutines, are program structures that
generalize the concept of subroutines for non-preemptive multitasking, by allowing multiple entry
points for suspending and resuming execution at certain locations. Co-routines require special
support from the run-time of the language, and they are traditionally provided by higher-level
languages like Scheme, but more widespread languages like Python and Perl also provide a
form of co-routines. A co-routine is said not to return but to yield the execution flow. For example,
the blinkTask() could be rewritten using co-routines in this way:
1 void blinkTask() {
2   uint32_t timeKeep = HAL_GetTick();
3   while (1) {
4     if(HAL_GetTick() - timeKeep > 500) {
5       HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
6       timeKeep = HAL_GetTick();
7     }
8     yield; /* Pass the control to another routine, e.g. the scheduler */
9   }
10 }
Co-routines work so that, the next time control passes to blinkTask(), execution will resume
from line 3. We will not go into the details of how co-routines are implemented in languages that support
them. However, this usually involves the creation of a separate stack for each co-routine, which
could call other co-routines that in turn may pass the control to other continuations.
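Plain C has no yield statement, but the idea can be approximated. The sketch below uses a different technique from the one just described: a stackless, switch-based co-routine (in the style of protothreads) that keeps its context in a caller-provided structure instead of a separate stack. All names (blink_co_t, blink_co(), the toggles counter) are hypothetical; the `now` parameter stands in for HAL_GetTick() and the counter stands in for the call to HAL_GPIO_TogglePin().

```c
#include <stdint.h>

/* Per-co-routine context: everything that must survive between two calls
 * lives here instead of on the stack. */
typedef struct {
  int      resume_point; /* where to continue on the next call (0 = start) */
  uint32_t time_keep;    /* tick value recorded at the last toggle         */
  uint32_t toggles;      /* number of toggles so far (stands in for the LED) */
} blink_co_t;

/* Each call runs until the next "yield" and then returns to the caller. */
void blink_co(blink_co_t *co, uint32_t now) {
  switch (co->resume_point) {
  case 0:                       /* first entry: initialize the saved state */
    co->time_keep = now;
    for (;;) {
      if (now - co->time_keep > 500u) {
        co->toggles++;
        co->time_keep = now;
      }
      co->resume_point = 1;     /* remember where to resume ...            */
      return;                   /* ... and yield to the caller             */
  case 1:;                      /* execution resumes here on the next call */
    }
  }
}
```

A cooperative main loop would call blink_co() with the current tick value on every iteration, interleaved with the other routines. The price of this trick is that ordinary local variables do not survive a yield, which is exactly the limitation that real co-routines remove.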
A preemptive multitasking Operating System is a coordinator of physical resources that allows the
execution of multiple computing tasks⁷, each one with its independent stack, by assigning a limited
quantum time (also called slice time) to each task. Every task has a well-defined temporal window,
usually about 1ms long in embedded systems, during which it performs its activities before it is
preempted. The RTOS kernel decides the execution order of the tasks ready to be executed using a
scheduling policy: a scheduler is an algorithm that characterizes the way the OS plans the execution
of tasks.
A task is “moved” in and out of the CPU by a context switch operation. A context switch is performed
by the OS, thanks to hardware features we will explore next, which takes a “snapshot” of the current
task state by saving the internal CPU registers (PC, MSP, R0..R15, etc.) before switching to another
task, which will in turn be able to use the CPU for the same quantum time (or even less if “it
wants”).
⁶Which involves the use of the C setjmp() and longjmp() functions.
⁷In this paragraph, and only in this one, the terms task and thread will be used interchangeably.
Figure 1: how an OS schedules the tasks execution by assigning them a fixed quantum time
Figure 1 shows how task preemption works for the case of the example seen before. Here
we are supposing that we have just two tasks: one for the blinkTask() routine and one for the
readUART2Task() one. The OS starts by scheduling the blinkTask() task, which can “use” the CPU for
1000μs (that is, 1ms)⁸. After this time has elapsed, the OS schedules the execution of readUART2Task(),
which can now occupy the CPU for the same quantum time. After that period, the OS will
reschedule the first task, and so on.
Figure 2 shows the way SRAM memory is typically organized by an OS. Each task is represented
by a memory segment containing the Thread Control Block (TCB), which is nothing more than a
descriptor containing all relevant information related to the task execution just “a moment”⁹ before
it is preempted (the stack pointer, the program counter, CPU registers and a few other things), plus
the stack itself, that is, the activation records of those routines currently invoked on the thread stack.
By jumping between several threads, thanks to context switch operations, the OS guarantees the
same execution time to all threads, giving the impression that firmware activities are performed in
parallel.
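The memory layout just described can be sketched with a simplified data structure. The code below is only an illustration under stated assumptions: tcb_t, tcb_init(), next_task() and the STACK_WORDS value are hypothetical names and figures, not FreeRTOS definitions, and a real kernel stores more information in its TCB.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64u /* arbitrary stack size chosen for this sketch */

/* Simplified Thread Control Block: the saved stack pointer is enough to
 * locate the rest of the context, assuming the kernel pushes the CPU
 * registers on the task's own stack before switching. */
typedef struct {
  uint32_t *saved_sp;            /* stack pointer at the time of preemption */
  uint32_t  stack[STACK_WORDS];  /* private stack of the task               */
} tcb_t;

/* Cortex-M stacks grow downwards, so an empty stack starts just past the
 * last element of the array. */
void tcb_init(tcb_t *tcb) {
  tcb->saved_sp = &tcb->stack[STACK_WORDS];
}

/* Round-robin choice among equal-priority tasks: when task `current` has
 * consumed its quantum, the next one in the list runs, wrapping at the end. */
size_t next_task(size_t current, size_t task_count) {
  return (current + 1u) % task_count;
}
```

With the two tasks of the example, next_task() alternates between index 0 and index 1, which matches the sequence shown in Figure 1.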
⁸Those values of quantum time are indicative, since the exact duration of a quantum is affected by a lot of things, not least the
overhead connected with a context switch, which is non-negligible. Moreover, here we are assuming that all tasks have the same
priority, which usually is not true, especially in embedded systems.
⁹This is not entirely true, since several other things take place before a task is preempted. However, explaining these
aspects in detail is outside the scope of this book. Refer to Joseph Yiu's books if you are interested in how a context switch is performed on
Cortex-M based microcontrollers.
A Real-Time Operating System (RTOS) is an OS able to offer the notion of multitasking (or better,
multithreading as seen in note 1) while ensuring a response within specified time constraints, often
referred to as deadlines. Real-time responses are often understood to be in the order of milliseconds,
and sometimes microseconds. A system not specified as operating in real-time cannot usually
guarantee a response within any timeframe, although actual or expected response times may be
given. General-purpose Operating Systems (like Linux, Windows and MacOS) cannot be real-time
Operating Systems (even if some derivative releases exist, especially of Linux, engineered
for real-time applications) for two simple reasons: paging and swapping. The former allows the OS to
segment the task memory into small chunks named pages, which can be scattered in the RAM and
remapped by the MMU, giving the illusion that the process can manage the whole 4GB address space
(even if the computer does not provide that amount of RAM). The latter allows the OS to swap
“unused” pages in and out of an external (and slower) storage unit (typically a hard drive).
These two features are intrinsically non-deterministic, and prevent the OS from responding to requests
within a short and bounded time.
An RTOS allows us to use the first version of the blinkTask() function while minimizing the impact of the
busy loop on the UART transfer process¹⁰. However, as we will see later in this chapter, an
RTOS typically also gives us tools to completely avoid busy loops: using software timers it is possible to ask
the OS to re-schedule blinkTask() only when the specified amount of time has elapsed. Moreover,
the RTOS also provides ways to voluntarily release control when we know that it is completely
useless to wait for an operation that will be performed by another task (or if we are waiting for an
asynchronous event).
We have just said that an RTOS gives a way to voluntarily release control to
other threads. But what if a task does not want to release it? For example, the first version of the
blinkTask() routine could monopolize the CPU for more than 500ms in the worst case which, given
the typical slice time of 1ms, is a really long time. So, who has the ability to perform the context
switch? It is impossible to “jump” to other program instructions (a context switch is a sort of goto
to another program instruction) without losing one relevant piece of information: the value of the program
counter itself.
The context switch needs substantial help from the hardware. In Chapter 7 we have seen that
interrupts and exceptions are a source of multiprogramming. The way they are handled by the
Cortex-M core allows the CPU to jump to the exception handler without losing the current execution
context. By taking advantage of a dedicated hardware timer, usually the SysTick one, the RTOS
uses the periodic interrupt generated on the overflow event to perform the context switch. This
timer is configured to overflow (or underflow in the case of the SysTick, which is a downcounter timer)
every 1ms. The RTOS then captures the exception and saves the current execution context in the
TCB, passing control to the next task in the scheduling list by restoring its execution context and
exiting from the timer interrupt. The preempted threads will not be aware that this happened¹¹.
¹⁰This does not mean that by using an RTOS we can write bad code without impacting overall performance. It only
means that a true preemptive scheduler can guarantee a higher degree of multiprogramming, ensuring that all threads get the same
CPU time slice. Unless we mess with task priorities, as we will see later.
¹¹However, this may not correspond to what an RTOS actually does. The story here is more complex, and it is related to the
specific hardware architecture and to the way interrupts are prioritized. During the execution of an interrupt handler, another
interrupt with a higher priority could suspend the execution of the current interrupt, as seen in Chapter 7. But when this happens,
the CPU cannot switch to thread mode (which is the regular mode in which normal code is executed) to perform the task
switch without first exiting from all interrupts (which run in handler mode, a special mode provided by the Cortex-M core during
exception handling). This means that if the SysTick IRQ takes place while another IRQ is active, the SysTick exception handler
cannot perform the context switch (that is, pass control to another task running in thread mode), because other code running
in handler mode has been preempted and needs to complete its activities. Usually this is solved by deferring the actual context
switch operation to the PendSV handler, which is an exception configured to run at the lowest priority. However, this is just one
way to implement the context switch. If you are interested in exploring this topic further, consult the source code or the documentation
of your RTOS.
In light of the considerations made up to this point, Figure 1 needs to be replaced
by Figure 3, where the time spent by the OS while performing a context switch
is also considered. Context switches are usually computationally intensive, and much of the design
of operating systems is to optimize the use of context switches. Special care must be taken when
developers decide to change the underflow frequency of the SysTick timer (often increasing it),
which also affects the slice time of each individual task, and hence the number of context switches
per second.
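A rough estimate helps to see how the tick frequency affects this overhead. The sketch below assumes one context switch per tick and a fixed cost per switch; the function name and the figure of 200 CPU cycles used in the note that follows are invented for illustration and must be replaced with values measured on the actual target.

```c
#include <stdint.h>

/* Estimated share of CPU time spent in context switches, expressed in parts
 * per thousand, assuming exactly one switch per tick and a non-zero cpu_hz. */
uint32_t switch_overhead_permille(uint32_t cpu_hz, uint32_t tick_hz,
                                  uint32_t cycles_per_switch) {
  uint64_t switch_cycles_per_second = (uint64_t)tick_hz * cycles_per_switch;
  return (uint32_t)((switch_cycles_per_second * 1000u) / cpu_hz);
}
```

With an 84MHz core, a 1kHz tick and the assumed 200 cycles per switch, the estimate is about 2 parts per thousand; raising the tick to 10kHz brings it to about 23 parts per thousand.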
Before we can start doing practical things with an RTOS, we need to explain just one last concept.
What about the case when a task wants to voluntarily release control? In this case RTOSes often
use the SVC (SuperVisor Call) instruction implemented by Cortex-M processors, which causes
the SVC_Handler exception handler to be called, or they force the PendSV exception to be raised. Explaining
when they use one or the other is outside the scope of this book, and it is also a design choice
of the OS maker. For more information on these topics, refer to Joseph Yiu's¹² books.
This is just an introduction to the complex topics underlying an RTOS. We will analyze several other
concepts, mainly related to the synchronization of concurrent tasks, later in this chapter. We will
now start seeing the most relevant features of FreeRTOS.
applications. We have talked about CMSIS-RTOS in Chapter 1, when we introduced the whole stack.
The idea behind the CMSIS initiative is that, using a common standardized set of APIs among several
silicon manufacturers and software vendors, it is possible to “easily” port our application to different
microcontrollers from other vendors. For this reason, we will introduce the FreeRTOS functionalities
using the CMSIS-RTOS API as much as possible.
The next two paragraphs show how to import the FreeRTOS distribution inside an Eclipse project,
either manually or using the CubeMXImporter tool.
¹³FreeRTOS is available in all CubeHALs, inside the Middleware/Third_Party/Source folder.
¹⁴This part of FreeRTOS is considered separate from the core FreeRTOS source tree, and it is said to implement the port layer
of FreeRTOS.
If you want to import the FreeRTOS source tree into an existing project, you can proceed in the
following way.
1. Create an Eclipse folder named Middlewares/FreeRTOS inside the root of the project.
2. Drag into this folder the content of the STM32Cube_FW/Middlewares/Third_Party/FreeRTOS/Source
folder, excluding the portable subdirectory.
3. Now create a sub-folder named portable/GCC¹⁵ inside the Middlewares/FreeRTOS Eclipse
folder, and one named portable/MemMang.
4. Drag the folder STM32Cube_FW/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CMx
corresponding to the architecture of your STM32 MCU (for example, if you have
an STM32F4, which is based on a Cortex-M4 core, pick the folder ARM_CM4F) inside the
portable/GCC Eclipse folder.
5. Drag only one¹⁶ of the files contained inside the
STM32Cube_FW/Middlewares/Third_Party/FreeRTOS/Source/portable/MemMang folder inside the portable/MemMang Eclipse
folder. This folder contains 5 different memory allocation schemes used by FreeRTOS. We
will study them in more depth later. It is ok to use heap_4.c for the moment.
At the end of the import process, you should have a project structure like the one shown in Figure 5.
Read carefully
When we create new folders in an Eclipse project, by default Eclipse automatically excludes
them from the build process. So we need to enable compilation of the Middlewares
folder by right-clicking on it in the Project Explorer tree-pane, then selecting Resource
configuration->Exclude from build, and unchecking all the defined project configurations.
¹⁵If you are using another tool-chain, you have to adapt the instructions accordingly.
¹⁶It is ok to import all memory management schemes and exclude the unneeded ones from compilation. It is up to you how to
best organize the project.
Now we need to define the FreeRTOS config file and include the FreeRTOS headers in the project
settings. So, rename the Middlewares/FreeRTOS/include/FreeRTOSConfig_template.h file to
Middlewares/FreeRTOS/include/FreeRTOSConfig.h. Next, go to the Project Settings->C/C++
Build->Settings->Cross ARM C Compiler->Include section and add the entries:
• "../Middlewares/FreeRTOS/include"
• "../Middlewares/FreeRTOS/CMSIS_RTOS"
• "../Middlewares/FreeRTOS/portable/GCC/ARM_CMx"¹⁷
as shown in Figure 6.
The CubeMXImporter tool automatically imports a project generated with CubeMX that
includes the FreeRTOS middleware. Once you have configured the MCU peripherals in CubeMX, you
can easily enable the FreeRTOS middleware by checking the flag Enabled in the corresponding IP
Tree entry, as shown in Figure 7.
¹⁷Adjust this directory according to your specific port layer.
Once the CubeMX project is generated, you can follow the same instructions reported in Chapter 4.
In the configuration section it is possible to set the FreeRTOS configuration parameters. We will
analyze the most relevant ones during this chapter. When you generate the CubeMX project,
CubeMX will ask you if you want to choose a separate timebase generator for the HAL, leaving the
SysTick as timebase generator for the RTOS only (see Figure 8). CubeMX asks this because FreeRTOS
is designed so that it automatically sets the SysTick IRQ priority to the lowest one (highest priority
number). This is an architectural requirement of FreeRTOS, which unfortunately conflicts with the
way the HAL is designed.
As said several times before, the STM32Cube HAL is built around a unique timebase source,
which usually is the SysTick timer. The SysTick_Handler() ISR automatically increments the global tick
counter every 1ms. The HAL relies on this feature by calling the HAL_Delay() function quite often in
several HAL routines. These are in turn called by the HAL_<PPP>_IRQHandler() functions, which
are executed in the context of an ISR (for example, HAL_TIM_IRQHandler() is called from the
ISR of a timer). If the SysTick IRQ is not configured to run at the highest interrupt priority (which
is 0 in Cortex-M based processors), then calling HAL_Delay() from an ISR context may lead to
deadlocks¹⁸ if the priority of the ISR that calls HAL_Delay() is higher than that of
the SysTick timer (and this is always true if you use FreeRTOS, as said before). So, it is best to use
another timer for the HAL.
Figure 8: the warning message suggests to choose a different timebase generator for the HAL
¹⁸In concurrent programming, a deadlock is a situation in which two or more concurrent execution streams are each waiting for
the other to finish, and thus neither ever does. Running into a deadlock is anything but difficult, and sooner or later all programmers
encounter this hard-to-debug event.
To change the HAL timebase source, follow the instructions written in Chapter 11.
If you have a Cortex-M4F or a Cortex-M7 based STM32 MCU, and if you try to compile the project,
you will see several errors generated by the assembler, like the ones shown in Figure 9.
Figure 9: the errors generated by GCC while trying to compile FreeRTOS sources without enabling the FPU unit
Those errors are caused by the fact that the Cortex-M4F and Cortex-M7 architectures provide a dedicated
Floating Point Unit (FPU), which processes floating point operations directly in hardware,
without the need for dedicated, and necessarily slower, functions provided by the C run-time library.
Processors equipped with an FPU implement additional hardware registers that need to be saved
during a context switch operation. For this reason, the FreeRTOS GCC port for the M4F/M7 architectures
expects the FPU to be enabled, while it is disabled by default.
To enable it, go to the Project Settings->C/C++ Build->Settings->Target Processor section and
select the entry FP instructions (hard) in the Float ABI field. For the FPU Type field, select
fpv4-sp-d16 if you have a Cortex-M4F based STM32 MCU, or fpv5-sp-d16¹⁹ if you have a Cortex-M7
based microcontroller. If you are working with the latest STM32F76xx MCUs, which
provide a double precision FPU, then you have to select the fpv5-d16 entry.
Now you have to rebuild the whole source tree.
¹⁹fpv4-sp-d16 means that the MCU implements a floating-point unit conforming to the VFPv4-D16 architecture, single precision
(sp), while fpv5-sp-d16 refers to the VFPv5-D16 architecture, single precision (sp). We will treat these topics in a subsequent chapter.
The function osThreadTerminate() is used to terminate a thread, and it accepts the Thread ID (TID),
which we are going to see in a while. A thread is usually made of an infinite loop that contains
the thread instructions. Placing osThreadTerminate() outside that loop is usually a precaution
in case control exits from that loop, because it is not correct to terminate a thread by simply
returning from its function. Passing NULL to the osThreadTerminate() function
causes FreeRTOS to terminate the current thread.
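To start a new thread with the CMSIS-RTOS API we use the osThreadCreate() function, which accepts a
pointer to a thread descriptor and an optional argument to pass to the thread, and returns the Thread ID.
The osThreadDef_t is the thread descriptor, a C struct that collects the thread name, the thread
function, its priority, the maximum number of instances and the stack size.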
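As a reference, the descriptor can be sketched as follows. The layout is written from memory of the CMSIS-RTOS v1 header (cmsis_os.h) shipped by ST on top of FreeRTOS, so field names and order are an assumption to be checked against the copy in your CubeHAL; newer releases may add fields. The prototype of the creation function in that API is osThreadId osThreadCreate(const osThreadDef_t *thread_def, void *argument). The descriptor instance at the end, filled by hand, and the empty blinkThread() placeholder are only there to make the sketch compile on its own.

```c
#include <stdint.h>

/* Priority levels of CMSIS-RTOS v1 (assumed values, check cmsis_os.h). */
typedef enum {
  osPriorityIdle        = -3,
  osPriorityLow         = -2,
  osPriorityBelowNormal = -1,
  osPriorityNormal      =  0,
  osPriorityAboveNormal = +1,
  osPriorityHigh        = +2,
  osPriorityRealtime    = +3,
  osPriorityError       = 0x84
} osPriority;

/* Signature every thread function must have. */
typedef void (*os_pthread)(void const *argument);

/* Thread descriptor, as assumed from ST's CMSIS-RTOS layer. */
typedef struct os_thread_def {
  char       *name;       /* thread name                                */
  os_pthread  pthread;    /* thread function                            */
  osPriority  tpriority;  /* initial priority                           */
  uint32_t    instances;  /* max number of instances (unused by FreeRTOS) */
  uint32_t    stacksize;  /* stack size requirement                     */
} osThreadDef_t;

/* Placeholder thread function, only to make the sketch self-contained. */
void blinkThread(void const *argument) { (void)argument; }

/* A descriptor for a thread named "blink", filled by hand. */
const osThreadDef_t os_thread_def_blink = {
  "blink", blinkThread, osPriorityNormal, 0, 100
};
```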
However, the CMSIS-RTOS API provides a convenient macro, osThreadDef(), to define and
initialize the parameters of a thread descriptor. Now it is the right time to see a practical example.
Filename: src/main-ex1.c
12 int main(void) {
13   osThreadId blinkTID;
14
15   HAL_Init();
16   Nucleo_BSP_Init();
17
18   osThreadDef(blink, blinkThread, osPriorityNormal, 0, 100);
19   blinkTID = osThreadCreate(osThread(blink), NULL);
20
21   osKernelStart();
22
23   /* Infinite loop */
24   while (1);
25 }
26
27 void blinkThread(void const *argument) {
28   while(1) {
29     HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
30     osDelay(500);
31   }
32   osThreadTerminate(NULL);
33 }
34
35 void SysTick_Handler(void) {
36   HAL_IncTick();
37   HAL_SYSTICK_IRQHandler();
38   osSystickHandler();
39 }
Lines [18:19] define and create a new thread, assigning to it the name "blink" and passing the
pointer to the blinkThread() function, which will represent our thread. Then a normal priority
is assigned to the thread (more about this soon). The fourth parameter refers to the maximum
number of instances a thread can have, but it is not used by FreeRTOS, so it is meaningless in this
case. Finally, the last parameter defines the stack size.
The CMSIS-RTOS API expresses the thread stack size in bytes, and you will find this
information in the CMSIS-RTOS layer on top of FreeRTOS developed by ST. However,
FreeRTOS defines the stack size as a multiple of the word size, which in a Cortex-M
processor is 32-bit, and hence 4 bytes. This means that the value we pass to the
osThreadDef() macro is multiplied by four internally by FreeRTOS. This says it all about
the effective portability of these abstraction layers.
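The arithmetic is trivial, but it is worth writing down because it determines how much SRAM each thread really consumes. In this sketch stack_word_t and stack_bytes() are hypothetical names; the type stands in for the 32-bit stack word of a Cortex-M port.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 32-bit stack word used by a Cortex-M port. */
typedef uint32_t stack_word_t;

/* SRAM actually reserved for a thread stack declared with the given depth. */
size_t stack_bytes(size_t stack_depth_words) {
  return stack_depth_words * sizeof(stack_word_t);
}
```

The value 100 passed to osThreadDef() in the example therefore corresponds to a 400-byte stack.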
osThreadCreate() then effectively creates the new thread and asks the kernel to schedule its
execution, returning the Thread ID (TID): this is used by other APIs to manipulate the thread status
and its configuration. Note that, once the thread is defined using the osThreadDef() macro, we
use the macro osThread() to refer to that thread in other parts of the code. The second parameter
of the osThreadCreate() function is an optional parameter to pass to the thread. Finally, we start
the kernel scheduler by using the function osKernelStart(), which never returns unless something
goes wrong.
The function blinkThread() is nothing more than the omnipresent blinking application. The only
notable difference is the use of the osDelay() function instead of the classical HAL_Delay():
osDelay() is designed so that the thread will remain in blocked state for 500ms without impacting
CPU performance. After that time, the thread will be resumed and the LD2 LED will be
toggled again. We will talk more about the osDelay() function later.
Finally, note that, since we are using here the SysTick as timebase for the FreeRTOS kernel, we need
to add a call to the function osSystickHandler() inside the exception handler of the timer, and
configure it to generate a tick every 1ms (this is performed in the SystemClock_Config() routine,
as shown in Chapter 10).
A running thread can put itself in blocked state by starting to wait for “an external” event. This event
could be, for example, a synchronization primitive (e.g. a semaphore) that will be unlocked by
another thread. Another source of blocking is the osDelay() function, which places the thread
in blocked state until the specified delay has elapsed. A blocked thread can be placed in ready
state, and hence it becomes ready to be scheduled for execution, or in suspended state.
It is important to clarify, to avoid any misunderstanding, that a suspended or blocked thread needs
the intervention of an external entity to return to the ready state.
²⁰In FreeRTOS this state is called stopped, as shown in Figure 10.
CMSIS-RTOS, instead, has a well-defined priority scheme, made of eight levels (reported in Table
1), which are mapped on the FreeRTOS priorities. The function
• Prioritized preemptive scheduling with time slicing: this is the most common algorithm
implemented by all RTOSes, and it works in this way. Every thread has a fixed priority,
which is assigned during its creation. The scheduler will never change this priority, but
the programmer is free to reassign a different priority by calling the osThreadSetPriority()
function. In this mode, the scheduler will immediately preempt a running thread if one with
a higher priority becomes ready to be executed. Being preempted means being involuntarily
(without explicitly yielding or blocking) moved out of the running state into the ready state to
allow the higher priority thread to become running. Time slicing (based on the quantum
time) is used to share CPU processing time between threads with the same priority, even
when they do not release control by explicitly yielding or blocking. When a thread “consumes” its
time slice, the scheduler will select the next running thread in the scheduling list (if available)
by assigning it the same slice time. If there are no available ready threads, the scheduler will
mark as running a special thread named idle, which we will describe next. The slice time
corresponds to the tick period of the RTOS: by default the tick frequency is 1kHz, that is, a tick
every 1ms. This can be changed by configuring the macro configTICK_RATE_HZ, and adjusting the UEV
frequency of the timer used as timebase generator. Tuning this value is up to the specific
application, and it also depends on how fast the MCU runs. The slower the MCU runs, the
lower the tick frequency should be. Usually a value ranging from 100Hz up to 1000Hz is suitable
for a lot of applications.
• Prioritized preemptive scheduling without time slicing²¹: this algorithm is almost identical to
the previous one, except for the fact that once a thread enters the running state, it will leave
²¹This is the default scheduling policy configured by CubeMX for STM32F0/L0 microcontrollers.
the CPU only on a voluntary basis (by blocking, stopping or yielding) or if a higher priority
thread enters the ready state. This algorithm greatly reduces the impact of the context switch on
overall performance, since the number of switches is dramatically reduced. However, a
badly designed thread may monopolize the CPU, causing unpredictable behaviour of the whole
device.
• Cooperative scheduling: when this algorithm is used, a thread will leave the CPU only on a
voluntary basis (by blocking, stopping or yielding). Even if a higher priority thread becomes
ready, the OS will never preempt the current thread, and it will resume that same thread after
an external interrupt has been serviced. This form of scheduling gives all the responsibility to the programmer,
who must design the threads as carefully as if he were designing firmware without using an
RTOS.
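The three policies map onto a handful of macros in FreeRTOSConfig.h. The fragment below is a sketch of the settings for the first policy (the values are illustrative and the real file usually casts the tick rate to the kernel tick type); ms_to_ticks() is a hypothetical helper added to show how the tick frequency turns a delay in milliseconds into a number of ticks.

```c
#include <stdint.h>

/* Settings as they could appear in FreeRTOSConfig.h (illustrative values). */
#define configUSE_PREEMPTION    1      /* 0 selects cooperative scheduling        */
#define configUSE_TIME_SLICING  1      /* 0 disables time slicing among threads
                                          with the same priority                  */
#define configTICK_RATE_HZ      1000u  /* tick frequency: 1000Hz means a 1ms tick */

/* Hypothetical helper: number of ticks covering a delay in milliseconds. */
uint32_t ms_to_ticks(uint32_t ms) {
  return (uint32_t)(((uint64_t)ms * configTICK_RATE_HZ) / 1000u);
}
```

Setting configUSE_TIME_SLICING to 0 gives prioritized preemptive scheduling without time slicing, while setting configUSE_PREEMPTION to 0 gives cooperative scheduling.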
Special care must be taken when assigning priorities to threads, even if we are using prioritized
preemptive scheduling with time slicing. Let us consider this example.
Filename: src/main-ex2.c
13 int main(void) {
14   HAL_Init();
15
16   Nucleo_BSP_Init();
17
18   osThreadDef(blink, blinkThread, osPriorityNormal, 0, 100);
19   osThreadCreate(osThread(blink), NULL);
20
21   osThreadDef(uart, UARTThread, osPriorityAboveNormal, 0, 100);
22   osThreadCreate(osThread(uart), NULL);
23
24   osKernelStart();
25
26   /* Infinite loop */
27   while (1);
28 }
29
30 void blinkThread(void const *argument) {
31   while(1) {
32     HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
33     osDelay(500);
34   }
35   osThreadTerminate(NULL);
36 }
37
38 void UARTThread(void const *argument) {
39   while(1) {
This time we have two threads, one that blinks the LD2 LED and one that constantly prints a message
on the UART2. The UARTThread() is created with a priority higher than the blinkThread()
one. Running this example, you can see that the LD2 LED never blinks. This happens because
UARTThread() is designed to continuously do something and, when its quantum time expires, it is
still in ready state and, having a higher priority, it is rescheduled for execution. This clearly shows
that priorities must be used carefully to prevent other threads from starving²².
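The selection rule behind this behaviour can be reproduced on a PC with a few lines of code. The following sketch is not FreeRTOS code: sim_thread_t and pick_next() are hypothetical names, and the function models only the priority rule (ties are resolved in favour of the first thread found, so time slicing among equal priorities is not modelled).

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a thread as seen by the scheduler. */
typedef struct {
  int  priority; /* higher value means higher priority   */
  bool ready;    /* false when blocked or suspended      */
} sim_thread_t;

/* Returns the index of the highest-priority ready thread, or -1 when no
 * thread is ready (the case in which the idle thread would run). */
int pick_next(const sim_thread_t *threads, size_t count) {
  int best = -1;
  for (size_t i = 0; i < count; i++) {
    if (threads[i].ready &&
        (best < 0 || threads[i].priority > threads[best].priority)) {
      best = (int)i;
    }
  }
  return best;
}
```

As long as the higher priority thread stays ready, pick_next() keeps returning its index and the lower priority thread starves. The lower priority thread is selected only when the other one blocks, for example by calling osDelay().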
osStatus osThreadYield(void);
This causes a context switch, and the next ready thread in the scheduling list is placed in running
state. osThreadYield() plays an important role when cooperative scheduling is the scheduling
policy.
Figure 11: what usually happens when the number of threads increases too much
I often review projects sent to me by readers of this book (but sometimes I have
seen projects with the same bad approach made by professionals, believe it
or not) where tens of threads that do nothing relevant are spawned around the code.
Sometimes you can also find threads that do nothing more than
forking another thread after a comparison.
Theorists of concurrent programming will teach you that the more concurrent
streams you have, the more issues you will probably have. Governing threads may
be really hard, and often the cost involved in synchronizing them outweighs the
advantage of using them. Moreover, the very operation of spawning a new thread
has a non-negligible cost. And the same applies to the context switch.
Multithreaded programming must always be handled with care, especially on embedded
systems, where the SRAM is often really limited. Remember: keep it simple.
structures (list of TCBs, and so on). The same applies to other synchronization primitives we will
study later, such as semaphores and mutexes. Where is this memory exactly taken from?
FreeRTOS implements a dynamic memory allocation model, which uses regions of the SRAM to
allocate all OS internal structures, including TCBs. However, FreeRTOS does not make use of the
classical malloc() and free() functions provided by the C run-time library²³, because:
1. they use a lot of code space, increasing the size of the firmware;
2. they are not designed to be thread safe;
3. they are not deterministic (the amount of time taken to execute the function will differ from
call to call).
So, FreeRTOS provides its own dynamic allocation scheme to handle the memory it needs, but since
there are several ways to do it, each one with its benefits and tradeoffs, FreeRTOS is designed so that
this part is abstracted from the rest of the core OS, and it provides five different allocation schemes
the user can choose from, according to his specific needs. pvPortMalloc() and vPortFree() are
the most important functions implemented in each scheme, and their names clearly say what they
do.
These five schemes are not part of the FreeRTOS core, but they are part of the port layer, and
they are implemented inside five C source files, named heap_1.c..heap_5.c, contained inside the
portable/MemMang folder. By compiling one of these files together with the rest of the FreeRTOS code,
we automatically choose that allocation scheme for our application. Moreover, if needed, we can
provide our own allocation model by implementing this API layer (we essentially need to implement 5
functions, in the worst case) according to our specific needs.
Before we see the features of each one of these five allocators, it is important to underline
that, in some application domains, dynamic memory allocation is strongly discouraged
or even expressly prohibited. Even if, as we will see soon, one of these five allocators offered
by FreeRTOS meets the majority of requirements about memory allocation in these
application domains, unfortunately this FreeRTOS characteristic prevents its usage when
this limitation applies. Other RTOSes, which often are certified for some standards (like
the OSEK/VDX for Operating Systems used in automotive electronics), provide a full static
memory allocation model, even if this may generate additional overhead for the user during
the firmware development.
The next release of FreeRTOS, the 9.0, is going to finally overcome these limits, by offering
developers two allocation models: a dynamic one, which is essentially the one provided
in FreeRTOS 8.x, and a full static one. At the time of writing this chapter, May 2016,
version 9.0 is about to be released. However, I think that ST will take several months before
it adapts the CMSIS-RTOS layer to this new major release. When this happens, I
will update this part of the book.
²³With one notable exception represented by the heap_3.c allocator, as we will see soon.
14.4.1 heap_1.c
A lot of embedded applications use an RTOS to logically divide the firmware into blocks. Each block
has its own features, and often it runs independently from other blocks. For example, suppose that
you are developing a device with a TFT display (maybe the controller of a modern dishwasher).
Usually the firmware is partitioned into a few threads, where one is responsible for the graphical
interaction (it updates the display by printing information and showing stunning graphical widgets)
and other threads are responsible for managing the washing program (and so the handling of sensors,
motors, pumps and so on). These applications usually have a main() that spawns the threads (as we
have done in the past examples), and almost nothing more is initialized by the OS once it starts
spinning. This means that the memory allocator does not have to consider any of the more complex
allocation issues, such as determinism and fragmentation, and it can be simplified.
The heap_1.c allocator implements a very basic version of pvPortMalloc(), and does not provide
vPortFree(). Applications that never delete a thread, or other kernel objects like queues,
semaphores, etc., are suitable for this memory allocation scheme. Those application domains
where the use of dynamically allocated memory is discouraged may benefit from this allocation
scheme, since it offers a deterministic approach to memory management and avoids fragmentation
(because the memory is never deallocated).
The heap_1.c allocator subdivides a statically allocated array into small chunks as calls to pvPortMalloc()
are made. This array is indeed the FreeRTOS heap. Its total size (expressed in bytes) is
defined by the macro configTOTAL_HEAP_SIZE in the FreeRTOSConfig.h file. The main tradeoff of
this allocation scheme is that, since the whole array is allocated at compile time, the application
consumes a lot of SRAM even if it does not use all of it. This means that programmers have to
carefully choose the right value for configTOTAL_HEAP_SIZE.
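The idea behind this allocation scheme can be sketched in a few lines of plain C. The following is only a simplified model written for this explanation, not the FreeRTOS source code: the names sketch_malloc(), sketch_free_bytes(), SKETCH_HEAP_SIZE and SKETCH_ALIGNMENT are hypothetical, and the 8-byte alignment is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_SIZE 64u  /* plays the role of configTOTAL_HEAP_SIZE */
#define SKETCH_ALIGNMENT 8u   /* assumed alignment of every chunk, in bytes */

/* The whole heap is reserved at compile time, whether it is used or not. */
static _Alignas(8) uint8_t sketch_heap[SKETCH_HEAP_SIZE];
static size_t sketch_next_free = 0;  /* offset of the first unused byte */

/* Carve the next chunk out of the array. There is deliberately no free(). */
void *sketch_malloc(size_t wanted)
{
    assert(sketch_next_free <= SKETCH_HEAP_SIZE);

    /* Reject empty and oversized requests before rounding. */
    if (wanted == 0 || wanted > SKETCH_HEAP_SIZE) {
        return NULL;
    }
    /* Round the request up to a multiple of the alignment. */
    size_t rounded = (wanted + SKETCH_ALIGNMENT - 1u) & ~(size_t)(SKETCH_ALIGNMENT - 1u);
    if (rounded > SKETCH_HEAP_SIZE - sketch_next_free) {
        return NULL;  /* not enough room left in the array */
    }
    void *chunk = &sketch_heap[sketch_next_free];
    sketch_next_free += rounded;
    return chunk;
}

/* Bytes still available for future allocations. */
size_t sketch_free_bytes(void)
{
    return SKETCH_HEAP_SIZE - sketch_next_free;
}
```

Since nothing is ever returned to the array, the time needed by an allocation does not depend on the allocation history, and the array can never become fragmented.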
14.4.2 heap_2.c
heap_2.c also works by subdividing a statically allocated array, which is dimensioned by the
configTOTAL_HEAP_SIZE macro. It uses a best-fit algorithm to allocate the memory and, unlike the
heap_1.c allocation scheme, it allows memory to be freed. This allocator is considered deprecated
and not suitable for new designs: heap_4.c is the better alternative. For this
reason, we will not go into the details of how it works. If interested, you can consult the official FreeRTOS
documentation²⁴.
14.4.3 heap_3.c
heap_3.c uses the conventional C malloc() and free() functions to perform memory allocation.
This means that the configTOTAL_HEAP_SIZE parameter has no effect on memory management,
since malloc() is designed to manage the heap on its own. It also means that we need to
configure our linker scripts accordingly, as shown in Chapter 13. Moreover, consider that the
malloc() implementation provided by newlib-nano differs from the one provided by the regular
newlib. However, the more versatile implementation provided by the regular newlib library requires a lot
more FLASH space.
heap_3.c makes malloc() and free() thread-safe by temporarily suspending the FreeRTOS scheduler.
For more information about this, refer to the official FreeRTOS documentation.
14.4.4 heap_4.c
heap_4.c works in the same way as heap_1.c and heap_2.c. That is, it uses a statically allocated
array, dimensioned by the value of the configTOTAL_HEAP_SIZE macro, to store the objects allocated
at run-time. However, it has a different approach to the allocation of memory. In fact, it uses a
first-fit algorithm and combines adjacent free blocks into a single larger block, reducing the risk
of memory fragmentation. This technique, commonly used by the garbage collector in languages
with dynamic and automatic memory allocation, is also called coalescing.
Unfortunately, this behaviour makes the heap_4.c allocator non-deterministic: the
allocation/deallocation of many small objects, together with the creation/destruction of threads, could
cause a lot of fragmentation, which requires more processing to pack the memory.
Moreover, there is no guarantee that the algorithm avoids memory leaks entirely. However, it is usually
faster than most standard implementations of malloc() and free(), especially the ones provided
by the newlib-nano library.
Explaining the heap_4.c algorithm in detail is outside the scope of this book. For more information
refer to the FreeRTOS documentation²⁵.
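To give an idea of what first fit and coalescing mean, here is a deliberately simplified model. It is not the heap_4.c code (which keeps a linked list of free blocks inside the heap itself): this sketch tracks blocks in a small descriptor table and works with offsets instead of pointers. All the names (pool_alloc(), pool_free(), pool_largest_free(), POOL_SIZE, MAX_BLOCKS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE  256u  /* plays the role of configTOTAL_HEAP_SIZE */
#define MAX_BLOCKS 16u   /* capacity of the descriptor table */

typedef struct {
    size_t offset;  /* start of the block inside the pool */
    size_t size;    /* length of the block in bytes */
    bool   used;
} block_t;

/* Blocks are kept sorted by offset and always cover the whole pool. */
static block_t blocks[MAX_BLOCKS] = { { 0, POOL_SIZE, false } };
static size_t  block_count = 1;

static void remove_block(size_t i)
{
    assert(i < block_count);
    for (size_t k = i; k + 1 < block_count; k++) blocks[k] = blocks[k + 1];
    block_count--;
}

/* First fit: use the first free block that is large enough and split off
 * the remainder. Returns the offset of the block, or -1 on failure. */
long pool_alloc(size_t size)
{
    if (size == 0) return -1;
    for (size_t i = 0; i < block_count; i++) {
        if (blocks[i].used || blocks[i].size < size) continue;
        if (blocks[i].size > size && block_count < MAX_BLOCKS) {
            /* Open a slot at i + 1 for the free remainder. */
            for (size_t k = block_count; k > i + 1; k--) blocks[k] = blocks[k - 1];
            blocks[i + 1] = (block_t){ blocks[i].offset + size, blocks[i].size - size, false };
            blocks[i].size = size;
            block_count++;
        }
        blocks[i].used = true;
        return (long)blocks[i].offset;
    }
    return -1;
}

/* Free a block and coalesce it with its free neighbours. */
bool pool_free(long offset)
{
    for (size_t i = 0; i < block_count; i++) {
        if (!blocks[i].used || (long)blocks[i].offset != offset) continue;
        blocks[i].used = false;
        if (i + 1 < block_count && !blocks[i + 1].used) {  /* merge with next */
            blocks[i].size += blocks[i + 1].size;
            remove_block(i + 1);
        }
        if (i > 0 && !blocks[i - 1].used) {                /* merge with previous */
            blocks[i - 1].size += blocks[i].size;
            remove_block(i);
        }
        return true;
    }
    return false;
}

/* Size of the largest request that can currently be satisfied. */
size_t pool_largest_free(void)
{
    size_t largest = 0;
    for (size_t i = 0; i < block_count; i++) {
        if (!blocks[i].used && blocks[i].size > largest) largest = blocks[i].size;
    }
    return largest;
}
```

Without the two merge steps in pool_free(), freeing three adjacent 64-byte blocks would leave three separate free blocks, and a single 192-byte request would fail even though enough memory is free.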
14.4.5 heap_5.c
heap_5.c uses the same algorithm as the heap_4.c allocator, but it allows the memory pool to be split
among different non-contiguous memory regions. It is especially useful for STM32 MCUs providing
the FMC/FSMC controller, which allows us to transparently use external memories (such as SDRAM) to increase the whole
RAM. Programmers may decide to allocate some heavily used threads in the internal SRAM memory
(or the CCM memory, if available) and then use the external SDRAM for less relevant objects like
semaphores and mutexes.
²⁴http://bit.ly/1PMSPRM
²⁵http://bit.ly/1TqxX9S
By defining a custom linker script, it is possible to allocate two pools in two memory regions, and
then use the vPortDefineHeapRegions() function from FreeRTOS to define them as memory pools.
However, this is an advanced usage of the OS that we will not detail here. If interested, you can
refer to the excellent book Mastering the FreeRTOS Real Time Kernel by Richard Barry, creator of
FreeRTOS.
This works because, in recent libc releases, both the functions are declared as __weak.
Like for the thread definitions seen before, a memory pool can be easily defined by using the macro
osPoolDef(). A pool is effectively created using the function:
to retrieve a single block of memory from the pool, whose size is equal to the item_sz parameter of
the struct osPoolDef_t . If no more space is available in the pool, the function returns NULL. To
free a block in the pool, we use the function:
which allocates a memory block from a memory pool and sets memory block to zero.
The following pseudo-code shows how to easily use memory pools.
1 #include "cmsis_os.h"
2
3 typedef struct {
4 uint8_t Buf[32];
5 uint8_t Idx;
6 } MEM_BLOCK;
7
8 osPoolDef (MemPool, 8, MEM_BLOCK);
9
10 void AllocMemoryPoolBlock (void) {
11 osPoolId MemPool_Id;
12 MEM_BLOCK *addr;
13
At line 8 a new pool is defined so that it contains eight elements, each one with a size equal to
sizeof(MEM_BLOCK) (the size is automatically computed by the macro). Then the pool is effectively
created at line 14 and one of the eight blocks is retrieved from the pool at line 17 by using the
osPoolAlloc() routine.
which returns the number of “unused” words of the thread stack. For example, assume a thread
defined with a stack of 100 words (that is, 400 bytes on an STM32). Suppose that, in the worst
scenario, the thread uses 90 words of its stack. Then the uxTaskGetStackHighWaterMark() returns
the value 10.
The TaskHandle_t type of the parameter xTask is nothing more than the osThreadId returned by
the osThreadCreate() function, and if we call the uxTaskGetStackHighWaterMark() from the same
thread we can pass NULL.
This function is available only if:
²⁷We will talk again about this topic in a subsequent chapter about advanced debugging.
Figure 12: how FreeRTOS fills the stack with a fixed value (0xA5) to detect stack overflows
How does the uxTaskGetStackHighWaterMark() know how much stack has been used?
There is nothing magic performed by that function. When one of the above macros is
defined, FreeRTOS fills the stack of a thread with a “magic” number (defined by the macro
tskSTACK_FILL_BYTE inside the task.c file), as shown in Figure 12. This is a “watermark”
used to derive the number of free memory locations (that is the number of locations through
the end of the thread stack still containing that value). This is one of the most efficient
techniques used to detect buffer overflows.
The uxTaskGetStackHighWaterMark() function can also be used to verify the effective usage of the
thread stack, and hence to reduce its size if too much space is wasted.
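The watermark technique can be reproduced with a few lines of standard C. The following sketch is only a model of the idea, not the FreeRTOS implementation: the function names stack_prepare() and stack_unused_bytes() are hypothetical, and we assume a stack that grows downward, so that the lowest addresses are the last ones to be touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u  /* same role as tskSTACK_FILL_BYTE */

/* Pre-fill the stack so that untouched locations can be recognised later. */
void stack_prepare(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* With a downward-growing stack, index 0 is the last byte that can be used.
 * Count how many bytes, starting from there, still hold the fill value. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < size && stack[unused] == FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

For a 400-byte stack where the deepest call chain has overwritten the upper 360 bytes, stack_unused_bytes() returns 40, that is the 10 words of the example above. Note that the technique can be fooled if the application itself writes the fill value to the stack.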
FreeRTOS offers two additional methods to detect a stack overflow at run-time. Both of them consist
in setting the configCHECK_FOR_STACK_OVERFLOW macro in the FreeRTOSConfig.h file. If we set it to
1, then every time a thread is swapped out, FreeRTOS checks the value of the current stack pointer: if it
lies outside the valid stack space of the thread, then it is likely that a stack overflow has happened. In this
case, the callback function:
is automatically called. By defining this function in our application we can detect the stack overflow
and debug it. For example, during a debug session we could place a software breakpoint in it:
This method is fast, but it could miss stack overflows that occur between two context switches.
So, by setting the macro configCHECK_FOR_STACK_OVERFLOW to 2, FreeRTOS applies the
same method used by the function uxTaskGetStackHighWaterMark(): it fills the stack with
a watermark value and it calls vApplicationStackOverflowHook() if the last 20 bytes
of the stack have changed from their expected value. Since FreeRTOS performs this check at every
context switch, this mode impacts overall performance, and it should be used only during
firmware development (especially with high tick frequencies).
Queues are an optional data structure in the CMSIS-RTOS layer, which must be enabled by setting
the osFeature_MessageQ to 1 in cmsis_os.h file. A queue is defined by the following C struct:
To easily define a queue, we can use the osMessageQDef() macro. A queue is effectively created by
using the function:
which accepts an instance of the struct osMessageQDef_t created with the macro osMessageQDef()
and the ID of the thread associated with the queue. However, the FreeRTOS API does not allow us to associate
a thread with a queue, so that parameter is simply ignored and you can safely pass NULL.
To enqueue a new element in the queue we use the function
where queue_id is the ID of the queue returned by the function osMessageCreate(), while info can
be either the data to enqueue (an unsigned 32-bit integer) or the address of a memory location
containing a more articulated C data structure (for example, a block coming from a memory pool).
Finally, the millisec parameter represents the timeout, that is the number of milliseconds
we are willing to wait if the queue is full: if sufficient room is not made available before the timeout
expires, then the osMessagePut() function returns the value osErrorTimeoutResource²⁹. Passing
osWaitForever will cause osMessagePut() to wait indefinitely.
which returns an instance of the C struct osEvent that is defined in the following way:
²⁹The osMessagePut() and osMessageGet() functions can return other status codes, depending on whether they are called from a thread or an ISR.
For more information, consult the official CMSIS-RTOS specification (http://bit.ly/1VAAz57).
typedef struct {
osStatus status; /* Status code: event or error information */
union {
uint32_t v; /* Message as 32-bit value */
void *p; /* Message or mail as void pointer */
int32_t signals; /* Signal flags */
} value; /* Event value */
...
} osEvent;
As you can see, an instance of that struct provides both the status code (which is equal
to osEventMessage if an element is successfully dequeued, osEventTimeout in case of timeout) and
the dequeued element, which is contained inside the osEvent.value.v field (or we can use the
p field of the union if the queued value is the address of a memory location containing a more
articulated data structure instance).
If we want to leave an element in the queue, without physically removing it, we can use the function
Take into account that FreeRTOS provides two separate APIs to manipulate queues from
a thread or from an ISR. For example, the xQueueReceive() function is used to dequeue
an element from a thread, while xQueueReceiveFromISR() is used to safely dequeue
elements from an ISR. The CMSIS-RTOS layer developed by ST is designed to abstract this
aspect, and it automatically checks if we are performing the call from a thread or from an
ISR. As usual, this comes at the expense of speed.
The following example shows how a queue can be used to exchange data between two threads, one
acting as producer (UARTThread()) and one as consumer (blinkThread()), which can run really slowly
if a really large timeout is specified.
Filename: src/main-ex3.c
The UARTThread(), defined at lines [51:60], uses the I/O retargeting technique seen in Chapter 8,
allowing us to use the classical printf()/scanf() routines of the C standard library. The thread
reads a uint16_t value from the UART and places it inside the queue MsgBox. The blinkThread(),
defined at lines [37:49], takes these values from the queue and uses them as delay values for
the osDelay() function. This simple application allows us to set the wanted LD2 LED blinking
frequency from a terminal emulator.
If you specify a large delay value, you can easily see how queues can be used when a producer thread
runs faster than a consumer one. After passing a delay equal to 10000, we can immediately put
another delay value equal to 50 inside the queue (because the queue has sufficient room to store
another value). As you will see, we need to wait about 10 seconds before the LED starts blinking at a rate
of 20Hz, since blinkThread() is blocked by the osDelay() function.
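The buffering role played by the queue in this experiment can be modelled with a small fixed-length FIFO. The following sketch is not the FreeRTOS queue implementation: msg_queue_t, queue_put(), queue_get() and QUEUE_LENGTH are hypothetical names, and where a real queue would block the caller for up to the given timeout, this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LENGTH 4u  /* assumed capacity of the queue */

typedef struct {
    uint32_t items[QUEUE_LENGTH];
    size_t   head;   /* index of the oldest element */
    size_t   count;  /* number of elements currently stored */
} msg_queue_t;

/* Producer side: fails when the queue is full. */
bool queue_put(msg_queue_t *q, uint32_t value)
{
    assert(q != NULL);
    if (q->count == QUEUE_LENGTH) {
        return false;  /* a real RTOS would block here until the timeout */
    }
    q->items[(q->head + q->count) % QUEUE_LENGTH] = value;
    q->count++;
    return true;
}

/* Consumer side: fails when the queue is empty. */
bool queue_get(msg_queue_t *q, uint32_t *value)
{
    assert(q != NULL && value != NULL);
    if (q->count == 0) {
        return false;
    }
    *value = q->items[q->head];
    q->head = (q->head + 1u) % QUEUE_LENGTH;
    q->count--;
    return true;
}
```

The producer can store 10000 and then 50 without waiting for the consumer, and the consumer will later retrieve them in the same order: this is exactly what happens between the two threads of the example.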
The CMSIS-RTOS API specifies another type of queue, called mail queue. A mail queue resembles
a message queue, but the data that is being transferred consists of memory blocks that need to be
allocated (before putting data in) and freed (after taking data out). The mail queue uses a memory
pool to create formatted memory blocks and passes pointers to these blocks in a message queue.
This allows the data to stay in an allocated memory block while only a pointer is moved between
the separate threads. This is an advantage over messages, which can transfer only a 32-bit value or
a pointer. Using the mail queue functions, you can control, send, receive, or wait for “mails”. Mail
queues are indeed implemented by ST using message queues and memory pools. We will not go into
the details of mail queues.
14.5.2 Semaphores
In concurrent programming, a semaphore is a data type used to control the access, by multiple
execution streams, to a common resource. A really simple form of semaphore is represented by
a boolean variable: the state of the variable is used as a condition to control the access to a resource.
For example, if the variable is equal to False, then a thread is placed in the blocked state until that
variable becomes True again. A semaphore is said to be taken by the thread that acquires it, that
is the first thread that finds the semaphore equal to True. This is indeed a binary semaphore, since
it can assume only two states, and in FreeRTOS it is implemented as a queue with only one element.
If the queue is empty, then the first thread that tries to acquire it places a “flag” value in the queue,
and it continues its execution; other threads will not be able to add other “flags” until the thread
that has acquired the semaphore dequeues its flag.
A more general form of semaphore is the counting semaphore, which allows more than one thread to
acquire it. Just as binary semaphores are implemented as queues that have a length of one, a counting
semaphore can be thought of as a queue that has a length greater than one. A counting semaphore
usually has an initial value, which is decremented every time a thread acquires it. While binary
semaphores are usually used to discipline the concurrent access to just one resource, a counting
semaphore can be used to:
• discipline the access to pools of common resources: in this case the count value indicates
the number of available resources;
• count the number of recurring events: in this case an execution stream (for simplicity
assume that it is an ISR) releases the semaphore (causing its counter to increase) to signal
to another thread that a given event has occurred (e.g. data coming from the UART is ready to
be processed); this thread can then take the semaphore and start performing its activities; if
another “event” takes place (new data has arrived), then the ISR increases the semaphore again
by releasing it; in this way the processing thread will be able to take the semaphore again and
perform its activities.
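The event-counting usage can be modelled with a plain counter. This sketch only shows the bookkeeping of a counting semaphore, not a real one: counting_sem_t, sem_release() and sem_try_take() are hypothetical names, nothing here is atomic, and a real semaphore would block the taker instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t count;  /* events signalled but not yet processed */
    uint32_t max;    /* upper bound of the counter */
} counting_sem_t;

/* Called by the event source (e.g. an ISR): one more event is pending. */
bool sem_release(counting_sem_t *s)
{
    assert(s != NULL);
    if (s->count == s->max) {
        return false;  /* the counter is saturated */
    }
    s->count++;
    return true;
}

/* Called by the processing thread: consume one pending event, if any. */
bool sem_try_take(counting_sem_t *s)
{
    assert(s != NULL);
    if (s->count == 0) {
        return false;  /* nothing to process: a real thread would block */
    }
    s->count--;
    return true;
}
```

If two events arrive before the processing thread runs, the counter reaches 2 and the thread can take the semaphore twice in a row, so no event is lost.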
However, a simple variable cannot be used as a semaphore, since there is no guarantee that the
operation of “taking” a semaphore is carried out in an atomic manner. So to acquire a semaphore
we need the intervention of a “third party”, that is the OS kernel, which suspends the execution of
other threads during the acquisition process.
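The problem with a plain variable is that testing it and then setting it are two distinct steps, and a thread can be preempted between them. The following sketch shows the difference using a technique that is not the one described above: instead of asking the kernel to suspend the other threads, it relies on the atomic_flag type of the C11 standard library, whose test-and-set operation is indivisible. The names binary_try_take(), binary_release() and guarded_increment() are hypothetical, and no thread is ever blocked here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Clear means "free", set means "taken". */
static atomic_flag sem_taken = ATOMIC_FLAG_INIT;

/* The flag is read and set in one indivisible step, so two callers can
 * never both see it as free. */
bool binary_try_take(void)
{
    return !atomic_flag_test_and_set(&sem_taken);
}

void binary_release(void)
{
    atomic_flag_clear(&sem_taken);
}

/* Example of use: modify a shared variable only while holding the flag. */
bool guarded_increment(int *shared)
{
    assert(shared != NULL);
    if (!binary_try_take()) {
        return false;  /* somebody else holds the flag */
    }
    (*shared)++;
    binary_release();
    return true;
}
```

A version written as "if (!taken) taken = true;" on a plain variable would compile to separate load and store instructions, which is exactly the window in which a race condition can occur.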
FreeRTOS provides two distinct APIs to manage binary and counting semaphores, while the
CMSIS-RTOS specifies that semaphores are implemented as counting semaphores (leaving to
mutexes the role of binary semaphores). However, the usage of counting semaphores increases the
FreeRTOS codebase, which may have a dramatic impact on microcontrollers with a small amount
of FLASH memory. For this reason, FreeRTOS provides them only if the macro
configUSE_COUNTING_SEMAPHORES in the FreeRTOSConfig.h file is defined and equal to 1. The CMSIS-RTOS layer
developed by ST is able to detect this case, and it uses FreeRTOS counting semaphores if available,
otherwise it uses binary semaphores. In the latter case, all settings related to the counter value of the
semaphore are meaningless.
In the CMSIS-RTOS layer semaphores are optional, and they must be enabled by setting the
osFeature_Semaphore macro to 1 in the cmsis_os.h file. In the CMSIS-RTOS API a semaphore
is defined using the macro osSemaphoreDef(), which simply accepts the semaphore name as its
only parameter. Then the semaphore is effectively created by using the function
As said before, count is the starting value of the semaphore, which is meaningless if
configUSE_COUNTING_SEMAPHORES is undefined or equal to 0. To acquire a semaphore we use the function
which accepts the semaphore ID and the timeout (millisec) value. If the semaphore counter is higher
than zero, the thread acquires it (reducing the counter) and it can continue. Otherwise it is placed
in the blocked state until the counter increases again or the timeout expires. A thread
can wait indefinitely by specifying the osWaitForever value. The osSemaphoreWait() function returns osOK
if the thread has successfully acquired the semaphore, otherwise it returns osErrorOS³⁰. To release a
semaphore we use the function
A semaphore is dynamically allocated by the OS upon its creation, and it must be explicitly destroyed
by using the function
³⁰As you can see, osSemaphoreWait() is designed to return an int32_t instead of the classical osStatus return value. This
is because the CMSIS-RTOS API specifies that it should return the semaphore counter after this has been decremented by the acquiring
procedure. However, FreeRTOS does not provide this facility.
As seen for the APIs related to queue manipulation, FreeRTOS provides two separate
APIs to manipulate semaphores from a thread or from an ISR. For example, the
xSemaphoreTake() function is used to acquire a semaphore from a thread, while
xSemaphoreTakeFromISR() is used to perform this operation from an ISR. The
CMSIS-RTOS layer developed by ST is designed to abstract this aspect.
The following example shows how to use a semaphore as a notification primitive. This is again the
classical blinking application, but this time the delay of the blinkThread() is established by another
thread, delayThread(), which “unlocks” the blinking thread by releasing a binary semaphore.
Filename: src/main-ex4.c
14 osSemaphoreId semid;
15
16 int main(void) {
17 HAL_Init();
18
19 Nucleo_BSP_Init();
20
21 RetargetInit(&huart2);
22
23 osThreadDef(blink, blinkThread, osPriorityNormal, 0, 100);
24 osThreadCreate(osThread(blink), NULL);
25
26 osThreadDef(delay, delayThread, osPriorityNormal, 0, 100);
27 osThreadCreate(osThread(delay), NULL);
28
29 osSemaphoreDef(sem);
30 semid = osSemaphoreCreate(osSemaphore(sem), 1);
31 osSemaphoreWait(semid, osWaitForever);
32
33 osKernelStart();
34
35 /* Infinite loop */
36 while (1);
37 }
38
39 void blinkThread(void const *argument) {
40 while(1) {
41 osSemaphoreWait(semid, osWaitForever);
42 HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
43 }
44 osThreadTerminate(NULL);
45 }
46
47 void delayThread(void const *argument) {
48 while(1) {
49 osDelay(500);
50 osSemaphoreRelease(semid);
51 }
Lines [29:31] define and create a binary semaphore named sem: the semaphore is immediately
acquired, causing its counter to become equal to zero. blinkThread() and delayThread() are
scheduled, but the first one is placed in the blocked state as soon as it reaches the osSemaphoreWait()
call: since the semaphore has already been “acquired”, the thread is swapped out until the semaphore is
released by delayThread(), which performs this operation every 500ms. This causes
the LD2 LED to toggle twice per second, that is to blink at a 1Hz rate.
Signals have their benefits and drawbacks: they are faster than semaphores and need less RAM, but
they cannot be used to exchange data between threads and they cannot be used to trigger multiple
threads at once.
If we want to trigger a thread signal, we have to set it using the function
where the parameter thread_id is clearly the thread id and signal is the id of the signal we want to
trigger. Once a signal is set, it remains in this state until we expressly clear it by using the function
A thread can be placed in blocked state waiting for a signal by using the function
If you remember, in that same chapter we have seen that the HAL tries to protect
concurrent accesses to peripherals by using the __HAL_LOCK() macro. However, there is no
guarantee that in a multithreaded environment that macro will prevent race conditions,
since the locking operation is not performed atomically.
While semaphores are best suited to synchronize thread activities, mutexes and critical sections
are a way to protect shared resources in concurrent programming. FreeRTOS provides both
primitives, while the CMSIS-RTOS layer only defines the notion of mutex. However, critical sections
come in handy in several situations, and sometimes they represent a better solution to problems
that would otherwise require more programming effort from the developer to avoid subtle conditions, like
priority inversion.
14.6.1 Mutexes
Mutex is an acronym for MUTual EXclusion: mutexes are a sort of binary semaphore used to
control the access to shared resources. From a conceptual point of view, mutexes differ from
semaphores for two reasons:
• a mutex must always be taken and then released to signal that the protected resource is now
available again, while a semaphore can even be released just to wake up a blocked thread (we
have seen this mode in example 4); moreover, a mutex is usually taken and released by
the same thread³¹;
• a mutex implements priority inheritance, a feature that we will analyze later and that is used to minimize
the priority inversion problem.
³¹However, unlike other operating systems, FreeRTOS does not check that only the thread that has acquired
the mutex can release it.
To use mutexes, we need to define the macro configUSE_MUTEXES inside the FreeRTOSConfig.h file
and set it to 1. A mutex is defined using the macro osMutexDef(), which accepts the mutex name
as the only parameter, and it is effectively created by the function
Mutexes may introduce an unwanted subtle problem, well known in the literature as the priority
inversion problem. Let us consider this scenario with the help of Figure 13.
ThreadL(), ThreadM() and ThreadH() are three threads with an increasing priority (L stands for low,
M for medium and H for high). ThreadL() starts its execution and it acquires a mutex used to protect
a shared resource. During its execution, ThreadH() becomes ready and it is scheduled for
execution, since it has a higher priority. However, it also needs to acquire the same mutex, so it goes back
to the blocked state. Suddenly, the medium-priority thread ThreadM() becomes ready, and it is scheduled
for execution, since it has a priority higher than ThreadL(). ThreadL() therefore cannot finish its job and the mutex
remains locked, preventing ThreadH() from being executed. In this case, we have the practical effect
that the priority between ThreadL() and ThreadH() is inverted, since ThreadH() cannot be executed
until ThreadL() releases the mutex.
The priority inversion problem should be avoided altogether by arranging the application in a different
manner. However, FreeRTOS tries to minimize the impact of this issue by temporarily increasing
the priority of the mutex holder (in our case ThreadL()) to the priority of the highest priority thread
that is attempting to acquire the same mutex.
Figure 14: how the priority inversion problem is addressed by temporarily increasing the priority of ThreadL
Figure 14 clearly shows this process. ThreadL() starts its execution and it acquires a mutex.
During its execution, ThreadH() becomes ready and it is scheduled for execution, since it has
a higher priority. However, it also needs to acquire the same mutex, so it goes back to the blocked
state. This time, the priority of ThreadL() is raised to that of ThreadH(), preventing
ThreadM() from being executed. ThreadL() is scheduled again and it can release the mutex, allowing
ThreadH() to run. Finally, ThreadM() can execute, since the priority of ThreadL() is restored to its
original value when it releases the mutex.
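The bookkeeping behind priority inheritance can be sketched as follows. This is only a model of the rule just described, with no scheduler behind it: thread_t, pi_mutex_t, pi_take() and pi_give() are hypothetical names, and we assume that a higher number means a higher priority, as for FreeRTOS threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int base_priority;  /* priority assigned when the thread was created */
    int priority;       /* priority currently used by the scheduler */
} thread_t;

typedef struct {
    thread_t *holder;   /* NULL when the mutex is free */
} pi_mutex_t;

/* Returns true if the mutex was acquired. On failure the caller would be
 * blocked, and the holder inherits the caller's priority if it is higher. */
bool pi_take(pi_mutex_t *m, thread_t *t)
{
    assert(m != NULL && t != NULL);
    if (m->holder == NULL) {
        m->holder = t;
        return true;
    }
    if (t->priority > m->holder->priority) {
        m->holder->priority = t->priority;  /* priority inheritance */
    }
    return false;
}

/* Release the mutex and restore the original priority of the holder. */
void pi_give(pi_mutex_t *m)
{
    assert(m != NULL);
    if (m->holder != NULL) {
        m->holder->priority = m->holder->base_priority;
        m->holder = NULL;
    }
}
```

With three threads of priority 1, 2 and 3, when the high-priority thread fails to take a mutex held by the low-priority one, the holder is temporarily raised to priority 3, so the medium-priority thread can no longer preempt it.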
Sometimes it happens that, especially when our application is split across several APIs, a thread
accidentally acquires a mutex more than once. Since a mutex can be acquired only once, any
subsequent attempt from the same thread to acquire the same mutex will cause a deadlock (because
a successive call to osMutexWait() places the thread in the blocked state, but it is the only thread
that can release the mutex).
To prevent this unwanted behaviour, FreeRTOS introduces the notion of recursive mutexes, that is
mutexes that can be acquired more than once. Clearly, a recursive mutex needs to be released the
same number of times it has been acquired. Since the CMSIS-RTOS API does not provide APIs to
handle recursive mutexes, we will not go into the details of this topic. You can consult the FreeRTOS
documentation³² for more about this.
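The rule that a recursive mutex must be released as many times as it has been acquired can be modelled with an owner and a depth counter. This is just a sketch of the concept, not the FreeRTOS implementation: rec_mutex_t, rec_take() and rec_give() are hypothetical names, threads are identified by a plain integer, and a failed rec_take() returns false where a real thread would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int      owner;  /* ID of the owning thread, meaningful only if depth > 0 */
    unsigned depth;  /* how many times the owner has taken the mutex */
} rec_mutex_t;

/* The owner may take the mutex again; any other thread must wait. */
bool rec_take(rec_mutex_t *m, int thread_id)
{
    assert(m != NULL);
    if (m->depth == 0) {
        m->owner = thread_id;
        m->depth = 1;
        return true;
    }
    if (m->owner == thread_id) {
        m->depth++;
        return true;
    }
    return false;
}

/* The mutex becomes free only when the depth goes back to zero. */
bool rec_give(rec_mutex_t *m, int thread_id)
{
    assert(m != NULL);
    if (m->depth == 0 || m->owner != thread_id) {
        return false;
    }
    m->depth--;
    return true;
}
```

After thread 1 has taken the mutex twice, a single rec_give() is not enough to let thread 2 in: the mutex is still owned with a depth of one.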
...
__disable_irq();
//All IRQs are disabled and we are sure that the next code will not be preempted
...
//Critical code here
...
__enable_irq();
//All IRQs are now enabled again, and normal behaviour of the RTOS is restored
Implementing a critical section using CMSIS APIs is not a trivial task, because we need to take care
of special hardware situations that may occur. However, FreeRTOS provides four routines that we can
use to define critical sections in our application.
The taskENTER_CRITICAL() and taskEXIT_CRITICAL() functions allow us to define a critical section
inside a thread. Those routines are designed to keep track of the nesting, that is each time
taskENTER_CRITICAL() is called a counter is incremented, and it is decremented on a subsequent
call to the taskEXIT_CRITICAL() function. This means that we have to be sure to respect the calling
order.
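The nesting counter can be modelled as follows. This is a sketch of the mechanism only: a boolean stands in for the hardware interrupt mask, and sketch_enter_critical(), sketch_exit_critical() and sketch_irq_enabled() are hypothetical names, not the FreeRTOS routines.

```c
#include <assert.h>
#include <stdbool.h>

static bool     irq_enabled = true;  /* stand-in for the hardware interrupt mask */
static unsigned nesting     = 0;     /* how many critical sections are open */

void sketch_enter_critical(void)
{
    irq_enabled = false;  /* mask first, then record the nesting level */
    nesting++;
}

void sketch_exit_critical(void)
{
    assert(nesting > 0);  /* an exit without a matching enter is a bug */
    nesting--;
    if (nesting == 0) {
        irq_enabled = true;  /* only the outermost exit unmasks interrupts */
    }
}

bool sketch_irq_enabled(void)
{
    return irq_enabled;
}
```

Thanks to the counter, a function that opens its own critical section can be safely called from code that is already inside one: interrupts stay masked until the outermost section is closed.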
³²http://www.freertos.org/RTOS-Recursive-Mutexes.html
Critical sections work well only if they are used to protect very few lines of code, which perform
their activities in a short time. Otherwise, the whole application can be impacted by their usage.
The taskENTER_CRITICAL() and taskEXIT_CRITICAL() functions should never be called from an
ISR: the corresponding taskENTER_CRITICAL_FROM_ISR() and taskEXIT_CRITICAL_FROM_ISR()
functions are designed for this purpose. For more information consult the FreeRTOS documentation.
In this scenario, we could have a dedicated thread for each ISR. This thread would spend most of its
time in the blocked state waiting for a given signal. When the IRQ fires, we could trigger that signal,
causing the blocked thread to resume and carry out the job that would otherwise be performed by the
corresponding ISR. By assigning different priorities to threads, we may establish an execution order
in case of concurrent ISRs. Another approach is to use a queue to transfer the data coming from the
peripheral to a worker thread, which will process it later. This is especially useful when the consumer
thread is slower than the peripheral ISR, which acts as the producer in this case.
FreeRTOS provides another convenient way to defer the ISR execution to another execution stream.
This is called centralized deferred interrupt processing and it consists in deferring the execution of a
routine to the FreeRTOS daemon task³³. This method uses the xTimerPendFunctionCallFromISR() function,
which is documented in the FreeRTOS manual³⁴.
³³The FreeRTOS daemon task is also called the timer service task because it is the thread that handles the execution of timer
callback routines, which we will analyze later.
³⁴http://www.freertos.org/xTimerPendFunctionCallFromISR.html
However, keep in mind that either deferring the execution to another thread or using a queue to
exchange data implies that several operations are performed by the CPU, and this may impact
the reliability of ISR management. If your peripheral runs really fast, it is better to use other ways to
transfer data, for example the DMA. Considering again the example of the UART transfer,
if our application exchanges fixed-length messages over the UART we could set up the DMA to
transfer a message and then use the DMA IRQ to move the whole message inside a queue. This
would certainly minimize the overhead connected with the transfer of individual bytes.
So far we have seen that FreeRTOS provides some APIs that are expressly designed to be called
within ISRs. For many FreeRTOS functions, there exists a corresponding ISR-safe routine ending
with FromISR() (for example, xQueueReceiveFromISR() for the xQueueReceive() routine). These
routines are designed so that interrupts are masked (by entering and then exiting a critical section),
preventing the execution of other interrupts that could generate race conditions by calling other
FreeRTOS functions.
Interrupt masking is required because interrupts are a source of multiprogramming handled by
the hardware. While threads are different program flows handled by the RTOS, which avoids race
conditions by simply suspending the execution of the scheduler, ISRs are triggered by the hardware
and there is little we can do to avoid race conditions unless we mask their execution or define a
strict priority-based execution order. Moreover, the nesting mechanism offered by Cortex-M cores
increases the risk of race conditions in our code. For example, an ISR that has started acquiring a semaphore
may be preempted by another ISR with higher priority performing the same operation. This would
certainly have a catastrophic effect.
Even if the CMSIS-RTOS layer is designed to abstract this dual API system, we must take special
care when calling FreeRTOS APIs from ISRs in Cortex-M3/4/7 based microcontrollers. This
is because these cores allow us to selectively mask interrupts on a priority level basis. In Chapter
7 we have seen that the BASEPRI register allows us to selectively disable the execution of ISRs by masking all
those IRQs having a priority equal to or lower than a given level. FreeRTOS uses this mechanism to allow the
execution of higher priority interrupts, which are assumed to be non-interruptible, while suspending
lower ones. This means that it is not safe to call FreeRTOS APIs from all ISRs: it is only safe to
call FreeRTOS functions from those ISRs having a given (or lower) priority level.
We can set this maximum priority level by defining the macro
configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY³⁵ in the FreeRTOSConfig.h file. CubeMX automatically performs this operation
for us, and usually the maximum priority level is set to 5. Special care must be taken when we
enable IRQs using CubeMX: even if recent releases of CubeMX seem to handle this aspect correctly,
always ensure that an ISR that calls FreeRTOS functions is configured with a priority equal to
configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY or lower (that is, with an equal or numerically higher value).
³⁵If you read the official FreeRTOS documentation, you can see that the macro used to set up the maximum interruptible
priority level is configMAX_SYSCALL_INTERRUPT_PRIORITY. However, since FreeRTOS is portable among several silicon vendors,
the priority level specified with that macro is the exact value of the IPR register, which uses only the upper 4 bits in STM32
MCUs (for example, a priority equal to 0x2 must be specified as 0x20). ST engineers have defined the macro
configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY so that we can specify the priority level according to the HAL convention (in LSB form), while
configMAX_SYSCALL_INTERRUPT_PRIORITY is defined in the following way:
#define configMAX_SYSCALL_INTERRUPT_PRIORITY ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
Although this macro is also defined in projects generated by CubeMX for STM32F0/L0
MCUs, it has no practical effect there, since the FreeRTOS port for those families uses the PRIMASK
register to mask all interrupts (Cortex-M0/0+ cores do not offer a way to selectively disable IRQs).
So, that macro is simply ignored.
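To make the priority arithmetic concrete, the following minimal sketch reproduces the relation between the two macros and adds a small check for ISR priorities. It assumes 4 priority bits and the CubeMX default of 5; the helper isr_may_call_freertos() is a hypothetical name used only for illustration, and it is not part of FreeRTOS or of the HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4 priority bits (Cortex-M3/4/7 based STM32 MCUs) and the
   usual CubeMX default of 5 for the maximum syscall priority. */
#define configPRIO_BITS                               4
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY  5

/* Raw value for the priority register: the level is shifted in the upper bits. */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY \
  (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS))

/* Hypothetical helper: returns true if an ISR configured with the given
   HAL-style priority (0 = most urgent) is allowed to call FreeRTOS APIs. */
bool isr_may_call_freertos(uint32_t hal_priority) {
  assert(hal_priority < (1u << configPRIO_BITS));
  /* A numerically higher value means a logically lower priority. */
  return hal_priority >= configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With these assumed values, priority level 5 becomes 0x50 in the raw register format, and ISRs configured with a priority from 0 to 4 must never call FreeRTOS functions.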
Finally, it is important to remember that FreeRTOS is designed so that the tick interrupt (that is
the IRQ associated to the timer that acts as timebase generator for the kernel) must be set to the
lowest possible interrupt priority, which is equal to 3 in STM32F0/L0 families (Cortex-M0/0+ cores
implement only two priority bits) and to 15 for all other MCUs.
The macro configLIBRARY_LOWEST_INTERRUPT_PRIORITY in the FreeRTOSConfig.h file sets this, and it
is strongly suggested to leave it as is.
A software timer is created with the osTimerCreate() function, which allows us to specify the timer
type and an optional argument to pass to the callback routine.
The CMSIS-RTOS API provides two kinds of software timers: one-shot timers, that is timers that
execute the callback only once, and periodic timers, which act like hardware STM32 timers that
restart counting after they overflow.
To start a timer, we use the osTimerStart() function, where the millisec parameter represents the
period of the timer. To stop it we use the osTimerStop() function.
Finally, a timer is dynamically allocated by the OS and needs to be destroyed when no longer needed
by using the osTimerDelete() function.
The following example shows our omnipresent blinking application made with a software timer.
Filename: src/main-ex5.c
13 int main(void) {
14 osTimerId stim1;
15
16 HAL_Init();
17
18 Nucleo_BSP_Init();
19
20 RetargetInit(&huart2);
21
22 osTimerDef(stim1, blinkFunc);
23 stim1 = osTimerCreate(osTimer(stim1), osTimerPeriodic, NULL);
24 osTimerStart(stim1, 500);
25
26 osKernelStart();
27
28 /* Infinite loop */
29 while (1);
30 }
31
32 void blinkFunc(void const *argument) {
33 HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
34 }
The code is really self-explanatory. Lines [22:24] define a new timer, named stim1. This timer is
configured to execute the blinkFunc() routine when it expires, and it is started with a period of
500ms. This will cause the Nucleo LD2 LED to toggle every 500ms, that is to blink at a 1Hz rate.
Software timers are managed by a dedicated thread (the timer service task),
which has a priority defined by the macro configTIMER_TASK_PRIORITY and a stack with a size
defined by the macro configTIMER_TASK_STACK_DEPTH. Moreover, it receives timer commands (start,
stop, delete and so on) through an internal queue, whose length is defined by the macro
configTIMER_QUEUE_LENGTH.
Another interesting aspect to consider is how FreeRTOS computes the time internally. FreeRTOS
measures the time as a function of the tick frequency, which is in turn defined by the overflow
frequency of the timer chosen as timebase generator. This means that, if we use the SysTick timer
configured to overflow every 1ms, then internal software timers have a resolution of 1ms (which
corresponds to 1 tick). The millisec value passed to the osTimerStart() routine is hence converted
to ticks. This means that, in the case of example 5, if the tick time is 1ms, then 500ms will be
equal to 500 ticks. If the tick time is set to 500μs, the 500ms delay is converted to 1000 ticks.
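The conversion described above can be sketched with a few lines of C. The function ms_to_ticks() is a hypothetical helper written for this example, not the routine used internally by the CMSIS-RTOS layer; it only shows the arithmetic for a given tick frequency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a delay expressed in milliseconds into
   kernel ticks, for a given tick frequency expressed in Hz. */
uint32_t ms_to_ticks(uint32_t millisec, uint32_t tick_rate_hz) {
  assert(tick_rate_hz > 0u);
  /* A 64-bit intermediate value avoids overflows with long delays. */
  return (uint32_t)(((uint64_t)millisec * tick_rate_hz) / 1000u);
}
```

A tick time of 1ms corresponds to a tick frequency of 1000Hz, while a tick time of 500μs corresponds to 2000Hz. Note that, in this sketch, a delay shorter than one tick is truncated to zero ticks.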
This is a really advanced topic, which requires knowledge of many concepts underlying an
RTOS. Moreover, a decent knowledge of the concepts illustrated in Chapter 12 is required.
Inexperienced users can safely skip this part.
If no thread is in ready state, then the OS executes the idle thread, until another thread becomes
ready. This means that, when the idle thread is scheduled, it is likely to be the right time to place
the MCU in sleep mode to reduce power consumption.
For this reason, FreeRTOS gives the user the ability to define an idle hook, that is a callback
function invoked within the idle thread. To enable the hook, we have to define the macro
configUSE_IDLE_HOOK inside the FreeRTOSConfig.h file and set it to 1. Next, we can define the
function vApplicationIdleHook(void) somewhere in our source code.
For example, to place the MCU in sleep mode every time the idle thread is scheduled, we can define
that function in this way:
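A minimal sketch of such a hook, assuming that the CMSIS __WFI() intrinsic is used to enter sleep mode and that the project targets an STM32F0 MCU, could look like the following. The fallback branch is only a host-side stand-in (a hypothetical counter) that allows the sketch to be compiled on a PC.

```c
/* USE_HAL_DRIVER is normally defined by CubeMX-generated projects. When it
   is missing (for example when compiling this sketch on a PC), a counter
   stands in for the WFI instruction so that the code can still be exercised. */
#if defined(USE_HAL_DRIVER)
#include "stm32f0xx_hal.h"   /* assumed family header; it pulls in CMSIS __WFI() */
#else
unsigned int wfi_count = 0;  /* host-only stand-in, not part of FreeRTOS */
#define __WFI() ((void)wfi_count++)
#endif

/* Invoked by the idle thread at every iteration of its loop when
   configUSE_IDLE_HOOK is set to 1 in FreeRTOSConfig.h. */
void vApplicationIdleHook(void) {
  /* Sleep until the next interrupt (typically the tick) wakes up the core. */
  __WFI();
}
```

The hook must never block or suspend the idle thread: it just executes the WFI instruction and returns as soon as an interrupt wakes up the core.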
The power saving that can be achieved by this simple method is limited by the necessity to
periodically exit and then re-enter the low-power mode to process tick interrupts (which are related
to the underflow frequency of the SysTick timer), as shown in Figure 15. Moreover, if the frequency
of the tick interrupt is too high, the energy and time consumed entering and then exiting a low-
power mode for every tick will outweigh any potential power saving gain for all but the lightest
power saving modes.
For these reasons, it is completely impractical to enter deeper sleep modes, like the stop one.
Moreover, the overhead of entering and exiting low-power mode affects the
reliability of the tick counter, causing shifts that impact software timers and timeout delays.
arbitrary: it can be several milliseconds, some seconds, minutes or even days. When the MCU exits
from low-power mode, FreeRTOS makes a correcting adjustment to the tick count value when the
tick interrupt is restarted, if needed (more about this soon). This means that FreeRTOS does not stop
the timer at all: it just configures the timer so that it reaches its maximum update period before
overflowing. When the MCU wakes up again, the kernel reads the counter value of the timer and
computes the number of elapsed ticks during the sleep time.
For example, assume a 16-bit timer clocked at the core SYSCLK frequency of 48MHz. The maximum
values for the Period and Prescaler registers are equal to 0xFFFF. So instead of configuring the
timer so that it overflows every 1ms, we can configure it to overflow after:
$$\text{UpdateEvent} = \frac{48.000.000}{\text{0xFFFF} \times \text{0xFFFF}} \approx 0.011\,\text{Hz} \approx 90\,\text{s}$$
FreeRTOS provides a built-in tickless functionality, which is enabled by defining the macro
configUSE_TICKLESS_IDLE as 1 in FreeRTOSConfig.h. The built-in tickless mode is platform dependent:
for this reason, it is implemented inside the port.c file. The built-in tickless is available for all
Cortex-M cores, but it has one relevant limitation: it relies on the SysTick timer, because it is the
only timer available in all MCUs based on this architecture.
What’s wrong with it? The SysTick timer is a 24-bit down-counter timer, clocked at the same core
clock frequency. Unfortunately, it cannot be easily prescaled like regular STM32 timers (it has just
one prescaler value, equal to 8, in all STM32 MCUs). For example, for an STM32F030 running at
48MHz we have that, applying the equation [1] from Chapter 11, the SysTick timer will overflow
every:
$$\text{UpdateEvent} = \frac{48.000.000}{8 \times \text{0xFFFFFF}} \approx 0.358\,\text{Hz} \approx 2.8\,\text{s}$$
Since we cannot lose the overflow event at all, otherwise the global tick count would be
compromised³⁷, we have to wake up again even if we have nothing relevant to do. For most low-power
applications this is a really short time between two consecutive sleep periods.
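The two figures above can be checked with a small host-side sketch. The function names are hypothetical and chosen only for this example. The sketch uses the exact hardware divisors, (Prescaler + 1) × (Period + 1) for a general purpose timer and 2^24 counts for the SysTick, so the results differ slightly from the rounded values obtained with the raw register values.

```c
#include <assert.h>
#include <stdint.h>

/* Longest interval, in seconds, before a general purpose timer generates an
   update event: the hardware divides by (prescaler + 1) * (period + 1). */
double tim_max_interval_s(uint32_t clock_hz, uint32_t prescaler, uint32_t period) {
  assert(clock_hz > 0u);
  return ((double)prescaler + 1.0) * ((double)period + 1.0) / (double)clock_hz;
}

/* Longest interval, in seconds, of the 24-bit SysTick timer when it is
   clocked by HCLK divided by the given divider (1 or 8). */
double systick_max_interval_s(uint32_t hclk_hz, uint32_t divider) {
  assert(hclk_hz > 0u && (divider == 1u || divider == 8u));
  return (double)divider * 16777216.0 / (double)hclk_hz;  /* 2^24 counts */
}
```

With a 48MHz clock, a 16-bit timer with both registers set to 0xFFFF overflows after about 89.5s, while the SysTick with the divider equal to 8 overflows after about 2.8s.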
A solution could be to lower the HCLK speed to further increase the overflow period,
but we have to be careful not to lower the core frequency too much, because when the MCU
exits from low-power mode to service an interrupt a low HCLK speed could compromise the system
reliability. And increasing the clock speed from within an ISR is not a smart idea.
Another limitation in using the SysTick timer arises from the fact that it cannot be used in stop
modes, because the HCLK clock source is turned off. This is one of the typical applications of the
low-power timers (LPTIM) provided by most STM32L microcontrollers. LPTIM timers, in fact,
are able to run independently from the system clock: this allows us to use them even in stop modes.
For all those reasons, we are now going to provide a custom implementation of the tickless idle
functionality, which can be provided for any FreeRTOS port (including those that provide a built-in
implementation) by defining configUSE_TICKLESS_IDLE to 2 in FreeRTOSConfig.h. When this
configuration is chosen, we can override two FreeRTOS functions: void prvSetupTimerInterrupt()³⁸
and void vPortSuppressTicksAndSleep(). The former is used by the kernel to set up the timer used
as tick generator. The latter is automatically called by the kernel when some conditions (that we
will see later) are satisfied, and it allows us to enter low-power modes by delaying or completely
suspending the periodic timer interrupt.
Before we dive into the real source code needed to implement those two routines, it is best to take
a look at the underlying logic without struggling with implementation details.
³⁷As we will discover later, under certain circumstances we can safely stop incrementing the global tick counter. This can be
done when we are not going to use software timers and timeouts: if all threads are blocked or suspended indefinitely, then it is safe
to completely turn OFF the timebase generator.
³⁸In Cortex-M3/4 ports this function is called vPortSetupTimerInterrupt().
The first routine we are going to override is vPortSetupTimerInterrupt(). It simply uses
one of the available STM32 timers as timebase generator, configuring the right Period and Prescaler
values to achieve a tick interrupt with a frequency equal to 1kHz. The timer ISR (shown later) has
the responsibility of incrementing the global tick counter.
Read carefully
In Chapter 10 we have seen that the HAL is designed to automatically invoke
SystemCoreClockUpdate() when we change the HCLK frequency. This ensures that
the SysTick interrupt is generated every 1ms even if the core clock changes. If, instead, we
use another timer for the RTOS tick counter, then it is up to us to ensure that the
timer is reconfigured accordingly when the clock speed of the APB bus the timer belongs
to changes.
The next lines of code show a possible implementation for the vPortSuppressTicksAndSleep(),
which is called when the following two conditions are both true:
1. The idle thread is the only thread able to run because all the application threads are either in
the blocked or in the suspended state.
2. At least n further complete tick periods will pass before the kernel moves an application thread
out of the blocked state, where n is set by the configEXPECTED_IDLE_TIME_BEFORE_SLEEP
macro in FreeRTOSConfig.h file³⁹.
³⁹This is a user-defined parameter that represents a further delay before starting the tick suppression procedure. Since this
procedure is computationally intensive, and it may introduce minor shifts in the global tick count, we can programmatically decide
to wait at least n consecutive ticks before starting the procedure.
If the above conditions are satisfied, then the scheduler is suspended and the
vPortSuppressTicksAndSleep() function is called, allowing us to temporarily suppress the tick
interrupt or to delay its execution.
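The two conditions listed above can be summarized by a simple predicate. This is only a sketch of the decision: the function tick_suppression_allowed() and its parameters are hypothetical names, and the value given to configEXPECTED_IDLE_TIME_BEFORE_SLEEP is an assumption (in a real project it comes from FreeRTOSConfig.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; in a real project it is defined in FreeRTOSConfig.h. */
#ifndef configEXPECTED_IDLE_TIME_BEFORE_SLEEP
#define configEXPECTED_IDLE_TIME_BEFORE_SLEEP 2u
#endif

/* Hypothetical predicate: true when the idle thread is the only one able to
   run and enough complete tick periods remain before a thread unblocks. */
bool tick_suppression_allowed(uint32_t ready_app_threads,
                              uint32_t ticks_until_next_unblock) {
  /* FreeRTOS refuses to build with a value lower than 2. */
  assert(configEXPECTED_IDLE_TIME_BEFORE_SLEEP >= 2u);
  if (ready_app_threads != 0u)
    return false;  /* Condition 1 is not satisfied. */
  return ticks_until_next_unblock >= configEXPECTED_IDLE_TIME_BEFORE_SLEEP;
}
```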
61 SleepMode();
62
63 /* Determine how long the microcontroller was actually in a low power
64 state for, which will be less than xExpectedIdleTime if the
65 microcontroller was brought out of low power mode by an interrupt
66 other than that configured by the vSetWakeTimeInterrupt() call.
67 Note that the scheduler is suspended before
68 vPortSuppressTicksAndSleep() is called, and resumed when it returns.
69 Therefore no other tasks will execute until this function completes. */
70 ulLowPowerTimeAfterSleep = __HAL_TIM_GET_COUNTER(TIMx);
71
72 /* Correct the kernels tick count to account for the time the
73 microcontroller spent in its low power state. */
74 vTaskStepTick( ulLowPowerTimeAfterSleep - ulLowPowerTimeBeforeSleep );
75 }
76
77 /* Exit the critical section - it might be possible to do this immediately
78 after the prvSleep() calls. */
79 __enable_irq();
80
81 /* Restart the timer that is generating the tick interrupt. */
82 HAL_TIM_Base_Start_IT(TIMx);
83 }
The routine starts by saving the current counter value of the timer before it is stopped. All
interrupts are disabled to prevent race conditions, entering a critical section by calling the
CMSIS function __disable_irq(). As said before, vPortSuppressTicksAndSleep() is called when
the scheduler is suspended, but an interrupt firing before we enter the critical section at line 35
may ask the kernel to resume the execution of another thread in blocked state⁴⁰. By calling
eTaskConfirmSleepModeStatus() we can know if we need to abort the tick suppression procedure,
resuming the timer. If the function returns the value eAbortSleep, then we restart the tick generator
timer and we immediately exit from the critical section by re-enabling all interrupts (line 45). If,
instead, the function returns the value eNoTasksWaitingTimeout, it means that there are no running
threads, no software timers⁴¹ and no other threads blocked with a definite timeout. Since there is no
need to preserve the tick count accuracy in this case (no timers, no running threads, no timeouts), we
can enter stop mode, which causes the timer clock to be gated. The MCU will exit from the
StopMode() routine when an external interrupt wakes it up.
If, instead, the eTaskConfirmSleepModeStatus() function returns the value eStandardSleep, the
else at line 53 matches and we can sleep for a time equal to the xExpectedIdleTime parameter,
which corresponds to the total number of tick periods before a thread is moved back into the ready
state. The parameter value is therefore the time the microcontroller can safely remain in a low-power
state, with the tick interrupt temporarily suppressed. The timer ISR will wake up the MCU, exiting
from the SleepMode() routine, and the global tick count is adjusted at line 74.
⁴⁰This happens because this routine is called within an IRQ with the lowest possible priority, as seen before. So, a more privileged
IRQ may resume the execution of another blocked task.
⁴¹Please take note that it is not sufficient to avoid using timers in our code. The macro configUSE_TIMERS in FreeRTOSConfig.h
must be set to 0, otherwise eTaskConfirmSleepModeStatus() never returns the eNoTasksWaitingTimeout value.
The above pseudo-code represents a scheme that all programmers can use to implement their custom
tickless mode. For example, if we know that our software does not make use of software timers and
finite timeouts, then we can safely handle only the deep sleep mode case.
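The three-way decision described above can be condensed in the following sketch. The two enum types are hypothetical and defined only for this example: the first one mirrors the three values returned by eTaskConfirmSleepModeStatus(), the second one names the action taken by the port layer in each case.

```c
#include <assert.h>

/* Local mirror of the values returned by eTaskConfirmSleepModeStatus(). */
typedef enum {
  ABORT_SLEEP,              /* eAbortSleep */
  STANDARD_SLEEP,           /* eStandardSleep */
  NO_TASKS_WAITING_TIMEOUT  /* eNoTasksWaitingTimeout */
} sleep_status_t;

/* Hypothetical actions performed by the tick suppression routine. */
typedef enum {
  ACTION_RESTART_TICK,      /* Restart the tick timer and leave at once. */
  ACTION_TIMED_SLEEP,       /* Sleep mode, with the timer as wake-up source. */
  ACTION_STOP_MODE          /* Stop mode, woken up by an external interrupt. */
} sleep_action_t;

sleep_action_t choose_sleep_action(sleep_status_t status) {
  switch (status) {
  case ABORT_SLEEP:
    return ACTION_RESTART_TICK;
  case STANDARD_SLEEP:
    return ACTION_TIMED_SLEEP;
  case NO_TASKS_WAITING_TIMEOUT:
    return ACTION_STOP_MODE;
  }
  assert(!"unexpected sleep status");
  return ACTION_RESTART_TICK;  /* Safest fallback: keep the tick running. */
}
```

An application that never uses software timers or finite timeouts would only need to handle the ABORT_SLEEP and NO_TASKS_WAITING_TIMEOUT cases.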
Now we are going to implement a custom tickless mode policy, analyzing real code made to work on
an STM32F030 MCU. Refer to the book example for other STM32 MCUs, even if the implementation
is almost the same.
Filename: src/tickless-mode.c
The first two functions we are going to analyze are related to the setup of the timer used as
tick generator and the handling of the related overflow interrupt. The prvSetupTimerInterrupt()
function is automatically invoked by FreeRTOS when the osKernelStart() routine is called. It
configures the TIM6 timer so that it expires every 1ms. The corresponding interrupt is enabled,
and the ISR priority is set to the lowest one (remember that, unless otherwise needed, it is always
important to set up the timer ISR with the lowest priority). The HAL_TIM_PeriodElapsedCallback()
callback simply increases the global tick count by 1. Ignore the instructions at lines [65:68] for
now, because they will become clear later.
Now we are going to analyze the most complex part: the vPortSuppressTicksAndSleep() function.
We will divide it in blocks, so that it is simpler to analyze its code. It is strongly suggested to keep
the real code open in the IDE while reading.
Filename: src/tickless-mode.c
The function starts by checking whether the expected idle time, that is the time window within which
we can safely stop the tick generation, is less than the xMaximumPossibleSuppressedTicks: this value
is computed inside the prvSetupTimerInterrupt() routine according to the given Prescaler and Period
values. Then, at line 91, it computes the Period value to use so that the timer will overflow after the
xExpectedIdleTime time. To avoid race conditions, we then enter a critical section (line 94) and
we invoke eTaskConfirmSleepModeStatus() to decide how to proceed in the tick suppression
procedure. If the function returns eNoTasksWaitingTimeout, then we can stop the TIM6 timer
completely, and we can enter stop mode until the MCU is woken up by an event or an interrupt.
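The computation of the maximum number of suppressible ticks and of the Period value can be sketched as follows. This is an assumption modeled on the approach used by the official FreeRTOS low-power demos for 16-bit timers, not a copy of the book example; the function names and the value 30 used below for the number of timer counts in one tick are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Largest number of ticks that a 16-bit timer can cover in a single run,
   given the number of timer counts that make up one tick. */
uint32_t max_suppressed_ticks(uint32_t period_for_one_tick) {
  assert(period_for_one_tick > 0u && period_for_one_tick <= USHRT_MAX);
  return (uint32_t)USHRT_MAX / period_for_one_tick;
}

/* Counter value to program so that the timer expires after idle_ticks,
   clamped to what the timer is able to represent. */
uint32_t reload_for_idle_time(uint32_t idle_ticks, uint32_t period_for_one_tick) {
  uint32_t max_ticks = max_suppressed_ticks(period_for_one_tick);
  if (idle_ticks > max_ticks)
    idle_ticks = max_ticks;  /* We will simply wake up earlier. */
  return period_for_one_tick * idle_ticks;
}
```

For example, with 30 timer counts per tick the timer can suppress at most 2184 ticks in a row: a longer expected idle time is split across several sleep periods.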
Filename: src/tickless-mode.c
135 else {
136 /* Stop TIM6 momentarily. The time TIM6 is stopped for is not accounted for
137 in this implementation (as it is in the generic implementation) because the
138 clock is so slow it is unlikely to be stopped for a complete count period
139 anyway. */
140 HAL_TIM_Base_Stop_IT(&htim6);
141
142 /* The tick flag is set to false before sleeping. If it is true when sleep
143 mode is exited then sleep mode was probably exited because the tick was
144 suppressed for the entire xExpectedIdleTime period. */
145 ucTickFlag = pdFALSE;
146
147 /* Trap underflow before the next calculation. */
148 configASSERT(ulCounterValue >= __HAL_TIM_GET_COUNTER(&htim6));
149
150 /* Adjust the TIM6 value to take into account that the current time
151 slice is already partially complete. */
too short).
Filename: src/tickless-mode.c
189 /* Allow the application to define some post sleep processing. This is
190 the standard configPOST_SLEEP_PROCESSING() macro, as described on the
191 FreeRTOS.org website. */
192 configPOST_SLEEP_PROCESSING( xModifiableIdleTime );
193
194 /* Re-enable interrupts. If the timer has overflowed during this period
195 then this will cause the TIM6_IRQHandler() to be called. So the
196 global tick counter is incremented by 1 and the ucTickFlag variable
197 is set to pdTRUE.
198 Take note that in the STM32L example in the official FreeRTOS
199 distribution interrupts are re-enabled after the TIM6 is stopped.
200 This is wrong, because it causes the IRQ to be left pending,
201 even if it has been set. So we must first re-enable interrupts - this
202 causes a pending TIM6 IRQ to fire - and then stop the timer. */
203 __enable_irq();
204
205 /* Stop TIM6. Again, the time the clock is stopped for is not accounted
206 for here (as it would normally be) because the clock is so slow it is
207 unlikely it will be stopped for a complete count period anyway. */
208 HAL_TIM_Base_Stop_IT(&htim6);
209
210 if (ucTickFlag != pdFALSE) {
211 /* The MCU has been woken up by the TIM6. So we trap overflows
212 before the next calculation. */
213 configASSERT(
214 ulPeriodValueForOneTick >= (uint32_t ) __HAL_TIM_GET_COUNTER(&htim6));
215
216 /* The tick interrupt has already executed, although because this
217 function is called with the scheduler suspended the actual tick
218 processing will not occur until after this function has exited.
219 Reset the reload value with whatever remains of this tick period. */
220 ulCounterValue = ulPeriodValueForOneTick
221 - (uint32_t) __HAL_TIM_GET_COUNTER(&htim6);
222
223 /* Trap under/overflows before the calculated value is used. */
224 configASSERT(ulCounterValue <= ( uint32_t ) USHRT_MAX);
225 configASSERT(ulCounterValue != 0);
226
227 /* Use the calculated reload value. */
228 __HAL_TIM_SET_AUTORELOAD(&htim6, ulCounterValue);
229 __HAL_TIM_SET_COUNTER(&htim6, 0);
230
231 /* The tick interrupt handler will already have pended the tick
232 processing in the kernel. As the pending tick will be processed as
233 soon as this function exits, the tick value maintained by the tick
234 is stepped forward by one less than the time spent sleeping. The
235 actual stepping of the tick appears later in this function. */
236 ulCompleteTickPeriods = xExpectedIdleTime - 1UL;
237 } else {
238 /* Something other than the tick interrupt ended the sleep. How
239 many complete tick periods passed while the processor was
240 sleeping? */
241 ulCompleteTickPeriods = ((uint32_t) __HAL_TIM_GET_COUNTER(&htim6))
242 / ulPeriodValueForOneTick;
243
244 /* Check for over/under flows before the following calculation. */
245 configASSERT(
246 ((uint32_t ) __HAL_TIM_GET_COUNTER(&htim6)) >=
247 (ulCompleteTickPeriods * ulPeriodValueForOneTick));
248
249 /* The reload value is set to whatever fraction of a single tick
250 period remains. */
251 ulCounterValue = ((uint32_t) __HAL_TIM_GET_COUNTER(&htim6))
252 - (ulCompleteTickPeriods * ulPeriodValueForOneTick);
253 configASSERT(ulCounterValue <= ( uint32_t ) USHRT_MAX);
254 if (ulCounterValue == 0) {
255 /* There is no fraction remaining. */
256 ulCounterValue = ulPeriodValueForOneTick;
257 ulCompleteTickPeriods++;
258 }
259 __HAL_TIM_SET_AUTORELOAD(&htim6, ulCounterValue);
260 __HAL_TIM_SET_COUNTER(&htim6, 0);
261 }
262
263 /* Restart TIM6 so it runs up to the reload value. The reload value
264 will get set to the value required to generate exactly one tick period
265 the next time the TIM6 interrupt executes. */
266 HAL_TIM_Base_Start_IT(&htim6);
267
268 /* Wind the tick forward by the number of tick periods that the CPU
269 remained in a low power state. */
270 vTaskStepTick(ulCompleteTickPeriods);
271 }
272 }
When the MCU exits from the sleep mode, either because the timer has overflowed or another
interrupt has been generated, the configPOST_SLEEP_PROCESSING() macro allows us to perform
needed operations, such as restoring some peripherals or increasing the clock speed. Now the tricky
part takes place, and we need to carefully explain the operations involved.
After the MCU has exited from low-power mode, ISRs are unmasked by exiting the critical section
(line 203). This causes the TIM6_IRQHandler() ISR to be called if we have exited from the
sleep mode due to a timer overflow. When this happens the HAL_TIM_PeriodElapsedCallback()
function is called: this sets the ucTickFlag to TRUE and the timer Period to the standard
value (29). If, instead, the MCU has exited from the low-power mode for another reason (for example,
it has been awakened by the UART_RX interrupt), the ucTickFlag is equal to FALSE.
The code checks the status of the ucTickFlag at line 210. If it is equal to TRUE, then the global tick
counter is increased by a value equal to xExpectedIdleTime minus one, because the tick counter
has already been incremented by one by the HAL_TIM_PeriodElapsedCallback() routine (the ISR
is called as soon as we leave the critical section at line 203). If, instead, it is equal to FALSE, then we
compute how long the MCU has spent in sleep mode and we increase the tick counter accordingly.
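The following host-side sketch mirrors the arithmetic of the listing above, so that the two branches can be tried with concrete numbers. The type wakeup_fixup_t and the function compute_wakeup_fixup() are hypothetical names introduced only for this example, and the value 30 used in the note below for the number of timer counts in one tick is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  uint32_t complete_ticks;  /* Value to pass to vTaskStepTick(). */
  uint32_t next_reload;     /* Timer Period to use for the next run. */
} wakeup_fixup_t;

wakeup_fixup_t compute_wakeup_fixup(bool woken_by_tick_timer, uint32_t counter,
                                    uint32_t period_for_one_tick,
                                    uint32_t expected_idle_ticks) {
  wakeup_fixup_t fix;
  assert(period_for_one_tick > 0u);
  if (woken_by_tick_timer) {
    /* The ISR has already counted one tick: step by one less. */
    assert(expected_idle_ticks > 0u && counter < period_for_one_tick);
    fix.next_reload = period_for_one_tick - counter;
    fix.complete_ticks = expected_idle_ticks - 1u;
  } else {
    /* Another interrupt ended the sleep: count the complete periods. */
    fix.complete_ticks = counter / period_for_one_tick;
    fix.next_reload = counter - (fix.complete_ticks * period_for_one_tick);
    if (fix.next_reload == 0u) {
      /* There is no fraction remaining. */
      fix.next_reload = period_for_one_tick;
      fix.complete_ticks++;
    }
  }
  return fix;
}
```

For instance, with 30 counts per tick and an expected idle time of 100 ticks, a wake-up caused by the timer steps the tick count by 99, while a wake-up caused by another interrupt with the counter equal to 95 steps it by 3.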
This policy could be adapted according to your actual needs. For example, if you are working on an
STM32L platform you may consider using an LPTIM timer during the stop mode, so that you can
know how many ticks have elapsed during the stop mode (a regular STM32 timer does not work in
stop mode).
The macro works so that if the assert condition is false then all interrupts are disabled (by setting
the PRIMASK register on Cortex-M0/0+ cores and raising the BASEPRI value in other STM32 MCUs)
and an infinite loop takes place. While this behaviour is ok during a debug session, it can be a source
of a lot of headaches if our device is not running under a debugger, because it is hard to say why
the firmware stopped working. So, this author prefers to define the macro in this other way:
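void __configASSERT(uint8_t x) {
  if ((x) == 0) {
    taskDISABLE_INTERRUPTS();
    if ((CoreDebug->DHCSR & 0x1) == 0x1) { /* If under debug (C_DEBUGEN bit set) */
      HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
      /* Note: HAL_Delay() relies on the HAL tick interrupt, which may be masked here */
      HAL_Delay(1000);
      asm("BKPT #0"); /* Software breakpoint: halts the core in the debugger */
    } else {
      HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
      HAL_Delay(100);
    }
  }
}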
The __configASSERT() function uses the Cortex-M CoreDebug interface to check if the MCU is
under debug: debuggers set the first bit of the Debug Halting Control and Status Register (DHCSR)
when the MCU is under debugging. If so, a software breakpoint is placed when the assert condition
is false. However, this function has two relevant limitations:
The uxTaskGetSystemState() function returns the status information of every thread in the system,
by populating an instance of the TaskStatus_t structure for each thread. The TaskStatus_t structure
is defined in the following way:
The faster the timebase, the more accurate the statistics will be, but also the sooner the timer
value will overflow.
When the configGENERATE_RUN_TIME_STATS macro is set to 1, we have to provide two additional
macros. The first one, portCONFIGURE_TIMER_FOR_RUN_TIME_STATS(), is used to setup the timer
needed for run-time statistics. The second one, portGET_RUN_TIME_COUNTER_VALUE(), is used by
FreeRTOS to retrieve the cumulative value of the timer counter. Since this timer needs to run really
fast, it is not suggested to setup its ISR and to increase a global variable when it expires: this would
affect the overall system performance. In STM32 MCUs providing a 32-bit timer it is sufficient to
use one of these, setting the Period to the maximum value (0xFFFFFFFF). Another alternative, on
Cortex-M3/4/7 consists in using the DWT cycle counter, as seen in Chapter 11. The following code
shows a possible implementation for the two macros:
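#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() \
  do { \
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* enable the DWT unit */ \
    DWT->CTRL |= 1; /* enable the cycle counter (CYCCNTENA bit) */ \
    DWT->CYCCNT = 0; \
  } while(0)

/* The cumulative counter value is simply the current cycle count. */
#define portGET_RUN_TIME_COUNTER_VALUE() DWT->CYCCNT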
We are now going to analyze a complete tracing implementation, which consists in having a
dedicated thread that prints on the UART2 interface statistic information when the Nucleo USER
button is pressed.
Filename: src/main-ex7.c
The code should be fairly easy to understand. When the USER button is pressed, this thread allocates
a buffer (pxTaskStatusArray) that will contain the TaskStatus_t structures for each thread in the
system. The uxTaskGetSystemState() at line 48 populates this array, and for each thread contained
in it some statistics are printed on the Nucleo VCP.
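A typical statistic derived from this data is the percentage of CPU time used by each thread. The following sketch shows the computation, assuming that the per-thread run time comes from the ulRunTimeCounter field of the TaskStatus_t structure and that the total run time is the value returned by uxTaskGetSystemState() through its last parameter. The function cpu_usage_percent() is a hypothetical helper and not a FreeRTOS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the share of CPU time (from 0 to 100) used by
   one thread, given its run time counter and the total run time. */
uint32_t cpu_usage_percent(uint32_t thread_run_time, uint32_t total_run_time) {
  uint32_t one_percent;
  assert(thread_run_time <= total_run_time);
  /* Dividing the total first avoids the overflow of thread_run_time * 100. */
  one_percent = total_run_time / 100u;
  if (one_percent == 0u)
    return 0u;  /* Not enough run time accumulated yet. */
  return thread_run_time / one_percent;
}
```

A result equal to 0 means that the thread has used less than 1% of the total run time, or that too little time has been accumulated to compute a meaningful value.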
Whereas uxTaskGetSystemState() populates a TaskStatus_t structure for each thread in the
system, vTaskGetInfo() populates a TaskStatus_t structure for just a single task, and it can be
useful if we want to retrieve information about a specific thread.
Finally, FreeRTOS provides some convenient routines to automatically format the raw statistics
data into a human readable (ASCII) format. For example, vTaskGetRunTimeStats() formats the raw
data generated by uxTaskGetSystemState() into a table that shows the
amount of time each task has spent in the running state (that is, how much CPU time each task has
consumed). For more information, refer to this page⁴³ of the on-line FreeRTOS documentation.
14.10.1 ChibiOS
If you are not new to the STM32 platform, you probably already know about ChibiOS⁴⁴. ChibiOS
is an independent and open source project started by an STMicroelectronics engineer, Giovanni Di
Sirio, who works at the ST site in Naples (Italy). ChibiOS is quite popular in the STM32 community,
due to the fact that Giovanni has a deep knowledge of the STM32 platform, and this has allowed
him to create probably one of the most optimized solutions for STM32 microcontrollers, even if
ChibiOS is designed to run on other MCU architectures too.
ChibiOS is essentially composed of two layers: the kernel (named ChibiOS/RT) and a complete HAL
(named ChibiOS/HAL), which abstracts from the underlying hardware peculiarities. While
it is perfectly possible to mix the official ST CubeHAL with the ChibiOS/RT kernel, the
ChibiOS/HAL is probably a valid solution to program STM32 devices, at least for the supported
peripherals.
Even if this author does not have direct experience with it, ChibiOS has a really good reputation
among a lot of people he knows and some readers of this book. Moreover, you can find several
projects and good tutorials on the web⁴⁵ based on this RTOS and its related HAL. Unlike
the current production release of FreeRTOS, ChibiOS uses a fully static memory allocation model,
which allows us to use it in those application domains where dynamic allocation is prohibited.
Finally, Giovanni also provides a pre-configured version of Eclipse, named ChibiStudio, which ships
all required tools (GCC tool-chain, OpenOCD, etc.) already pre-configured. Unfortunately, it runs
only on the Windows OS at the time of writing this chapter.
The only relevant limit of ChibiOS is its license model. Recent releases of the ChibiOS/RT kernel are
distributed under the GPL 3 (the HAL, instead, is distributed under the more permissive Apache 2.0
license), which prevents the usage of the software if you sell electronic devices without releasing
the firmware source code publicly. A “free commercial license” exists, but it requires a registration
process and it is limited to 500 MCU cores, which is too small a number of devices even for
micro-sized companies that may not be able to afford the price of the complete license.
14.10.2 Contiki OS
Contiki⁴⁶ is another open source RTOS, which has a strong focus on wireless low-power sensors
and IoT devices. It is a project started by Adam Dunkels in 2003, but it is currently supported by
several large companies including Texas Instruments and Atmel. It is quite popular among users of
CC2xxx devices from TI. It is based on a kernel scheduler and an independent TCP/IP stack designed
for low-resources devices, which provides IPv4 networking, the uIPv6 stack and the Rime stack,
which is a set of custom lightweight networking protocols designed for low-power wireless networks.
The IPv6 stack was contributed by Cisco and was, when released, the smallest IPv6 stack to receive
the IPv6 Ready certification. The IPv6 stack also contains the Routing Protocol for Low-power and
Lossy Networks (RPL) and the 6LoWPAN header compression and adaptation layer for IEEE 802.15.4
links.
⁴³http://www.freertos.org/rtos-run-time-stats.html
⁴⁴http://www.chibios.org/
⁴⁵http://www.playembedded.org/
⁴⁶http://www.contiki-os.org/
ST provides an application note, the UM2000⁴⁷, which describes how to get started with the Contiki
OS on its microcontrollers, in conjunction with the SPIRIT transceiver to develop sub-1GHz wireless
devices.
Contiki is distributed with a BSD-style license, which allows its source code to be used in commercial
applications without any form of limitation.
14.10.3 OpenRTOS
OPENRTOS is the commercial edition of FreeRTOS, described in this chapter and officially
supported by ST. OPENRTOS and FreeRTOS share the same code base. The additional value offered
by OPENRTOS is a “commercial and legal wrapper” for FreeRTOS users.
Developers upgrade to an OPENRTOS license for two main reasons: the ability to sell their devices
and/or to ship derived code without having to share source code publicly, and the dedicated support
in developing custom solutions based on OPENRTOS. For large companies the possibility to receive
paid support is really important.
⁴⁷http://bit.ly/1URnLZc
15. Getting started with a new design
If you use STM32 microcontrollers for work, or you are going to create your latest fun project
as a hobbyist, sooner or later you will need to leave a development kit like the Nucleo, and you will
have to design a custom board around a given STM32 MCU. For every hardware engineer this is always
an exciting process. You start from an idea, or a list of requirements, and you obtain a piece of
hardware able to do magic things.
The development process of a new board can be divided in two main steps: the hardware design part,
related to components selection and placement, and the software development part, that consists in
a starting configuration and all the code needed to make the board working. This chapter aims to
provide a brief introduction to this topic. The chapter is logically divided in two parts: one related
to the hardware design and one to software. Even if you are one of those lucky people working
in companies where the hardware engineer is a separated figure from the firmware developer, it is
strongly suggested to have a look to this chapter, which is essentially based on the hardware design.
Otherwise, if you are the classical one man band¹, reading this chapter at least once could help you
if you are totally new to the STM32 world.
Read carefully
This chapter must be considered preliminary and subject to change. In an ideal world, it
would come at the end of the book, but several readers asked me to bring this topic forward
(the first version of this chapter was released when the book had only eight chapters).
However, I reserve the right to expand it and integrate it with other topics. Moreover, take
into account that it is based on the limited experience of this author, who is not exactly an
electronic engineer. Always refer to the official ST documentation related to the chosen
MCU before starting to design a new device.
However, if you are going to make a new board with an STM32 MCU, you have to completely forget
this kind of design. This is not only because no STM32 microcontroller is available in a THT
package: these MCUs also require special attention during the PCB layout process, even
for the low-cost STM32F030 value line. The PCB design becomes really critical if you are planning to use
the fastest STM32 MCUs, like the F4 and F7 series, in conjunction with external devices like fast
QSPI memories and external SDRAM.
For each STM32 family, ST provides a dedicated application note named “Getting started with STM32xx
hardware development”. For example, for the STM32F4 family, the AN4488³ is the corresponding
document. It is strongly suggested to read these documents carefully, since they contain the most
important information to design a new PCB correctly. For all my designs based on these MCUs, I
have always followed the information provided by ST, and I have never had any issues. The next
paragraphs summarize what are, in my opinion, the most important aspects and decision steps to follow
during the design process of a new board based on an STM32 MCU.
• More layers simplify the routing process, and this is really important if you have space
constraints or if you need to route differential pair nets.
• They allow better routing for power as well as ground connections; if the power is also on a
plane, it is available to all points in the circuit simply by adding vias.
• They provide an intrinsic distributed capacitance between the power and ground planes,
reducing high-frequency noise especially if your board relies on an external SRAM or a fast
FLASH.
• For the same reason as before, they significantly reduce EMI/RFI emissions, lowering
the development cost and simplifying the CE/FCC certification phase.
However, 4-layer PCBs cost considerably more than 2-layer ones, and this cost is often not
affordable for low-cost, high-volume productions. Moreover, it is fair to say that the
Cortex-M portfolio (and hence the STM32 one) ranges from “low-cost” solutions able to run correctly
on 2-layer boards to more powerful MCUs really close to general purpose microprocessors (like the
Cortex-M7 series), which demand a more advanced PCB stackup.
My personal experience is based on PCBs designed with STM32F030 and STM32F401 MCUs, both
implemented on 2-layer boards, and I had no significant issues during board testing. Using
ground planes on both layers simplifies the routing process and reduces the overall EMI
emissions of the board.
• They are easy to solder, even by hand for really low-volume productions or for prototypes.
With a little bit of practice, they can be soldered with the drag soldering technique⁶, or simply
placed on a PCB pre-covered with the solder paste using a stencil.
• They are easy to inspect using conventional Automatic Optical Inspection (AOI) machines,
and they do not require x-ray inspection, which increases the production cost of your boards.
• They cost less for low and mid-volume productions, compared to other types of packages.
• They can be used on 2-layer low class PCBs (even a pattern class equal to 6 is sufficient⁷),
unlike other packages (like the BGA ones) that usually require more advanced PCBs
due to the use of vias with a really small annular ring.
⁶YouTube is full of videos that show how to solder SMD packages with this technique.
⁷Take a look at this document (http://bit.ly/1NVgYeI) from Eurocircuits to discover more about PCB production classes.
• They provide a lot of signal I/Os to interface external peripherals (this is obvious, but it is
always good to point it out).
However, if space is a strict requirement for your design, then you have to consider BGA and similar
packages, which offer more signal I/O in a smaller footprint.
• Each power pair (VDD, VSS) should be decoupled with a ceramic capacitor of about
100nF connected in parallel (a widespread, proven value), plus one 4.7uF ceramic capacitor for the
overall MCU. It is best to choose 0805 or smaller capacitors (the smaller the better, since smaller
packages have lower parasitic inductance - for an STM32F7, 0402 capacitors are an option to consider).
These capacitors need to be placed as close as possible to the appropriate pins, or on the underside of
the PCB if a BGA package is used for the fastest STM32 MCUs. If a ground plane is used, it
is safe to connect VSS pins directly to the ground plane if this is extensive in the area of that
pin.
• This author also uses a large electrolytic capacitor (typically 10 uF - a tantalum capacitor is
also OK if your budget allows it) no more than 3cm away from the chip. The purpose of this
capacitor is to be a reservoir of charge to supply the instantaneous charge requirements of the
circuits locally so the charge need not come through the inductance of the power trace.
• A small ferrite bead should be placed in series between the analog power supply (AVDD) and
the digital power supply (VDD)⁸. It is used to:
– Localize the noise in the system.
– Keep external high frequency noise away from the IC.
– Keep internally generated noise from propagating to the rest of the system.
• If your STM32 MCU provides a VBAT pin, it can be connected to the external battery (1.65 V
< VBAT < 3.6 V). If no external battery is used, it is recommended to connect this pin to VDD
with a 100nF external ceramic decoupling capacitor.
Figure 2 shows the reference schematics of an STM32F030CC MCU, while Figure 3 shows the typical
layout style used by this author to properly decouple power pins. As you can see, a solid ground plane
ensures that decoupling capacitors are connected to ground through the shortest possible path⁹.
This document¹⁰ from Texas Instruments is a good introduction to this topic.
⁸ST discourages the use of this ferrite if VDD is below 1.8V.
⁹However, keep in mind that the grounding scheme depends on the actual implementation. Some designs need a strong
separation between analog and digital ground, plus some EMC-friendly devices (like ferrite beads) to connect them. Welcome
to the “obscure” world of EMC :-)
¹⁰http://www.ti.com/lit/ml/sloa089/sloa089.pdf
Figure 3: the preferred way by the author of this book to place decoupling capacitors
15.1.4 Clocks
If your design needs an external clock source, either the LSE or the HSE one, special attention must be
paid to the position of the external crystal and to the selection of the capacitors used to match its load
capacitance (this value is specified by the crystal manufacturer, and it must be carefully checked
during the selection process).
ST provides a really excellent guide (AN2867¹¹) about oscillator design. Summarizing that guide is
outside the scope of this paragraph, so I strongly suggest having a look at that application note.
However, it is important to underline some things.
Most start-up errors (that is, the MCU does not boot properly in the final design
when the external crystal is used) arise from a bad choice of the external capacitors and bad placement
of the crystal. For example, assuming a stray capacitance (Cstray) equal to 5pF and a crystal load
capacitance (CL) equal to 15pF, the following formula can be used to compute the value of the two
external capacitors:
C1,2 = 2(CL - Cstray) = 2(15pF - 5pF) = 20pF.
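The formula above is easy to turn into a small helper to use while comparing candidate crystals. The following is only a minimal sketch: the function name `crystal_load_cap_pf` and the convention of returning a negative value for meaningless inputs are assumptions made here for illustration, and the stray capacitance still has to be estimated for your own board.

```c
/* Returns the value of each of the two external load capacitors, in pF,
 * using C1,2 = 2 * (CL - Cstray).
 * Returns a negative number if the inputs make no physical sense
 * (hypothetical convention chosen for this sketch). */
double crystal_load_cap_pf(double cl_pf, double cstray_pf) {
    if (cl_pf <= 0.0 || cstray_pf < 0.0 || cstray_pf >= cl_pf) {
        return -1.0;
    }
    return 2.0 * (cl_pf - cstray_pf);
}
```

With the values used in the text, `crystal_load_cap_pf(15.0, 5.0)` gives 20.0; in practice you would then pick the nearest standard capacitor value.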
Moreover, it is best to place the crystal as close as possible to the MCU pins, surrounding it with a
separate ground plane, in turn connected to the bottom ground plane, as shown in Figure 4 (the
bottom ground plane is not shown).
ST shows several “bad examples” in its application note. Moreover, all STM32 MCUs provide a
really useful feature to debug external oscillator issues: the Clock Security System (CSS). CSS is a
self-diagnostic peripheral that detects the failure of the HSE. If this happens, the HSE is automatically
disabled (this means that the internal HSI is automatically enabled) and an NMI exception is raised
to inform the software that something is wrong with the HSE. So, if your board refuses to work
correctly, I strongly suggest you write the exception handler for NMI, as described in
Chapter 10. If the code hangs inside it, then there is a problem with your oscillator design.
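A minimal sketch of such a handler is shown below. It assumes a CubeMX-generated project for an STM32F4 target (hence the `stm32f4xx_hal.h` header; use the one for your family) and that the CSS has been enabled with `HAL_RCC_EnableCSS()` after the clock configuration. The counter `hse_failure_count` is a hypothetical name introduced here, and the `#else` branch is only a stand-in so that the fragment can also be compiled on a PC.

```c
#include <stdint.h>

#ifdef USE_HAL_DRIVER
#include "stm32f4xx_hal.h"  /* assumption: STM32F4 target */
#else
/* Host-side stand-in (not part of the HAL): it simply forwards to the
 * callback, so the sketch can be compiled and exercised on a PC. */
void HAL_RCC_CSSCallback(void);
static void HAL_RCC_NMI_IRQHandler(void) { HAL_RCC_CSSCallback(); }
#endif

/* Hypothetical counter: inspect it with the debugger, or place a
 * breakpoint in the callback, to see whether the HSE has failed. */
volatile uint32_t hse_failure_count = 0;

/* NMI exception handler: let the HAL check and clear the CSS flag. */
void NMI_Handler(void) {
    HAL_RCC_NMI_IRQHandler();
}

/* Called by the HAL when the CSS has detected an HSE failure. */
void HAL_RCC_CSSCallback(void) {
    hse_failure_count++;
}
```

If `hse_failure_count` becomes non-zero shortly after start-up, the external oscillator is the first thing to check.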
Finally, consider that a lot of EMC issues come from bad placement of external clocks. Pay attention to
the instructions contained in the ST application note.
Most STM32 MCUs allow routing an external or internal clock source (a PLL,
the HSI or HSE and so on) to an output pin, called Master Clock Output (MCO). This is
useful in some applications, where this clock source may be used to drive an external IC, or
in audio applications. However, avoid long traces between the MCU and
the device connected to the MCO pin. In this case you have to treat the MCU like a
normal clock source, and hence you have to pay attention both to the length of the trace
and to crosstalk between MCO and other adjacent or underlying traces.
¹¹http://bit.ly/1RFYZbZ
Figure 4: a good design way to place external crystals using a separated ground plane
the Nucleo too) are designed so that you can disconnect the target MCU from the ST-LINK interface
and connect it to your board.
Figure 5 shows how to use a Nucleo as an external debugger for a custom board. First, remove the two
jumpers from the CN2 pin header. Next, connect PIN1 of the SWD pin header to a VDD (3.3V or
lower) source of your custom board, PIN2 to the SWDCLK pin of the STM32 MCU on your board, PIN3
to GND, PIN4 to the SWDIO pin and finally PIN5 to the NRST pin of the target STM32 MCU (this
step is optional, at least in theory). The connection may be easily done by simply routing those signals
to a convenient pin header, which plays the role of debug port for your custom board.
Another useful feature to have on this debug port is at least the USART TX pin of one of the
available MCU USARTs. This could help you a lot during the development process, since you can use it
to print messages on a console to trace the firmware execution, even when it is not under debugging.
Again, you could use the Nucleo board to interface the target MCU TTL USART to the Nucleo VCP,
connecting USART pins to the CN3 connector on the Nucleo board, as shown in Figure 6. If so, you may
need to desolder the SB13 and SB14 jumpers on your Nucleo, or leave PA2 and PA3 of the Nucleo MCU
floating.
Figure 6: the CN3 connector allows using the ST-LINK VCP with any other USART
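To give an idea of how the USART TX pin on the debug port can be used for tracing, the following is a minimal sketch of a formatting helper. The names `trace_format` and `TRACE_BUF_SIZE` are hypothetical and introduced here only for illustration; the comment at the end assumes a CubeMX-generated `huart2` handle, which may be different in your project.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define TRACE_BUF_SIZE 96  /* hypothetical size; tune it to your messages */

/* Formats a message into buf and returns the number of bytes to send
 * (0 on error). Output longer than the buffer is truncated. */
size_t trace_format(char *buf, size_t buf_size, const char *fmt, ...) {
    va_list args;
    int n;

    if (buf == NULL || buf_size == 0) {
        return 0;
    }
    va_start(args, fmt);
    n = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);

    if (n < 0) {
        return 0;
    }
    if ((size_t)n >= buf_size) {
        return buf_size - 1;  /* vsnprintf already NUL-terminated the buffer */
    }
    return (size_t)n;
}

/* On the target, the formatted bytes would then be sent out of the
 * USART TX pin, for example:
 *
 *   char msg[TRACE_BUF_SIZE];
 *   size_t len = trace_format(msg, sizeof msg, "ADC=%u\r\n", value);
 *   HAL_UART_Transmit(&huart2, (uint8_t *)msg, (uint16_t)len, HAL_MAX_DELAY);
 */
```

Keeping the formatting separate from the transmission makes it easy to move the trace to a different USART, or to switch it off completely in release builds.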
Read carefully
As said before, the SWD interface requires just two pins. These are named SWDIO and
SWDCLK. You can easily identify them using CubeMX (more about this later), or by
downloading the right datasheet for your MCU. However, it is strongly suggested to also use
the NRST pin for debugging. This is required because STM32 microcontrollers allow
the function of the SWD pins to be changed, either deliberately by design or because of an invalid
firmware state after a fault condition (e.g. an invalid memory access has corrupted the
peripheral memory). Without routing the NRST signal to the debug port, it is impossible to
connect to the target MCU “under reset”, that is resetting the MCU just a few CPU cycles
before the MCU is placed under debug. This will really help you in some critical situations.
So, to sum up, always route to the custom “debug connector” on your board at least
the SWDIO, SWDCLK and NRST pins, plus VDD and GND.
finished, and Gerber files are sent to the PCB fab, you start developing the firmware (this is what
often happens, especially if you have to complete the project one day before you start developing
it). After a while, you discover that the 8K of SRAM provided by this MCU is not sufficient for
your project. So, you decide to switch to the STM32F030RC model, which provides 32K of SRAM
and 256K of internal FLASH. However, after struggling for several hours trying to understand why you
cannot flash it, you discover that this model requires four additional power sources (PF4, PF5, PF6
and PF7), as you can see in Figure 7.
Figure 7: the STM32F030RC MCU requires four additional power sources compared to the STM32F030R8 one
So, how to avoid this kind of mistake? The best option is to plan for the worst case. In this specific
case you may lay out your board so that it connects those pins (PF4, PF5, PF6 and PF7) to power
sources even if you are going to use the STM32F030R8 model (since those pins are regular I/O pins
on that model, it is OK to connect them to VDD and VSS, in parallel with decoupling capacitors).
If you are designing a device that will enter deeper sleep modes, like the standby one, and you want
your device to be woken up by the user (maybe by pressing a dedicated button), then remember
that usually just two I/Os can be used for this task (they are called wake-up pins). So, avoid assigning
those pins to other uses.
Once you have started a new project with this MCU, CubeMX shows you the MCU representation
in the Chip view, as shown below.
¹⁵Honestly speaking, what CubeMX generates is not so good from a project organization point of view.
• You can quickly derive that your board will need 6 decoupling capacitors, 5 for the power
sources (4x100nF + 1x4.7uF) and 1x100nF for the NRST pin.
• PIN7 is the NRST pin and it must be decoupled.
• PIN44 is the BOOT0 pin and it must be pulled-down.
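The count in the first bullet follows directly from the decoupling rules given earlier in this chapter: one 100nF capacitor per power pair, one 4.7uF capacitor for the whole MCU, and one 100nF capacitor on NRST. As a rough sketch, the same arithmetic can be written down as a helper for a quick BOM estimate; the names `decoupling_bom` and `decoupling_count` are hypothetical, and the datasheet of the chosen MCU may prescribe additional capacitors (for example on VDDA or VBAT).

```c
/* Number of decoupling capacitors needed around the MCU. */
typedef struct {
    unsigned c100nf;  /* 100nF ceramic capacitors */
    unsigned c4u7;    /* 4.7uF ceramic capacitors */
} decoupling_bom;

/* power_pairs:   number of (VDD, VSS) pairs of the chosen package
 * decouple_nrst: non-zero if NRST gets its own 100nF capacitor */
decoupling_bom decoupling_count(unsigned power_pairs, int decouple_nrst) {
    decoupling_bom bom;
    bom.c100nf = power_pairs + (decouple_nrst ? 1u : 0u);
    bom.c4u7 = 1u;  /* one bulk capacitor for the overall MCU */
    return bom;
}
```

For the MCU of this example, `decoupling_count(4, 1)` gives five 100nF capacitors and one 4.7uF capacitor, that is the six capacitors mentioned above.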
Read carefully
Never forget to tie the BOOT0 pin to ground using a pull-down resistor (this reduces
the power leakage). It is a really common mistake for novices of this platform to leave that
pin unconnected or, worse, to connect it to a voltage source. STM32 hardware designers
are divided into two groups: those that have forgotten to tie BOOT0 to ground and those
that will forget to do it.
The next step involves enabling all the required peripherals, the LSE and the SWD interface, leaving
out the 5 GPIOs for the moment. We obtain the following representation in CubeMX:
OK. Now is a good time to start drawing the board schematics, connecting the other devices
to the MCU pins. Once you have completed this part of the schematics, you can start the layout
process. In this phase, you discover that it is not simple to route the SPI1 signals to PA5, PA6 and
PA7. So, by doing a Ctrl+Click on the SPI1 signals you discover that you can remap them to PB3, PB4
and PB5, obtaining the following representation:
Now you can update your schematics and hence complete the layout of this part. Once the layout
is almost complete, you can assign the 5 GPIO to the MCU pins, deciding which one best fits your
layout. This is the reason why CubeMX can be used iteratively.
Another important thing regarding CubeMX is the ability to give custom names to signals. This is
really important to “document” your project. Moreover, always use the same label to give names to
• A good layout is all about component placing: if you are new to this task, remember that it all
starts from placing components on the final board. Every board can be logically and physically
divided into sub-modules: power part, MCU and digital part, analog part and so on. Don’t start
routing signals before you have placed all components on the final board. Moreover, a good
subdivision into sub-modules allows you to reuse designs for different boards.
• Follow these steps when doing the layout of an STM32 MCU:
– start placing the MCU;
– if your board needs external clock sources, place them immediately, close to the MCU
pins;
– next place all the needed decoupling capacitors;
– connect power sources to the corresponding power lines, or to power planes if your layer
stackup allows them;
– never forget to tie the BOOT0 pin to ground if needed, and to decouple the NRST pin;
– if your design needs an external SRAM or a fast FLASH memory, place them next and
route differential pairs first;
– route all high speed signals;
– route remaining signals;
– avoid using too many vias during the signal routing, and use CubeMX to look for better
alternatives (that is, use other equivalent signal I/Os if possible).
If you have already developed the firmware using a development board, and you need to adapt it to
your custom design, you may proceed in this way:
• Generate a fresh new CubeMX project both for your development board (e.g. the Nucleo-
F030), enabling the needed peripherals, and for the custom board you have designed.
• Compare the initialization routines for the used peripherals: if they differ,
start replacing them one by one in the project made for the development board, and do a
complete project compilation before continuing with the next peripheral. This will allow you
to keep control of what is changing in your firmware.
• To simplify the porting process, never change the peripheral initialization code generated by
CubeMX, but use CubeMX to change peripheral settings.
• Try to use macros to wrap peripheral handles. When the peripherals change, you only need to
redefine the macros (for example, if your firmware developed with the Nucleo uses the
USART2 peripheral, define a global macro in this way: #define USART_PERIPHERAL huart2
and base your code on that macro; if your new design uses the USART1, then you have to
redefine only that macro accordingly).
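The macro technique described in the last point can be sketched as follows. This is only an illustration: the `UART_HandleTypeDef` type defined here is a simplified stand-in for the real HAL type, so that the fragment can be compiled on a PC, and `console_uart_id` is a hypothetical function name. In a real project the type and the `huart1`/`huart2` handles come from the HAL headers and from the code generated by CubeMX.

```c
#include <stdint.h>

/* Simplified stand-in for the HAL handle type (assumption made to keep
 * the sketch self-contained; the real type comes from the HAL headers). */
typedef struct {
    uint32_t instance_id;
} UART_HandleTypeDef;

UART_HandleTypeDef huart1 = { 1 };
UART_HandleTypeDef huart2 = { 2 };

/* Board-specific selection: this is the only line that changes when
 * the firmware is moved from the Nucleo to the custom board. */
#ifndef USART_PERIPHERAL
#define USART_PERIPHERAL huart2
#endif

/* Application code refers only to the macro, never to huart1 or huart2. */
uint32_t console_uart_id(void) {
    return USART_PERIPHERAL.instance_id;
}
```

Since the macro is guarded by `#ifndef`, it can also be overridden from the build configuration (for example with `-DUSART_PERIPHERAL=huart1`), which keeps the sources identical for both boards.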
Remember that CubeMX essentially generates 5 or 6 files. If you keep the modifications to these
files to a minimum, it will be easy to rearrange the code.
Having a minimum viable firmware made with a development kit helps a lot during the debugging
of your custom board. It happens really often that, during the testing of a new board, you are in
doubt if your issues arise from the hardware or the software. Knowing that the firmware works
simplifies the hardware debugging stage.
Appendix
A. Miscellaneous HAL functions and STM32 features
This appendix contains an overview of some HAL functions and STM32 features that would make
little sense to treat in a separate chapter.
void HAL_NVIC_SystemReset(void);
initiates a system reset of the MCU. It uses the void NVIC_SystemReset(void) provided by the
CMSIS package.
Unfortunately, the position in memory of this ID is not common to all STM32 microcontrollers,
but its memory mapped address changes between each STM32-series. Table 1 shows the memory-
mapped address of the Unique MCU ID for the MCUs equipping the Nucleos.
For example, in an STM32F401xE MCU it is mapped at 0x1FFF 7A10. To access the unique ID we
can use the following code fragment:
...
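/* Base address of the unique ID on an STM32F401xE (read-only memory) */
const volatile uint32_t *uniqueID = (const volatile uint32_t *)0x1FFF7A10;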
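Once a pointer to the ID is available, a common need is to turn it into a printable string, for example to use it as a serial number. The following is a minimal sketch under two assumptions: that the unique ID is 96 bits wide (three consecutive 32-bit words), and that the words are printed from the highest address to the lowest, which is just a convention chosen here. The function name `uid_to_hex` is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats the three 32-bit words pointed to by uid as 24 hexadecimal
 * digits, word at the highest address first (convention of this sketch).
 * out must provide room for at least 25 bytes.
 * Returns the number of characters written, or -1 if out is too small. */
int uid_to_hex(const volatile uint32_t *uid, char *out, size_t out_len) {
    if (out_len < 25) {
        return -1;
    }
    return snprintf(out, out_len, "%08lX%08lX%08lX",
                    (unsigned long)uid[2],
                    (unsigned long)uid[1],
                    (unsigned long)uid[0]);
}
```

On the target, the function would be called with the memory-mapped address of the ID for your MCU, while on a PC it can be exercised with a plain array of three words.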
This happens because the GNU ARM plug-in cannot locate the GNU cross-compiler folder. To
address this issue, open the Eclipse preferences clicking on the Window->Preferences menu, then
go to C/C++->Build->Global Tools Paths section. Ensure that the Build tools folder path points to
the directory containing the Build Tools (C:\STM32Toolchain\Build Tools\bin if you followed the
instructions in Chapter 3, or arrange the path accordingly), and the Toolchain folder paths point to
the GCC ARM installation folder (C:\STM32Toolchain\gcc-arm\bin). The following image shows
the right configuration:
B. Troubleshooting guide 535
To check the breakpoints used in your application, go to the Debug perspective, then go to the
Breakpoints pane (see figure below) and disable or delete unneeded breakpoints.
Eclipse needs to reload ARM assembly instructions at every step (one C instruction can correspond
to a lot of assembly instructions), and this really slows down the debugging session. It is not an
issue related to OpenOCD or the ST-LINK interface, but just an overhead connected with
Eclipse. Switch to another view (or simply close the Disassembly view) to resolve the issue.
To resolve this issue we need to distinguish between two cases: whether you are developing the firmware
for a development board like the Nucleo or for a custom designed board (this distinction is just to
simplify the analysis).
If you are developing the firmware using a development board then, especially if you are new to
this platform (but tiredness can play nasty tricks even on experienced users…), probably one of two
things is wrong:
• The definition of memory sections inside the linker script mem.ld file is wrong, either for the
FLASH region or the SRAM region (usually, the FLASH region simply does not start from
0x08000000).
• The startup file is wrong or simply you forgot to rename its extension from lower .s to capital
.S.
If, instead, you are developing the firmware for a custom board, then besides checking the previous
two points you must also check that:
• The configuration of BOOT pins is right (at least BOOT0 pin tied to ground, BOOT1 floating).
• The NRST pin is correctly decoupled using a 100nF capacitor.
Sometimes it happens that, even if all the previous points are correct, the micro still refuses to
boot. This often happens suddenly after a debug session, or after you have tested a buggy firmware
designed to write to the internal FLASH memory. If so, you probably have a corrupted
Option bytes memory region. The ST-LINK Utility can help you a lot to debug this situation. Once
you have connected the ST-LINK debugger, go to the Target->Option Bytes menu and check that the BOOT
configuration correctly matches your MCU.
Finally, sometimes a full chip erase may also help in solving obscure booting issues ;-)
C. Nucleo pin-out
In the next paragraphs, you can find the correct pin-out for all Nucleo boards. The pictures are taken
from the mbed.org website¹⁶.
Nucleo Release
Nucleo-F446RE
Nucleo-F411RE
Nucleo-F410RB
Nucleo-F401RE
Nucleo-F334R8
Nucleo-F303RE
Nucleo-F302R8
Nucleo-F103RB
Nucleo-F091RC
Nucleo-F072RB
Nucleo-F070RB
Nucleo-F030R8
Nucleo-L476RG
Nucleo-L152RE
Nucleo-L073RZ
Nucleo-L053R8
¹⁶https://developer.mbed.org/platforms/?tvend=10
Nucleo-F446RE
Morpho headers
Nucleo-F411RE
Morpho headers
Nucleo-F410RB
Morpho headers
Nucleo-F401RE
Morpho headers
Nucleo-F334R8
Morpho headers
Nucleo-F303RE
Morpho headers
Nucleo-F302R8
Morpho headers
Nucleo-F103RB
Morpho headers
Nucleo-F091RC
Morpho headers
Nucleo-F072RB
Morpho headers
Nucleo-F070RB
Morpho headers
Nucleo-F030R8
Morpho headers
Nucleo-L476RG
Morpho headers
Nucleo-L152RE
Morpho headers
Nucleo-L073RZ
Morpho headers
Nucleo-L053R8
Morpho headers
D. STM32 packages
Here you will find the most common packages used for STM32 MCUs. They are here only as a quick
reference. The images are taken from official ST Microelectronics datasheets. They are therefore
copyright of ST Microelectronics.
LFBGA
LQFP
TFBGA
TSSOP
UFBGA
UFQFPN
VFQFP
WLCSP
E. History of this book
Since this is an in-progress book, it is interesting to publish a complete history of modifications.
• Changed the Table 1 in Chapter 1: it wrongly stated that Cortex-M0/0+ allows 16 external
configurable interrupts. Instead, it is 32.
• Paragraph 1.1.1.6 wrongly stated that the number of cycles required to service an interrupt is
12 for all Cortex-M processors. Instead it is equal to 12 cycles for all Cortex-M3/4/7 cores, 15
cycles for Cortex-M0, 16 cycles for Cortex-M0+.
• Fixed a lot of errors in the text. Many thanks to Enrico Colombini (aka Erix -
http://www.erix.it) who is doing this dirty job.
• Changed again the Table 1 in Chapter 1: it did not indicate which Cortex exceptions are not
available in Cortex-M0/0+ based processors.
• Added several remarks to Chapter 4 (thanks again to Enrico Colombini) that better clarify
some steps during the import of CubeMX generated output in the Eclipse project. Moreover,
it is better explained why the startup file differs between Cortex-M0/0+ and Cortex-M3/4/7
processors.
• Changed in Chapter 4 (∼pg. 140) the description of project generated by CubeMX, since ST has
updated the template files after this author submitted a bug report. Now the code generated
is generic and works with all Nucleo boards (even the F302 one).
• Tool-chain installation instructions have been successfully tested on Windows XP, 7, 8.1 and
the latest Windows 10.
• Added in chapter 4 the description of the CubeMXImporter, a tool made by this author to
automatically import a CubeMX project into an Eclipse project made with the GNU ARM
plug-in.
• Chapter 9 about how to start a new custom design with STM32 MCUs.
• Better clarified in paragraphs 7.1 and 7.2 the relation between NVIC and EXTI controller.
• In chapter 9 clarified that the BusMatrix also allows to automatically interconnect several
peripherals between them. This topic will be explored in a subsequent chapter.
• Clarified at page 266 that we have to enable the DMA controller, using the macro
__DMA1_CLK_ENABLE(), before we can use it.
• The Figure 4 in Chapter 1, and the text describing it, was completely wrong. It wrongly placed
the boot loaders at the beginning of code area (0x0000 0000), while they are contained inside
the System memory. Moreover, the role of the aliasing of FLASH addresses is better clarified,
both there and in Chapter 7.
• Better clarified the role of I-Bus, D-Bus and S-Bus in Chapter 9.
• Fixed several errors in the text. Many thanks to Omar Shaker who is helping me.
This release also better introduces the whole Nucleo lineup in Chapter 1. Moreover, BB-8 droid by
Sphero is now among us. We welcome BB-8 (can you find it? :-)).
• In paragraph 4.1.1.2 the meaning of each IP Tree pane symbol has been better clarified.
• Fixed several errors in the text. Again, many thanks to Omar Shaker who is helping me.
• The GCC tool-chain has been updated to the latest 5.2 release. There is nothing special to
report.
• Paragraph 9.2.6 has been updated: after several tests, I reached the conclusion that the
peripheral-to-peripheral transfer is possible only if the bus matrix is expressly designed to
trigger transfers between the two peripherals.
• The paragraph 9.2.7 has been completely rewritten to better specify how to use the HAL_UART
module in DMA mode.
• Added the paragraph 9.4 that explains the correct way to declare buffers for DMA transfers.
• Added the paragraph 10.1.1.1 about the MSI RC clock source in STM32L MCUs.
• Added the paragraph 10.1.3 about clock source options in Nucleo boards.
• Added in Appendix C the Nucleo-L073 and Nucleo-F410 pinout diagrams.
• Installation instructions have been updated to the latest CubeMX 4.14, which now officially
supports MacOS and Linux.
• Added another figure in Chapter 7 (the current Figure 20), which better explains what happens
when the priority grouping is lowered from 4 to 1 in that example. Thanks to Omar Shaker
who helped me in refining this part.
• Paragraph 11.3.10.4 has been completely rewritten to better describe the update process of
TIMx->ARR register.
• Clarified in Chapter 9 that, when using the UART in DMA mode, it is also important to enable
the corresponding UART interrupt and to add a call to the HAL_UART_IRQHandler() from the
ISR.
• Added an Eclipse intermezzo at the end of Chapter 6: it shows how to customize Eclipse
appearance with themes.
• Added paragraph 12.3.3 regarding an important issue encountered with STM32F103 MCUs.
• Now the book has a brand new and professionally designed cover ;-)