
DIPLOMA THESIS

for an academic degree "Master of Science in Engineering"

Model Checking and Static Analysis of Intel MCS-51 Assembly Code

by Thomas Reinbacher, BSc
A-2020 Kleinstelzendorf, Weidenweg 44

1. Examiner: FH-Prof. Dipl.-Ing. Dr. Martin Horauer
2. Examiner: Ing. Dipl.-Ing. Michael Kramer

Vienna, June 3rd, 2009

Written at the University of Applied Sciences Technikum Wien Master Degree Programme Embedded Systems

Affidavit
I hereby declare by oath that I have written this paper myself. Any ideas and concepts taken from other sources either directly or indirectly have been referred to as such. The paper has neither in the same nor similar form been handed in to an examination board, nor has it been published.

Place, Date

Signature

Abstract
Verification of embedded systems software is crucial for providing flawless functionality of today's intelligent computer systems found in automobiles, elevators, aircraft, medical devices, robots, etc. The approach most widely used in industry relies on testing defined corner cases. Although it is well known that only a very limited part of the test space can be covered in this way, no more complete approaches have been widely adopted so far. Formal verification methods such as model checking, complemented with various techniques to reduce state spaces, have recently gained some momentum in this regard. Nevertheless, formal verification of embedded systems has played only a minor role in the past. Practical restrictions of this approach are due to (i) the need to (manually) create a model of the system beforehand and (ii) the resulting large state spaces. This master thesis focuses on model checking and static analysis of Intel MCS-51 assembly code with the [mc]square framework. In the presented approach, issue (i) is solved by using a dedicated target CPU simulator. In order to tackle (ii), existing abstraction techniques are adapted for the Intel MCS-51 target architecture. A novel state space reduction technique termed Delayed Nondeterminism with Look Ahead is introduced. The presented abstraction technique centers around the coherence among boolean operators with particular regard to the 3-valued microcontroller memory model. Besides, the Intel MCS-51 CPU simulator is integrated into the existing static analysis framework of [mc]square. A novel data-flow analysis termed Register Bank Analysis is described in order to handle register bank swapping. Register bank swapping is a particular feature of some embedded microcontrollers such as the Intel MCS-51. This approach allows narrowing and refining the subsequent data-flow analyses, leading to more precise analysis results. The additional precision in turn contributes to a reduction of state spaces during model checking. In order to evaluate the benefits and to show the applicability of the introduced concepts, a real world case study is conducted. The case study source code is taken from an industrial application. The microcontroller software is model checked with [mc]square by taking advantage of the presented state space abstractions and static analysis techniques.

Keywords: Assembly code model checking, static analysis of assembly code, abstraction techniques, case study, [mc]square

Kurzfassung (German Summary)
Verification of software for embedded systems is a necessary prerequisite for guaranteeing the fault-free operation of intelligent computer systems in automobiles, elevators, aircraft, medical devices, robots, etc. The standard method widely used in industry relies on covering a few representative test cases. It is well known that this approach can cover only a very small part of the actual test space; nevertheless, there are only few mature concepts for remedying this verification deficit. Formal verification methods such as model checking are promising approaches for demonstrating the absence of errors in software. In the context of embedded systems, these formal approaches have played only a minor role in the past. In practice, difficulties arise from (i) the manual modeling of the system and (ii) the unmanageably large state spaces. This master thesis deals with model checking and static analysis of Intel MCS-51 assembly code with the help of the [mc]square framework. The presented approach attempts to solve problem (i) by means of a dedicated microcontroller simulator. Existing abstraction techniques are adapted for the Intel MCS-51 microcontroller in order to minimize the resulting state spaces (ii). A new state space reduction named Delayed Nondeterminism with Look Ahead is presented. This approach is based on the relationships between Boolean logic and the three-valued memory model of the microcontroller simulator. Furthermore, the existing Intel MCS-51 simulator is integrated into the static analysis framework of [mc]square. A novel data-flow analysis (Register Bank Analysis) is developed to account for the architecture-specific switching of register banks. This approach makes it possible to narrow and refine the subsequent analysis results. The precision gained allows a further state space reduction during model checking. To demonstrate the benefits and the applicability of the presented concepts, a case study is presented. The software of the case study originates from an industrial application. The microcontroller program is verified with the [mc]square model checker with the help of the presented abstraction techniques and static analyses.

Keywords (Schlagwörter): Assembly code model checking, static analysis of assembly code, abstraction techniques, case study, [mc]square

Acknowledgements
Not because it is customary, but because it is appropriate: I would like to thank my advisor, FH-Prof. Dr. Martin Horauer, for his excellent guidance and for giving me the opportunity to join one of his research projects within the Department of Embedded Systems at the University of Applied Sciences FH Technikum Wien. He allowed me a great degree of freedom in my work and kindly helped me to gain ground in academic work. Most valuable to me were his numerous tips, his pragmatic approach to doing things, and our fruitful discussions both related and unrelated to work. I greatly enjoyed the time working together. Next, I want to thank Dr. Bastian Schlich and his team from the Embedded Software Laboratory at the RWTH Aachen University. Even though we were geographically separated most of the time, he greatly contributed to setting up a smooth and rich collaboration. He was always willing to listen to my problems and gave me plenty of support to get started with model checking and [mc]square. Last but definitely not least, I thank my family and friends. They supported me in everything I did and greatly helped me to make my way.

Thomas Reinbacher Vienna, June 2009

Contents

1 Motivation and Introduction
2 Contribution
  2.1 Status Quo
  2.2 Thesis Contribution
  2.3 Long-Term Vision
3 Background
  3.1 A World Where Nothing Works and Nobody Knows Why
  3.2 Formal Verification
  3.3 The Verification Problem
  3.4 Model Checking
    3.4.1 The Model Checking Problem
    3.4.2 The Kripke Structure
    3.4.3 The Temporal Logic CTL
    3.4.4 The Model Checking Workflow
    3.4.5 Coffee Vending Machine Example
    3.4.6 Local vs. Global Model Checking Algorithms
    3.4.7 The Pros and Cons of Model Checking
  3.5 Assembly Code Model Checking and [mc]square
  3.6 C51Simulator - Intel MCS-51 Simulator Component
    3.6.1 The Intel MCS-51 Microcontroller
    3.6.2 The Big Picture
    3.6.3 Test and Verification of the C51Simulator Component
    3.6.4 The Software Architecture of the C51Simulator
  3.7 Related Work
    3.7.1 The Assembly Code Model Checking Approach
    3.7.2 3-valued Abstraction Techniques
    3.7.3 Static Analysis
    3.7.4 Simulators for [mc]square
4 Abstraction Techniques
  4.1 Abstraction in Model Checking
    4.1.1 Reducing System Complexity through Abstraction
    4.1.2 Turing's Halting Problem and Why Model Checking Works Anyway
    4.1.3 Nondeterministic Behavior in Assembly Code Model Checking
  4.2 Implementation - Abstraction Techniques for the C51Simulator
    4.2.1 Delayed Nondeterminism
    4.2.2 Delayed Nondeterminism with Look Ahead
    4.2.3 Nondeterministic Program Status Word
5 Static Analysis
  5.1 Background - Static Analysis of Embedded Systems Code
    5.1.1 Control Flow Graphs
    5.1.2 Data-flow Analysis
    5.1.3 Forward Data-flow Analysis - RDA
    5.1.4 Backward Data-flow Analysis - LVA
    5.1.5 Solving Data-flow Equations
  5.2 Implementation - Static Analysis for the C51Simulator
    5.2.1 Overview
    5.2.2 Control Flow Graph Building
    5.2.3 Action List Building
    5.2.4 Live Variable Analysis
    5.2.5 Reaching Definitions Analysis
    5.2.6 Register Bank Analysis
    5.2.7 Stack Analysis
    5.2.8 Interrupt Flag Analysis
    5.2.9 Path Reduction
    5.2.10 Implementation Summary
  5.3 Remaining Challenges in Static Analysis of (Intel MCS-51) Assembly Code
    5.3.1 Indirect Addressing
    5.3.2 Indirect Control Flow
    5.3.3 Self-Modifying Code
    5.3.4 Loop Bounds
    5.3.5 Summary
6 Real Life Case Study
  6.1 Introduction and Motivation
  6.2 The Knitting Machine Monitoring Device - Hardware Overview
  6.3 The Knitting Machine Monitoring Device - Software Overview
    6.3.1 The Main Building Blocks
    6.3.2 Serial Receive and Transmit Ringbuffer
    6.3.3 The Communication Protocol
  6.4 Extracting CTL Properties Out of the Textual Specification
    6.4.1 The Given Textual Specification
    6.4.2 CTL Properties
    6.4.3 Comments
    6.4.4 Reviewing Properties #4a to #8a
    6.4.5 Communication Protocol Verification
  6.5 Results
    6.5.1 Numbers
    6.5.2 Stack Analysis
    6.5.3 The Circular Buffer Implementation
    6.5.4 The Receiver State Machine
    6.5.5 Properties #4a to #8a
    6.5.6 The Communication Protocol
    6.5.7 Compiler Criticism
    6.5.8 Comparison of Abstraction Techniques
7 Remaining Challenges and Future Work
  7.1 Local Model Checking and Resulting Counterexamples
  7.2 Getting the Intel MCS-51 Simulator Implementation Right
  7.3 The Automatic Generated Target Simulator
  7.4 Counterexample Validation
  7.5 Coping the State-Explosion Problem
8 Conclusion
Bibliography
List of Figures
List of Tables
List of Algorithms
List of Listings
List of Abbreviations

1 Motivation and Introduction

It is fair to state, that in this digital era correct systems for information processing are more valuable than gold. (Henk Barendregt)

Embedded systems are becoming ubiquitous. Most existing intelligent computer systems do not even have a screen or input devices. They are embedded and therefore hidden in various kinds of objects: automobiles, elevators, aircraft, medical devices, industrial robots, etc. The demand for efficiency and flexibility in information processing leads to a movement from manual, mechanical, and hydraulic systems towards highly integrated embedded solutions. Each day, we are putting increasing trust in these software and hardware systems. Software is the main enabler for innovative features and new application areas and is most of the time the most elaborate part of the system. However, a fact that is often overlooked is the natural imperfection of the design team involved in the software implementation process. The ever increasing system complexity is another contributor to the vulnerability of state-of-the-art embedded systems [1].

The assembly code that was written for the first moon landing in 1969 was the estimated equivalent of 7500 lines of C code. The code had to fit into the few kByte of program memory featured by the mission computer [2]. Nowadays, embedded solutions can easily scale up to a few million lines of code. A lot of things may have changed since then, but one decisive question remains: How to guarantee and prove that the software is working correctly, without any flaws?

Formal verification methods such as model checking, theorem proving, and abstract interpretation have gained some momentum in verifying those systems. Indeed, almost all notable software companies [3, 4, 5] have developed and deployed model checking tools to ensure design correctness. In 2008, the achievements of model checking were greatly honored when the Association for Computing Machinery (ACM) awarded the prestigious Turing Award, the "Nobel Prize" of computer science, to the pioneers in this field: Edmund Clarke, Allen Emerson, and Joseph Sifakis.

As of today, model-based software development and formal verification are well established in most of today's software engineering processes. Nevertheless, formal verification has played a minor role in the context of embedded systems in the past. The reasons are manifold: past model checking tools were only capable of handling small designs with a few hundred lines of machine code, and generating the required behavioral models is most of the time tedious, challenging, and error-prone. This is especially true for the area of embedded systems. Software written for embedded systems is always linked to a certain application and a target hardware platform. Microcontroller-specific programming language extensions are used to access particular hardware features that cannot be enabled by high level programming language syntax. Formal verification of the high level application code is often not sufficient to master the verification challenges of highly reliable and safety-critical applications. Target platform peculiarities make formal verification of existing embedded software a tough job.

Recently, model checking of assembly code became the focus of research projects [6, 7, 8]. It has some remarkable advantages compared to model checking programs written in high level programming languages. The code that is deployed to the hardware is checked and not just an intermediate representation; thus, any errors introduced during the development process can be found (e.g., compiler errors, toolchain errors, wrong periphery setup, and errors not visible in the C code at all). First tools, such as [mc]square (Model Checking for Micro Controllers) [9] from the RWTH Aachen University, emerged and proved their feasibility in research and academia. Although this approach seems promising for formally verifying embedded software, further abstraction techniques are needed to mitigate the prevalent state-explosion problem.

2 Contribution
2.1 Status Quo
In 2004, the RWTH Aachen University started research initiatives towards a model checker for microcontroller assembly code. A first architecture was proposed in [6], and a tool named [mc]square was developed. While early versions of the tool focused exclusively on model checking, static source code analysis gradually took over a major part in [mc]square. The initial target microcontroller supported was the ATMEL ATmega family. In 2007, the Department of Embedded Systems of the University of Applied Sciences Technikum Wien established a research cooperation [10] with the RWTH Aachen University. Henceforth, the Department of Embedded Systems was actively involved in assembly code model checking research as well as in the further development of [mc]square. One of the first tasks was to extend [mc]square by an Intel MCS-51 simulator component, thus allowing a wider area of application for the toolchain. The Intel MCS-51 simulator integration also brought along significant know-how for microcontroller families that might be included in future versions of [mc]square. First research results were presented to the scientific community in the paper "Challenges in Embedded Model Checking - a Simulator for the [mc]square Model Checker" [11], presented at the Symposium on Industrial Embedded Systems (SIES) 2008. More details and a first example code verified by [mc]square using the Intel MCS-51 simulator were published in [12].

2.2 Thesis Contribution

The main contributions of the present master thesis are (i) the further development of the C51Simulator component as well as (ii) matured abstraction techniques for state space reduction. Furthermore, the C51Simulator component is (iii) integrated into the existing static analysis framework, where the main focus lies on mastering architectural features of the Intel MCS-51 target. Finally, the feasibility of the [mc]square approach to assembly code model checking is demonstrated by (iv) formally verifying a real life industry application. The application source code is provided by an external company, which uses the source code in one of their products. The present master thesis introduces a novel and powerful abstraction technique termed Delayed Nondeterminism with Look Ahead [13] for state space reduction during model checking. Furthermore, a new data-flow analysis termed Register Bank Analysis [14] is presented to narrow and refine static analysis results for the Intel MCS-51 target. Moreover, limitations of the [mc]square approach are pointed out and possible solutions to overcome existing shortcomings are discussed.


2.3 Long-Term Vision

Assembly code model checking, like every formal verification method, aims at solving one of the biggest challenges in today's software development: obtaining flawless and specification-compliant source code, and thereby guaranteeing that products work seamlessly within state-of-the-art, safety-critical, and highly reliable applications that enable the comforts and services of our modern society. Thus, the further development of [mc]square towards an industrially applicable tool can be seen as a contribution to alleviating the verification problem in today's embedded software development processes.

However, it would be foolhardy to state that tools such as [mc]square will ever become the holy grail of program verication. Nevertheless, without fail, they are a step in the right direction.

3 Background
If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. (Gerald Weinberg's Second Law)

This chapter presents (theoretical) background related to formal verification and model checking. In what follows, the need for formal verification in the embedded systems domain is motivated by examples of famous software bugs. Next, a classification of formal verification methods is given and the term verification problem is defined. Then, the foundations of model checking are presented and the temporal logic CTL is covered, followed by a discussion of the advantages and disadvantages of model checking. Later on, the Intel MCS-51 simulator component of the [mc]square model checker is described. The chapter concludes with a summary of related work.

3.1 A World Where Nothing Works and Nobody Knows Why

Reliability is a major concern in today's software engineering processes. As system complexity is continually rising, traditional testing methods fail to cope with the verification challenge. Software development, even for embedded systems, has become a global task in which developers from various branch offices are involved. Geographically separated engineers are producing thousands or even millions of lines of code and perhaps have never seen each other. Quality control of modern software production processes has become significantly more difficult [15]. On the other hand, malfunction of software is costly in terms of failure of the application itself but also due to the resulting consequences, such as fatal accidents, loss of money, shutting down of vital systems, reputation loss, and repayments. With this in mind, a few selected, famous software failures are presented and briefly discussed.

Explosion of the Ariane 5 launcher on its maiden flight (1996). The maiden flight of the Ariane 5 launcher in June 1996 failed because of a malfunction in the control software. An untreated software trap caused the self-destruction of the rocket only 37 seconds after the launch. A failed data conversion from a 64 bit floating point value to a 16 bit signed integer is arguably one of the most expensive software bugs in the aerospace industry (cf. [16]).

Loss of the NASA Mars Climate Orbiter (1999). The spacecraft was intended to enter the Mars orbit at an altitude of 140-150 km above the surface. The navigation software failed and caused the spacecraft to reach an altitude as low as 57 km. The spacecraft was destroyed by atmospheric stresses and friction at this low altitude. The root cause for the loss of the spacecraft was the use of imperial units instead of metric units, leading to an erroneous trajectory computed from this incorrect data (cf. [17]).

US-Northeast blackout (2003). A massive power outage on August 14th, 2003, affected over 50 million people in the northeastern USA and eastern Canada. A previously unknown software flaw in a widely deployed energy management system contributed to the devastating scope of the blackout. The software flaw caused alarm systems to stall because of a race condition (cf. [18]).

Toyota Prius software causes stopping and stalling on highways (2005). A software bug in the Electronic Control Module (ECM) caused Toyota Prius gas-electric hybrid cars to stall or shut down while driving at highway speeds. Approximately 75,000 vehicles were affected by this software bug (cf. [19]).

Microsoft Excel multiplication bug (2007). Any multiplication evaluating to 65,535 delivers incorrect results in early versions of Microsoft Excel 2007. For instance, the multiplication of 850 by 77.1 results in 100,000 instead of the correct value of 65,535 (cf. [20]).

A1 mobile network breakdown (2008). A software problem was responsible for the breakdown of the mobile network service in October 2008, affecting nearly 500,000 customers in Lower Austria and Vienna (cf. [21]).

ÖBB train ticketing machine selling single fare tickets for 3720.8 € (2008). A single fare ticket for the domestic railway line between Hollabrunn (Lower Austria) and Handelskai (Vienna) is normally sold for 6.8 €. However, in some rare cases the fully automatic ticket machine at the platform charges the passenger 3720.8 €. That happens only if the language is changed from German to English before the ticket buying process is initiated (speaking from the author's own experience).

Even when software programming rules and design guidelines are strictly followed, software is man-made and, therefore, may never be perfect. The development and use of methods attempting to remove man-made errors in software engineering is crucial to pave the way for further advances in software engineering. This is the ultimate goal of formal software verification. Hence, the formal approach to software verification may be seen as a major contributor to software correctness, reliability, and safety of present and future applications.

3.2 Formal Verification

Over the past decade we have learned that software programs and hardware designs in general contain bugs, even after intensive testing efforts (see Section 3.1). More than half of the development time for modern embedded designs is spent on testing and debugging in order to approach a reliable design. While the software industry tends to support the development of improved testing methods, computer scientists tend to look for alternative approaches to close the predominant verification gap in modern designs. Numerous research endeavors propose formal verification as the answer to some verification issues. Formal verification has been one of the hot topics in computer science research for more than four decades [22]. Figure 3.1 gives a rough classification of formal verification methods. The main concept behind formal verification relies on the observation that computer programs can be seen as mathematical objects with well-determined behavior. Mathematical logic is used to describe the desired behavior of the computer program which is subject to verification. The process of formally verifying a program then consists of giving a mathematical proof that the program works as specified.

[Figure 3.1: Formal verification methods classification. Labels in the figure: Formal verification; Testing & simulation; Symbolic model checking; Explicit model checking; Model checking; Theorem proving.]

Basically, the literature distinguishes two main areas of formal software verification approaches. The first one is the more mathematical one, called theorem proving. In theorem proving, a proof of correctness is achieved through the derivation of a theorem. A short overview of theorem proving is given in [23]. However, software verification can also be achieved without explicitly establishing mathematical proofs. The more popular approach to formal verification is called model checking and is very well received in modern-day software development processes.

3.3 The Verification Problem

The verification problem can be simply stated as [22]: Given a program $M$ and its specification $\varphi$, determine whether or not the behavior of $M$ meets the specification, i.e., does $M \models \varphi$ hold? Alan Turing [24] formulated the problem in terms of Turing Machines: given a Turing Machine $T$ and a specification, decide whether $T$ will eventually halt, e.g., on a blank input tape. That leads to the halting problem, which is proven to be algorithmically unsolvable. Although the halting problem is unsolvable, practical formal verification proves its strength in closing the verification gap, since one usually focuses on finite state systems rather than reasoning about infinite behavior (cf. Section 4.1.2).

State-of-the-art embedded designs are reaching an unprecedented level of complexity, and the observed shift from stand-alone to truly ubiquitous, pervasive, and networked safety-critical applications calls for effective methods to formally prove the correct behavior of a design. Until now, the advances in formal verification have helped to successfully verify simple programs of moderate size that are used in safety-critical applications. As recently stated by Hoare and Misra, the forthcoming challenge in the field of formal verification is to merge the elaborated theoretical understanding of computer programs with existing tools in order to enable fully automatic verification of real life, large scale, and complex embedded designs. In [25], Hoare and Misra proclaim the verification grand challenge as an international project to construct a program verifier that would use logical proof to give an automatic check of the correctness of programs submitted to it. What may at first sound out of touch with reality is, based on their assumptions, within reach in the next 20 years. In their vision, the verification grand challenge will lead to a tool that can be seen as the Swiss Army knife of formal verification, solving the verification challenge for future hardware and software designs. Hoare and Misra estimated more than a thousand person-years of effort to accomplish this project. To get an idea of the complexity of such a project: the Linux Kernel v2.6, one of the world's largest software projects, started its development back in 1991, and since then the development effort has accumulated to five thousand person-years [26].

The verification grand challenge is undoubtedly an ambitious and catchy project; nevertheless, if it succeeds, it will revolutionize the way we develop (safety-critical) software and it will make essential contributions to the reliability, safety, and trustworthiness of future software developments. In 2002, the US Department of Commerce estimated annual costs to the US economy of about 60 billion US dollars due to avoidable software errors [27]. Thus, producing error-free software is not only safer for the people using those systems, it is also economically highly advantageous. It is a long and steep way to fully automatic, formal software verification, and contributions towards this ambitious goal come piece by piece. Hence, the work put into this thesis can be seen as a small step towards Hoare and Misra's vision of fully automatic software verification.

3.4 Model Checking

Model checking [28, 29] is an automatic, model-based property verification approach for finite state systems. The main concept behind model checking is basically a straightforward brute force exploration of the states of a given system to check whether the given system model satisfies a certain property (specification). Model checking was introduced in the early 1980s and pioneered independently by Clarke and Emerson [30] in the US and by Queille and Sifakis [31] in France. Comprehensive research on (i) efficient search algorithms that minimize the effort of traversing system states and (ii) abstraction techniques to combat the state-explosion problem was a major contributor to shifting model checking from research applications to industry practice. With the ever increasing available computational power, it is now possible to check systems that come close to real life industry applications.
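The brute force exploration described above can be illustrated with a minimal sketch. It is not the algorithm used by [mc]square; it only checks a simple invariant by breadth-first search over all reachable states. The function name `check_invariant`, the toy counter system, and the property are assumptions made purely for illustration.

```python
from collections import deque


def check_invariant(initial, successors, holds):
    """Explore all reachable states breadth-first.

    Returns (True, None) if `holds` is true in every reachable state,
    otherwise (False, path), where path leads from `initial` to the
    first violating state found.
    """
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not holds(state):
            # Rebuild the counterexample by following parent links.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return False, path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


# Hypothetical toy system: a counter modulo 8 that may step by 1 or by 2.
def successors(n):
    return [(n + 1) % 8, (n + 2) % 8]


# Property to check: the counter never reaches the value 5.
ok, counterexample = check_invariant(0, successors, lambda n: n != 5)
print(ok, counterexample)
```

Because every state is stored in `parent`, memory grows with the number of reachable states, which is where the state-explosion problem mentioned above shows up in practice.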

3.4.1 The Model Checking Problem

The model checking problem is an instance of the verification problem (cf. Section 3.3). Model checking provides an automated method for verifying concurrent (nominally) finite state systems that uses an efficient and flexible graph search to determine whether or not the ongoing behavior described by a temporal property holds of the system's state graph. The method is algorithmic and often efficient because the system is finite state, despite reasoning about infinite behavior [32].

3.4.2 The Kripke Structure

In model checking, finite (nondeterministic) state machines are used to represent the behavior of the system. A special type of these state machines are Kripke structures. Every single state is labeled with Atomic Propositions (AP), which are boolean variables and the evaluations of expressions in that state. These expressions correlate to the particular system properties, e.g., boolean expressions over variables or registers. A Kripke structure $M$ is represented as an ordered sequence of four objects:

$M = (S, s_0, R, L)$
$S$: finite set of states
$s_0$: initial state, $s_0 \in S$
$R$: transition relation, $R \subseteq S \times S$
$L$: interpretation function, $L : S \rightarrow 2^{AP}$

The transition relation $R$ specifies for each state whether and which successor states are possible, i.e., for each state $s \in S$ there is a successor state $s' \in S$ such that $R(s, s')$ holds. The interpretation function $L$ labels each state with the set of AP that are true in that state. A path in the Kripke structure $M$ from a state $s$ is a sequence of states $\pi = s_0 s_1 s_2 \ldots$ such that $s_0 = s$ and $R(s_i, s_{i+1})$ holds for all $i \geq 0$ [28].
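The definition above can be mirrored almost literally in code. The following is a minimal sketch, assuming an explicit representation with Python sets and dictionaries; the class name `Kripke`, its method names, and the two-state example are hypothetical and are not taken from [mc]square.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    """Explicit Kripke structure M = (S, s0, R, L)."""
    states: set        # S
    initial: str       # s0, must be an element of S
    transitions: set   # R, a set of (source, target) pairs
    labels: dict       # L, maps each state to the set of APs true in it

    def successors(self, state):
        # All states t with R(state, t).
        return {t for (s, t) in self.transitions if s == state}

    def is_total(self):
        # Every state needs at least one successor state.
        return all(self.successors(s) for s in self.states)

    def is_path_prefix(self, sequence):
        # Checks R(s_i, s_{i+1}) for a finite prefix of a path.
        return all((a, b) in self.transitions
                   for a, b in zip(sequence, sequence[1:]))


# Hypothetical two-state example, used for illustration only.
m = Kripke(
    states={"idle", "busy"},
    initial="idle",
    transitions={("idle", "busy"), ("busy", "busy"), ("busy", "idle")},
    labels={"idle": set(), "busy": {"working"}},
)
```

Since paths are infinite, the sketch can only validate finite prefixes such as `["idle", "busy", "busy", "idle"]`.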

3.4.3 The Temporal Logic CTL

Computational Tree Logic (CTL) is a combination of a linear temporal logic and a branching-time logic and was proposed by Clarke and Emerson in 1980 [33]. The model of time is a tree-like structure in which the future is not determined. In model checking, temporal logic is used to express the system's specification, i.e., the property $\varphi$. In a linear temporal logic, various operators are provided to describe events along a single computation path. In contrast, a branching-time logic provides operators to quantify over a set of states that are successors of a given (the current) state. CTL combines these two kinds of operators, and properties are therefore constructed from path quantifiers and temporal operators.

CTL Path Quantifiers
A: for All paths from a certain state on
E: there Exists at least one single path leaving from a certain state

CTL Temporal Operators
X $\varphi$: $\varphi$ holds neXt time
F $\varphi$: $\varphi$ holds sometime in the Future
G $\varphi$: $\varphi$ holds Globally in the future
$\varphi$ U $\psi$: $\varphi$ holds Until $\psi$ holds

In CTL, a temporal operator must always be preceded by a path quantifier. The suggestion of using temporal logic for reasoning about ongoing concurrent programs (reactive systems) goes back to Pnueli in 1977 [34]. This thesis focuses exclusively on CTL model checking. A survey on other temporal logics is given in [35]. A few examples of common CTL expressions are given in Figure 3.2.

Another well-received temporal logic is Linear Temporal Logic (LTL). Whereas CTL considers the whole computation tree, LTL considers only individual runs of the automaton. Thus, CTL allows reasoning about the branching behavior, considering multiple possible runs at once. CTL and LTL have a large overlap, i.e., a considerable number of properties are expressible in both temporal logics. Although they have a common superset, namely Computational Tree Logic* (CTL*), not all properties can be expressed in both logics. For instance, a property commonly known as resetability is expressed in CTL as AG (EF $\varphi$), meaning that from any state there is always a path where $\varphi$ eventually holds, and cannot be expressed in LTL. Conversely, some LTL properties such as A (FG $\varphi$), meaning that along every path there is some state from which $\varphi$ will hold forever, and fairness constraints (cf. Section 6.4.5) cannot be expressed in CTL. More on the expressiveness of CTL*, CTL, and LTL is given in [36, 29].
Figure 3.2: CTL examples and intuitions. Panels: (a) AF p (Finally p), (b) AG p (Globally p), (c) AX p (neXt p), (d) A p U q (p Until q), (e) EF p (Finally p), (f) EG p (Globally p), (g) EX p (neXt p), (h) E p U q (p Until q).
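The operator pairs of Figure 3.2 are not independent of each other. As a reminder, the following standard CTL equivalences (stated here for convenience, they are not specific to this thesis) show that EX, EG, and EU suffice to express the remaining combinations of path quantifiers and temporal operators:

```latex
\begin{align*}
\mathrm{AX}\,\varphi &\equiv \neg\,\mathrm{EX}\,\neg\varphi \\
\mathrm{EF}\,\varphi &\equiv \mathrm{E}[\mathit{true}\;\mathrm{U}\;\varphi] \\
\mathrm{AF}\,\varphi &\equiv \neg\,\mathrm{EG}\,\neg\varphi \\
\mathrm{AG}\,\varphi &\equiv \neg\,\mathrm{EF}\,\neg\varphi
\end{align*}
```

For example, AG $\varphi$ holds in a state exactly if no path leaving that state ever reaches a state violating $\varphi$.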

3.4.4 The Model Checking Workflow

In practice, the system model $M$ is described by a semantic model, i.e., a Kripke structure, and the specification (property) $\varphi$ is described by a formula given in temporal logic.

Figure 3.3: The model checking workflow. (Figure: a system model $M$ and a system property $\varphi$, e.g., (1) AG (event U abort), (2) EF event > 100, (3) EF event = 20, are passed to the model checker, which decides whether $M \models \varphi$ and returns either a notification (yes) or a counterexample.)

Proving a certain property is performed by determining the truth of formulas in certain system states. In order to apply model checking, one needs a modeling language in which the system is described as well as a notation for the formulation of properties and algorithms to step through the state space. As shown in Figure 3.3, a typical model checking workflow is composed of three major steps:

Define a formal model of the system that is subject to verification by creating a model of the system in a language that fits the model checker's input language. Those modeling languages are usually tightly coupled to the model checker itself, such as Process or Protocol Meta Language (PROMELA) used by the SPIN model checker [37]. System modeling usually involves the process of abstraction (see Section 4.1), i.e., simplifying the original system. System modeling focuses on the main properties in order to better manage the system complexity.

Provide a particular system property that should be proved. In other words, a question about the system behavior is formulated that should be answered by the model checker. The system property is usually derived from the specification and given in a temporal logic.

Invoke the model checking tool and receive a notification whether the given system property was fulfilled or not. In case the system property could not be verified, a counterexample is generated that points to the source of the error in the system model.

3.4.5 Coffee Vending Machine Example

A simple model of a coffee vending machine is introduced to exemplify the use of Kripke structures and CTL. Its textual specification reads as follows: After inserting a coin, the user can choose her/his favorite coffee. A coffee is only brewed after a valid selection is made. The user is able to abort the procedure at any time. Figure 3.4 shows the resulting Kripke structure, with all the states, transitions, and state variables. Each state is labeled with the atomic propositions AP that are true or false in the state. The labels given on the transitions are not part of the Kripke structure itself. The coffee vending machine can be formally written as:

$M = (S, s_0, R, L)$
$S := \{S1, S2, S3\}$
$s_0 := S1$
$R := \{(S1, S2), (S2, S1), (S2, S3), (S3, S1), (S3, S3)\}$
$L(S1) = \{\neg coin, \neg brew, \neg selection\}$
$L(S2) = \{coin, \neg brew, \neg selection\}$
$L(S3) = \{coin, brew, selection\}$
As noted in Section 3.4.1, model checking is based on a graph search; therefore, the transition system is transformed to computation paths. This is done by unwinding the Kripke structure to obtain a computation tree, as shown in Figure 3.4(b). Most model checkers expect system properties given in some temporal logic. For the coffee vending machine, meaningful system properties might be:

Coffee is brewed after a selection was made.
Coffee is brewed sometime.

These properties can be written in CTL as:

AG[selection $\rightarrow$ brew]¹: Whenever a selection is made, coffee is brewed for sure.
EF[brew]: There is a state where coffee is brewed.

¹ $\rightarrow$ represents implication (first order logic).

Figure 3.4: The coffee vending machine example. (a) Kripke structure with the states S1 (start), S2, and S3, each labeled with coin, brew, and selection, and the transition labels insert coin, select, abort, give change, and brewing. (b) The first computation paths.
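To make the two CTL properties of the coffee vending machine tangible, the following sketch evaluates them with a simple global labeling procedure based on fixed-point iteration. It is an illustration only and not the algorithm used by [mc]square; all function names are hypothetical. The state labeling is an assumption of this sketch: no proposition holds in S1, only coin holds in S2, and coin, brew, and selection hold in S3.

```python
# Coffee vending machine as an explicit Kripke structure.
# The labeling below is an assumption made for this sketch.
STATES = {"S1", "S2", "S3"}
R = {("S1", "S2"), ("S2", "S1"), ("S2", "S3"), ("S3", "S1"), ("S3", "S3")}
LABELS = {
    "S1": set(),
    "S2": {"coin"},
    "S3": {"coin", "brew", "selection"},
}


def sat_ap(p):
    # States in which the atomic proposition p is true.
    return {s for s in STATES if p in LABELS[s]}


def sat_not(x):
    return STATES - x


def sat_ex(x):
    # EX x: states with at least one successor in x.
    return {s for (s, t) in R if t in x}


def sat_eu(x, y):
    # E[x U y]: least fixed point, start with y and add x-states
    # that have a successor already known to satisfy the formula.
    result = set(y)
    while True:
        new = result | (x & sat_ex(result))
        if new == result:
            return result
        result = new


def sat_ef(x):
    # EF x is E[true U x].
    return sat_eu(STATES, x)


def sat_ag(x):
    # AG x is not EF not x.
    return sat_not(sat_ef(sat_not(x)))


selection, brew = sat_ap("selection"), sat_ap("brew")
implication = sat_not(selection) | brew   # selection -> brew
holds_ag = "S1" in sat_ag(implication)    # AG[selection -> brew] in s0
holds_ef = "S1" in sat_ef(brew)           # EF[brew] in s0
print(holds_ag, holds_ef)
```

Under the assumed labeling both properties hold in the initial state S1, whereas a stronger property such as AG[brew] does not.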

3.4.6 Local vs. Global Model Checking Algorithms

In the literature, there are two different approaches to exploring the state space of a given system, i.e., local and global model checking algorithms. A global model checking algorithm first builds the whole state space. Search and labeling algorithms are applied afterwards to find particular states in which the system property cannot be proven. In global model checking, the state space is traversed backwards to find counterexamples. As the whole state space is available, a global model checking algorithm is able to present all possible counterexamples. It may compare the length of the counterexamples and only present the shortest to the user. A major drawback of global model checking is the generation of states that are not relevant to prove the given formula, thus making the state-explosion problem even worse. In contrast, in local or on-the-fly model checking only those states are visited that are needed to prove the truth value of the formula in a given state. Hence, on-the-fly state space building is possible when using local model checking algorithms. A local model checking algorithm, however, can hardly guarantee to find the shortest counterexample. Nevertheless, local model checking is a first step to alleviate the state-explosion problem. [mc]square implements a local model checking algorithm as described by Heljanko in [38]. A comparison of local and global model checking is elaborated in [39].
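The difference between the two strategies can be illustrated with a reachability property of the form EF goal. The sketch below is a simplified illustration and not the algorithm by Heljanko used in [mc]square; the function names and the toy counter system are hypothetical. The global variant builds the complete reachable state space first, while the on-the-fly variant generates successors on demand and stops as soon as a witness is found.

```python
def explore_all(initial, successors):
    """Global style: build the complete reachable state space first."""
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ef_on_the_fly(initial, successors, goal):
    """Local style: generate states on demand, stop at the first witness.

    Returns (holds, number_of_states_generated).
    """
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        if goal(state):
            return True, len(seen)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False, len(seen)


# Hypothetical toy system: an 8 bit counter that can increment or reset.
def successors(n):
    return {(n + 1) % 256, 0}


total = len(explore_all(0, successors))
holds, generated = ef_on_the_fly(0, successors, lambda n: n == 5)
```

In this toy system the on-the-fly search generates only a handful of the 256 reachable states before it finds a state satisfying the goal. Because the search is depth-first, the witness path it follows is not necessarily the shortest one.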

3.4.7 The Pros and Cons of Model Checking

Compared to traditional approaches, such as simulation and testing, model checking offers two major advantages:

Model checking is a fully automatic approach. It neither requires user guidance nor does it call for user expertise in the fields of mathematics, logic, or theorem proving. Anyone who uses design and simulation tools is able to apply model checking, since modern tools aim to offer a push-button solution. Model checkers are integrated within existing design tool chains.

Counterexample generation. Whenever the model checker reveals that a given property failed to hold, the process of model checking allows the production of a counterexample/witness. A counterexample points the user to the root cause of the problem by demonstrating a behavior that falsifies the property. For the process of debugging, such an error trace is highly advantageous, since the counterexample gives a complete insight into the system's behavior.

Nevertheless, all that glitters is not gold. The broad application of model checking in industry is taking place quite slowly, mainly because of its three major disadvantages:

The state-explosion problem. The main challenge in model checking is to cope with the problem of state-explosion. In general, a model checker aims to enumerate and analyze the set of states a system may ever reach. The overall number of system states, even when dealing with small systems, is often too large to be handled with reasonable computing resources. Peled [40] summarizes effective strategies for fighting against state-explosion and proposes a combination of Binary Decision Diagrams (BDD)², Partial Order Reduction (POR), and Symmetry. More details are also given by Clarke et al. in [28].

Reported errors may be false negatives [40]. Model checking requires, as the name implies, modeling of the system. In order to alleviate the state-explosion problem, abstraction is needed (cf. Section 4.1). Thus, the program that is verified may not be the original one and consequently, if model checking reports a property violation in the abstracted model of the system, one has to make sure that the error is indeed a real one, i.e., that it can be reconstructed on the real target platform. The process of checking the counterexample on the real system is often carried out manually. False negatives arise from the differences between an actual system's behavior and the behavior represented by the abstracted model. Manually ruling out false negatives is time-intensive and an error-prone task itself. Therefore, a major future challenge for the model checking community may be the automated elimination of false negatives. A more detailed discussion on how to overcome the problem of false negatives is carried out in Section 7.1.

Model checking can only verify a given specification. Thus, an important point is the completeness of the specification. It is challenging to make sure that the specification covers all properties that the system should satisfy and to establish a one-to-one match between a given textual specification and the derived formal specification.

3.5 Assembly Code Model Checking and [mc]square

An important point in embedded software verification is the mismatch between what gets verified during system verification and the actual version of the application running on the embedded target processor. In other words, there may be a mismatch between the system model and the actual version which is deployed in the field. As widely known and discussed in [41], embedded processors do not execute the high level representation of the software application directly, e.g., C or C++ code; they can only execute instructions that are part of the instruction set. Especially in the embedded systems domain, custom-designed microcontrollers are in use. Most of the specific microcontroller features cannot be directly invoked through the high level programming language. Therefore, compiler and toolchain providers extend standardized programming languages with so-called microcontroller-specific extensions. These additional language features allow the engineer to enable/disable interrupts, read and write data from/to peripheral units, use special data types, invoke additional hardware blocks, etc. Not surprisingly, model checking of high level descriptions often fails to meet the needs of verifying embedded systems code. Fortunately, model checking and static analysis of assembly code have gained the attention of recent research projects [42, 7, 8]. Formal verification based on assembly code has some considerable advantages over model checking of high level system models. The code that is deployed to the hardware is verified and not just an intermediate representation. A compiler, which is a highly complex piece of software itself, translates the high level code to microcontroller instructions. In most approaches to embedded code verification, a high level behavior of the system is analyzed, but there is no cross-check verifying whether the behavior of the model remains unchanged after code compilation. Thus, when using model checking of assembly code one can detect errors introduced during the whole development process, including compiler errors, toolchain errors, wrong periphery setup, errors not visible in the C code at all, etc.

² A BDD or a Propositional Directed Acyclic Graph (PDAG) is a data structure that is used to represent a boolean function. It can be seen as a compressed representation of sets.

Figure 3.5: The model checking workflow of the [mc]square approach (cf. Figure 3.3). (Figure: [mc]square derives the system model $M$ directly from the assembly source code or hex file; together with a system property $\varphi$, e.g., (1) AG evt U abort, (2) EF evt > 100, (3) AF evt = 20, the model checker decides whether $M \models \varphi$ and returns either a notification (yes) or a counterexample.)

With [mc]square (Model Checking for Micro Controllers), the Department of Computer Science XI of the Technical University of Aachen developed a model checker that is precisely tailored for formal verification in the context of microcontrollers. [mc]square is an explicit, timeless, CTL based, assembly code model checker and features model checking and static source code analysis of software written for embedded targets. Supported target platforms are the ATMEL ATmega series [9], the Intel MCS-51 [11], the Infineon XC16x [43], and Programmable Logic Controllers (PLCs) [44]. The [mc]square model checker uses an accurate and customized Central Processing Unit (CPU) simulator to automatically derive the system model out of an implementation. Thus, the manual and often error-prone process of model creation is shifted from the test engineer towards the implementation of the verification tool. This leads to the revised model checking workflow shown in Figure 3.5. In the following, a high level introduction to the C51Simulator component of [mc]square is given, and only those parts of the model checker are discussed that are relevant for the elaboration of this thesis. More details on assembly code model checking and the tool [mc]square are given by Schlich in [9].

3.6 C51Simulator Intel MCS-51 Simulator Component

[mc]square uses a customized microcontroller simulator component for state space building. Hence, in order to support new target platforms a microcontroller simulator has to be created. The following section describes this process for the Intel MCS-51 simulator component. More details on the actual implementation can be found in [45, 11].

3.6.1 The Intel MCS-51 Microcontroller

The Intel MCS-51 success story started back in 1980, when Intel started to ship its brand new microcontroller family, widely known as 8051, which later became one of the most popular and successful microcontrollers ever. Nowadays, almost every well-known Integrated Circuit (IC) manufacturer³ has the Intel MCS-51 in its product portfolio or has even made its own instruction set compatible derivatives. Moreover, open-source synthesizable Intel MCS-51 Intellectual Property (IP) cores are available in Register Transfer Level (RTL) code such as Very High Speed Integrated Circuit Hardware Description Language (VHDL) or Verilog, ready to be used within Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) based designs. The original Intel MCS-51 design directly influenced a remarkable number of recent microcontroller architectures. Basically, it is an 8 bit Complex Instruction Set Computer (CISC) microcontroller organized as a Harvard architecture. Code and data memory are strictly separated and instructions differ in their length.

³ An estimated number of over fifty companies worldwide.

Main Features [46, 47]:
128 bytes of Internal Random Access Memory (IRAM)
4096 bytes of internal Program Read Only Memory (ROM)
32 bytes of bit-addressable memory
Four 8 bit wide general purpose I/O ports
Two 16 bit timer units
Full-duplex Universal Asynchronous Receiver Transmitter (UART)
Five different interrupt sources and two levels of interrupt priorities
256 different instructions
Five different addressing modes
The majority of instructions are executed within 12 system clock cycles

Registers as well as I/O ports are memory mapped and are therefore accessed like any other memory location. The stack is located within the IRAM area and grows towards higher data memory addresses. A particular and powerful architecture feature is the bit-manipulating capability of the CPU. Single bits can be set, cleared, or involved in other logical calculations. Four separate register banks are located at the bottom of the IRAM, occupying the first 32 bytes of data memory. Register banks are switched by modifying two dedicated register bank selection bits within the Program Status Word (PSW). 21 Special Function Registers (SFRs) allow the configuration of peripherals. A few of them are bit-addressable, some are only byte-addressable, and some can be accessed in either mode.

Instruction Set
The instruction set covers 256 different instructions, hence resulting in 8 bit wide opcodes. Due to the CISC architecture, instructions are either one, two, or three bytes long. They can be separated into five groups: logical, arithmetic, program branching, data transfer, and boolean operations.

Supported Addressing Modes
Data and program memory are accessed by one of the five available addressing modes:
Immediate addressing is used whenever the source operand is a constant value rather than a variable. The constant value can either be included as a single byte in the instruction or be derived from the opcode itself.
Direct addressing is used for accessing any IRAM location including SFRs.
Indirect addressing uses the registers R0 or R1 from the active register bank as base registers. The value stored in these registers indicates an address in IRAM where data should be read from or written to. Any pointer makes use of indirect addressing.
Extended direct addressing is basically the same as direct addressing, but it is used to access additional external memory locations rather than IRAM locations.
Indirect from program memory enables reading from program memory.

The interested reader is referred to the Intel MCS-51 datasheet [46] for more details on the architecture and the instruction core.
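The interplay of register banks and indirect addressing can be sketched in a few lines. The following is a minimal illustration and not the C51Simulator implementation; the function names are hypothetical. It relies on the MCS-51 convention that the bank selection bits RS0 and RS1 are bits 3 and 4 of the PSW and that each bank occupies eight consecutive bytes at the bottom of the IRAM.

```python
PSW_RS0 = 0x08  # bit 3 of the PSW
PSW_RS1 = 0x10  # bit 4 of the PSW


def active_bank(psw):
    """Return the index (0..3) of the register bank selected by RS1:RS0."""
    return (psw & (PSW_RS1 | PSW_RS0)) >> 3


def register_address(psw, n):
    """IRAM address of register Rn (n in 0..7) in the active bank."""
    if not 0 <= n <= 7:
        raise ValueError("register index must be in 0..7")
    return 8 * active_bank(psw) + n


def read_indirect(iram, psw, n):
    """Indirect addressing via @R0 or @R1: the register holds the address."""
    if n not in (0, 1):
        raise ValueError("only R0 and R1 can be used as base registers")
    pointer = iram[register_address(psw, n)]
    return iram[pointer]


# Usage: the same instruction operand R0 refers to different IRAM cells,
# depending on the register bank that is active at that point.
iram = bytearray(128)
iram[0x18] = 0x30   # R0 of bank 3 points to address 0x30
iram[0x30] = 0x5A
value_bank3 = read_indirect(iram, 0x18, 0)   # PSW selects bank 3
value_bank0 = read_indirect(iram, 0x00, 0)   # PSW selects bank 0
```

The usage lines show why an analysis of assembly code has to know the active register bank: without it, an access to R0 cannot be mapped to a single IRAM address.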

3.6.2 The Big Picture

[mc]square uses a well-defined and slim interface to communicate with and control the C51Simulator. The main task of the C51Simulator is to generate possible successor states for a given Program Counter (PC) location. In order to do so, the C51Simulator has to model and implement the whole instruction set, the data memory management, as well as the peripheral units of the real target microcontroller. However, a few requirements (cf. [11]) for the simulator forbid the use of existing CPU simulators. [mc]square abstracts from time; hence, an existing off-the-shelf CPU simulator is not suitable for the [mc]square approach to assembly code model checking. Almost all Commercial Off The Shelf (COTS) microcontroller simulator engines are based on a cycle-accurate approach. Thus, without further modifications it is nearly infeasible to use conventional cycle-accurate simulator engines to build [mc]square conformant state spaces. Moreover, some abstraction techniques are applied on-the-fly, i.e., during the state space generation, requiring extra behavior not found in standard CPU simulators.

Figure 3.6: The [mc]square framework. (Figure: components binary & debug files, program parser, state space, static analyzer, simulators (C51, ..., c167, PLC, AVR), counterexample generator, CTL property, and CTL parser.)

As shown in Figure 3.6, the [mc]square framework provides a full CTL model checker, a counterexample generator, a comfortable Graphical User Interface (GUI), and an assembly code static analyzer. Whenever [mc]square needs data generated by the hardware, the respective simulator component is invoked. A nice side effect of the simulator-based approach is a full CPU simulator, allowing the user to analyze and debug the code prior to model checking. It is notable that [mc]square offers a new way of analyzing microcontroller programs, which is quite different from standard COTS tools, since the simulation covers the whole state space of the application.

3.6.3 Test and Verification of the C51Simulator Component

A major point of criticism of tool-based model checking is the justifiable question regarding the verification of the tool itself: how can one make sure that the tool does not contain software bugs itself, leading to false outputs during the model checking process? Hence, special care must be taken in verifying the implementation of the simulator component. This is achieved by verifying the actual implementation against commercially available Intel MCS-51 simulators such as the Keil µVision debugger or µCSim, which is included in the Small Device C Compiler (SDCC) [48] toolchain. The conceptual test approach is shown in Figure 3.7. A test pattern file is loaded into both simulators and each instruction is independently executed by the two simulators. After the execution, the whole memory area of both simulators is dumped into separate files and these files are compared against each other. More on the applied test and verification strategy is given in [45].

Figure 3.7: C51Simulator verification process. (Figure: a test pattern file, e.g., MOV PSW,#0; MOV A,#10; MOV R0,#10; ADD A,R0; SUBB A,#20; JZ DONE_2; MOV P1,#2; LJMP FAILED, is executed both by the commercial simulator and by the C51Simulator of [mc]square. Each simulator dumps its memory content to a file and the dump files are compared. If they match, the instruction is verified; otherwise, troubleshooting follows.)
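The comparison step of this process is simple to automate. The following sketch assumes, for illustration only, that both memory dumps are already available as byte sequences of equal size; the function name `diff_dumps` and the example values are hypothetical and do not reflect the dump file format used in [45].

```python
def diff_dumps(reference, candidate):
    """Return a list of (address, reference_byte, candidate_byte) mismatches."""
    if len(reference) != len(candidate):
        raise ValueError("memory dumps differ in size")
    return [(addr, a, b)
            for addr, (a, b) in enumerate(zip(reference, candidate))
            if a != b]


# Hypothetical dumps of a four byte memory area after one instruction.
reference_dump = bytes([0x00, 0x14, 0x0A, 0xFF])
candidate_dump = bytes([0x00, 0x14, 0x0B, 0xFF])

mismatches = diff_dumps(reference_dump, candidate_dump)
instruction_verified = not mismatches
```

An empty mismatch list corresponds to the "instruction verified" outcome of Figure 3.7; any entry names the address at which troubleshooting should start.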

3.6.4 The Software Architecture of the C51Simulator

The core task of the C51Simulator is, of course, the emulation of the Intel MCS-51 instruction set and peripheral units. In this respect, the simulator behaves like any other CPU simulator. Instructions are fetched from the program memory and decoded; the involved memory locations are read, modified, and written back. A simplified architectural overview is given in Figure 3.8, showing that the C51Simulator is built around five main building blocks. In the remainder of this section, these building blocks are discussed in brief.

Figure 3.8: Software architecture of the C51Simulator. (Figure: the five building blocks [mc]square Interface, Instruction set core, Memory model, Splitter, and Determinizer.)

Instruction Set Core
A basic, straightforward implementation of the semantics of the opcodes supported by the microcontroller as defined in the corresponding datasheet [46].

Memory Model
The memory model acts as a representation of the Intel MCS-51 data and program memory. As described in [49], [mc]square uses abstraction techniques that center around the idea of a 3-valued memory representation. Such a ternary memory representation allows certain memory locations to be marked as unknown in order to avoid the creation of unneeded successor paths. For this reason, the memory model requires shadow memory to indicate whether the actual value is known. Consequently, the simulator manages two blocks of memory. As shown in Table 3.1, every byte of memory is represented by its actual value and a second byte serving as a mask that indicates whether or not a certain bit is deterministic (bits with the value Nondeterministic (ND) are indicated by a *).

Location    Binary value    ND-mask       Ternary value
@ 0x0A      b 11110000      b 00001100    1111**00
@ 0x0B      b 00001111      b 11110000    ****1111
@ 0x0C      b 10101010      b 01010101    1*1*1*1*
@ 0x0D      b 00000000      b 01100110    0**00**0
@ 0x0E      b 00110011      b 00000000    00110011
@ 0x0F      b 01010101      b 11111111    ********

Table 3.1: Memory representation in [mc]square.

More on the benefits of this 3-valued memory representation is given in [13, 9] and in Section 4.1.3.
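The relation between the three value columns of Table 3.1 can be reproduced with a few lines of code. This is a minimal sketch and not the memory model of [mc]square; the function name `ternary` is hypothetical. A set bit in the ND-mask hides the corresponding bit of the binary value behind a *.

```python
def ternary(value, nd_mask, width=8):
    """Render a byte and its ND-mask as a ternary string (MSB first)."""
    bits = []
    for i in reversed(range(width)):
        if (nd_mask >> i) & 1:
            bits.append("*")                    # bit is nondeterministic
        else:
            bits.append(str((value >> i) & 1))  # bit is known
    return "".join(bits)


# The rows of Table 3.1 as (binary value, ND-mask) pairs.
rows = {
    0x0A: (0b11110000, 0b00001100),
    0x0B: (0b00001111, 0b11110000),
    0x0C: (0b10101010, 0b01010101),
    0x0D: (0b00000000, 0b01100110),
    0x0E: (0b00110011, 0b00000000),
    0x0F: (0b01010101, 0b11111111),
}
table = {addr: ternary(v, m) for addr, (v, m) in rows.items()}
```

Note that the binary value of a masked bit is irrelevant: location 0x0F yields ******** regardless of the stored value 01010101.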

Splitter
At certain points in the model checking flow it is necessary to evaluate predicates over memory locations in order to prove a given specification. Thus, in case a memory location involved in the formula is marked as ND, there must be a mechanism to resolve the ND memory location into every possible value combination resulting from the ND bits. That is exactly what the Splitter is used for. The actual implementation of the Splitter can become quite tricky and complex; one of the main reasons is the variety of addressing modes supported by the respective target hardware. A few straightforward examples are given in Table 3.2.

Location    Ternary value    Value combinations
@ 0x0A      1111**00         $2^2 = 4$
@ 0x0B      ****1111         $2^4 = 16$
@ 0x0F      ********         $2^8 = 256$

Table 3.2: ND memory representations and resulting value combinations.
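The numbers in Table 3.2 follow from enumerating all assignments to the ND bits. The following sketch illustrates this for a single byte; it is not the Splitter of [mc]square, which additionally has to deal with the different addressing modes, and the function name `split` is hypothetical.

```python
def split(value, nd_mask, width=8):
    """Enumerate all concrete values a byte with ND bits can take."""
    nd_positions = [i for i in range(width) if (nd_mask >> i) & 1]
    # Known bits are kept, ND bits start at 0 and are filled in below.
    base = value & ~nd_mask & ((1 << width) - 1)
    results = []
    for combo in range(1 << len(nd_positions)):
        concrete = base
        for k, pos in enumerate(nd_positions):
            if (combo >> k) & 1:
                concrete |= 1 << pos
        results.append(concrete)
    return results


# Location 0x0A of Table 3.2: ternary value 1111**00.
values_0a = split(0b11110000, 0b00001100)
```

For the ternary value 1111**00 the sketch yields the four concrete bytes 11110000, 11110100, 11111000, and 11111100. The number of results doubles with every additional ND bit, which is why splitting is deferred until a value is actually needed.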

Determinizer
The Determinizer is, in principle, the decision-making part that acts whenever the C51Simulator has to resolve nondeterministic behavior. For any given state in the state space it is capable of generating all possible successor states. Furthermore, the Determinizer takes over the proper handling of interrupts and branches to Interrupt Service Routines (ISRs).

Interface
A slim interface connects the C51Simulator to the [mc]square model checker as well as to the GUI.

3.7 Related Work

There has been extensive research into the topic of formal verification in the past. This section divides the existing related work into four main areas.

3.7.1 The Assembly Code Model Checking Approach

Several model checkers such as SLAM [50], BLAST [51], MAGIC [52], MOPS [53], OPEN/CAESAR [54], or SOCKETMC [55] work on C code. These tools, however, are not applicable to embedded systems because of the special nature of programs targeted for microcontrollers [56]. A model checker for embedded systems has to support special features, for instance, direct memory access, interrupt handling, inline assembly instructions, usage of timers, or communication interfaces. Hence, model checking of machine code seems mandatory when trying to automate the process of model construction. Related model checkers that work on the machine code level are StEAM [57], MCESS [42], and Estes [8], however, only the latter two are targeting embedded systems.

Estes model checks assembly code for the 68HC11 microcontroller, constructing the state space either with a simulator or with real hardware using the GNU debugger. In practice, this approach is only feasible for small programs. Constructing the state space for the model via the hardware takes time (unless dedicated hardware support is provided). Furthermore, using an out-of-the-box simulator/debugger to construct the model restricts optimizations that minimize the state space. MCESS, in contrast, translates the assembly code of ATMEL ATmega16 microcontrollers into hardware-independent byte-code for a specific virtual machine that is able to check properties given in LTL. However, due to this approach most hardware issues are abstracted rather coarsely, eventually removing essential information, which may invalidate the entire verification process. Unlike these approaches, [mc]square constructs the model with special, tailored simulators for microcontrollers.

3.7.2 3-valued Abstraction Techniques

3-valued logic was initially defined by Kleene [58]. 3-valued logic is used in many research areas connected with verification. There are model checking algorithms that directly work with 3-valued logic. Bruns and Godefroid [59] describe a 3-valued CTL model checking algorithm. Another approach is described in a paper written by Yahav [60]. In this paper, 3-valued logic is used to verify safety properties of concurrent Java programs. In contrast to these approaches, the model checking algorithms used by the [mc]square approach work with Boolean logic. 3-valued logic is only utilized in the memory representation that is used by the simulator, which builds the state space. All memory locations that are accessed by the model checking algorithms use Boolean logic. Symbolic or X-valued simulation [61] is another technique that is related to 3-valued logic. Here, symbolic values are used in place of explicit values. In our approach, parts of the states used can be symbolic, but whenever the simulator or the model checker needs to access symbolic parts of a state, these parts are instantiated and hence become explicit. All parts of a state that are not accessed remain symbolic. In [62], a symbolic simulation scheme is used to verify embedded array systems such as memory management units of high performance microcontrollers. Symbolic Trajectory Evaluation (STE) [63] is a lattice-based model checking technology that uses a form of symbolic simulation for hardware circuit verification. In [61], a symbolic simulator is used to verify hardware systems. Similarly, [64] combines a linear-time logic model checking algorithm with lightweight theorem proving in higher-order logic. Whenever an X (denoted by ND in our approach) is accessed and a value is needed, new symbolic variables are added and the simulation has to be repeated. In our method, a dynamic refinement is conducted. There are some approaches combining explicit and symbolic executions (cf. [65, 66]), but these approaches do explicit execution and symbolic execution in parallel. There are also some approaches using 3-valued logic in static analysis. Reps et al. [67] describe an approach to use 3-valued logic in abstract interpretation. In another paper, Sagiv et al. [68] present a way to use 3-valued logic for shape analysis. Both analyses are special purpose analyses. In our approach, we use the 3-valued logic in a memory model utilized within model checking, which is a dynamic analysis that is more general.
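For reference, the connectives of Kleene's (strong) 3-valued logic can be written down in a few lines. The sketch below is an illustration of the logic itself and not of any [mc]square component; it represents the third truth value "unknown" by `None`, which is a choice made only for this example.

```python
def k_not(a):
    """Kleene negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a, b):
    """Kleene conjunction: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene disjunction: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

The interesting cases are those in which one operand is unknown: `k_and(False, None)` is still `False` and `k_or(True, None)` is still `True`, so an unknown operand does not necessarily make the result unknown.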

3.7.3 Static Analysis

Typical static analyzers for C are not capable of dealing with features specific to embedded hardware due to the lack of a precise hardware model. Such a model can be integrated though, as described by Fehnker et al. [69]. In their work, a static analyzer for C/C++ code called Goanna was extended to detect misuse of hardware features of the ATMEL ATmega16. Regehr and Reid [70] describe a system specifically suited for embedded software, which automatically generates abstractions using the specification of the microcontroller. An approach to automatic generation of transfer functions for data-flow analyses is described by Regehr and Duongsaa [71]. Their approach is to automatically derive abstractions and transfer functions from a specification, while our approach involves modeling such abstractions by hand. An earlier approach by Bergeron et al. [72] transforms the assembly code into a higher-level representation, on which static analysis is performed, but they do not consider interrupts, which makes this approach unsuitable for interrupt-driven software frequently found in embedded systems. Brylow et al. [73] describe static analysis for interrupt-driven software, but their approach supports only immediate values written into status registers. In practice, values written into status registers are often stored and manipulated in registers. Martin et al. [74] have described a loop analysis algorithm for cache prediction. In this approach, loop bodies are transformed into separate functions and interprocedural analysis algorithms are applied to perform a precise analysis of loops, which is similar to a context-sensitive analysis. A stack analysis using a context-sensitive abstract interpretation is described by Regehr et al. [75]. This analysis is used for a worst-case prediction of stack sizes. While interrupts are considered, recursion is unrolled only until a fixed bound specified by the user. A thorough description of challenges during static analysis of microcontroller assembly code is included. Another approach to stack analysis of x86 assembly programs is described by Linn et al. [76], which is not suitable in the presence of interrupts. An intraprocedural static slicing algorithm for assembly code is described by Cifuentes and Fraboulet [77], but stack variables are not supported at all.

The occurrence of interrupts in embedded software can be seen as a restricted form of multi-threading. Numerous approaches for static analysis of concurrent programs have been developed. An approach by Lal and Reps [78] adapts static analyses for sequential programs and extends them to work in a concurrent setting. Other approaches, such as the work by Qadeer and Rehof [79] or Lal et al. [80], tackle state-explosion due to thread interleavings by imposing an upper bound on the number of context switches. In contrast, our approach of Register Bank Analysis aims at refining assembly code static analysis for the Intel MCS-51 microcontroller by proposing a tailored analysis to cope with the architectural feature of register bank swapping.

3.7.4 Simulators for [mc]square

Other simulators for [mc]square were previously implemented by Schlich [9] (ATmega family), Scheuer [43] (Infineon XC167), and Wernerus [44] (PLC).

3 Background

4 Abstraction Techniques
All the world is an abstract interpretation (of all the world). (David Schmidt)

In this chapter, the concept of abstraction is introduced and the need for abstraction in model checking is emphasized. First, the terms over-approximation and under-approximation are explained. Next, a thought experiment is conducted, showing the exponential connection between the state space size and the amount of data memory of a microcontroller. Then, nondeterministic behavior in assembly code model checking is addressed and a 3-valued memory model is presented. Finally, three state space abstraction techniques and their actual implementation in the Intel MCS-51 simulator component are described.

4.1 Abstraction in Model Checking

Abstraction refers to the process of obtaining a simpler version of the checked system by reducing the number of details that need to be taken care of. Abstraction is performed in order to retain only information that is relevant for a particular purpose.

4.1.1 Reducing System Complexity through Abstraction

Abstraction is quite natural and human, e.g., the human ear is able to recognize frequencies in a narrow bandwidth only. The bandwidth typically stretches from about 16 Hz up to 20 kHz. Thus, all other frequencies are neglected, or in other words abstracted, since they are out of the relevant range. Ever since the early beginnings of model checking in the late 1970s, research teams [32] have been facing a problem generally known as the state-explosion problem, which describes the limit that the available computation power sets on the number of states that can be stored, examined, and verified against the user-stated claims. Abstraction or simplification of the analyzed model towards manageable versions of the analyzed system is crucial for the application of formal methods and a key concept to mitigate the state-explosion problem. Nevertheless, abstraction introduces new verification challenges between the original system and the simplified one [40]:

- Proving that the essential properties are preserved between the original system and its simpler version (bisimulation relation¹).
- Proving the correctness of the simplified version. This task may be achievable after the simplification through model checking.

¹ Bisimulation refers to a relation between state transition systems, associating systems which behave in the same way in the sense that one system simulates the other and vice versa.


Abstraction is usually based on using additional human knowledge through manual or semiautomatic tools. Applying abstraction is challenging and usually a walk on a thin line between sound results and missing important properties in the abstracted model. The literature defines the term over-approximation for system models containing more information than needed and, as a counterpart, the term under-approximation for system models lacking important system properties one is interested in (cf. Figure 4.1).

Figure 4.1: Over- and under-approximation in abstraction [81]. (Labels in the figure: universe, exact set of behavior; over-approximation: safe; under-approximation: missing details.)

4.1.2 Turing's Halting Problem and Why Model Checking Works Anyway
Alan Turing first proved that there is no way of deciding, once a computer has started a calculation, whether that calculation will terminate. In other words, it is not decidable whether a Turing Machine [24, 40] will come to a halt given a particular program input. The problem is known as the Halting Problem for Turing Machines and was first discussed in 1936 [24]. For the field of software verification, the halting problem means that it is in general not possible to write a program that automatically checks another program given as input parameter. Thus, the halting problem is the foundation for the mathematical fact that verification of a program is in general undecidable. More on the limitations of what can be decided by an algorithm is given by the theory of computability [82].

A legitimate question that now arises is why formal program verification is still gaining tremendous attention in recent research [32], and why even commercial tools are celebrating great achievements in the field of automatic program verification, when Alan Turing already proved back in 1936 that all those problems are in general undecidable. The computers that we are using today are not comparable to Turing Machines. A Turing Machine is a mathematical model which uses a linear tape as a storage device. The tape is divided into cells and each cell is labeled by a symbol from a given alphabet. The tape has a fixed left end and is infinite on the right. A single cell on the tape corresponds to a register in main memory within modern-day computers. Whereas the storage device of a Turing Machine has infinite capacity (due to the infinite tape), memory is always limited in conventional computers, especially in embedded systems. It follows that a Turing Machine can reside in an infinite number of distinct system states. This is not true for conventional computers. Since physical memory is always limited, the number of system states is finite. Therefore, program code that runs on conventional computers can be described by a Finite State Machine (FSM). An FSM has a finite number of states and a finite number of transitions between those states. The upper limit of possible states is defined by all possible register and memory configurations. The transitions depend on the underlying hardware architecture. It is even possible to generate a finite state graph for all possible programs that may run on the computer. Each program would have a different entry node in the state graph. Depending on the current instruction of the program, which represents the transitions, it is possible to follow the graph in order to observe the behavior intended by the program. As one can imagine, those (complete) state graphs are huge, even though their generation is theoretically possible. In summary, the undecidable Halting Problem for Turing Machines is reduced to a decidable one for real-life computer systems with limited memory, since the focus lies on:

- model checking of finite state machines, i.e., finite state reactive systems
- propositional temporal logics to describe properties of the FSM model

Nevertheless, model checking of assembly code remains a tough job, mainly because of the state-explosion problem. To illustrate the state-explosion problem, a thought experiment is conducted. Imagine an ordinary microcontroller featuring a read-only program memory and a read-write data memory. Each memory location is 8 bit wide. Table 4.1 shows the relation between the number of data memory bytes and the resulting number of states the system may reside in. It is evident that the resulting state space size is exponential in the data memory size.

Data memory size    Resulting system states (state space size)
1 byte              256^1  = 256
2 byte              256^2  = 65536
3 byte              256^3  = 16777216
4 byte              256^4  = 4294967296
5 byte              256^5  = 1099511627776
6 byte              256^6  = 281474976710656
7 byte              256^7  = 72057594037927936
8 byte              256^8  = 18446744073709551616
16 byte             256^16 = 340282366920938463463374607431768211456

Table 4.1: Data memory size and resulting system states.

In fact, for the Intel MCS-51 target, the IRAM consists of 256 bytes of memory, leading to approximately 256^256 possible system configurations². Under the spell of Moore's Law (the number of transistors that can be placed inexpensively on an integrated circuit doubles every two years [83]), the number of possible system configurations increases tremendously with every new microcontroller family. As the presented examples make clear, even for tiny systems with only a few bytes of memory the number of possible system states is tremendous, which calls for the use of abstraction in order to alleviate the state-explosion problem.

² 256^256 = 2^2048 is a number with 617 decimal digits.
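The growth shown in Table 4.1 can be reproduced with arbitrary-precision arithmetic. The following is a minimal sketch; the class and method names are hypothetical and not part of [mc]square.

```java
import java.math.BigInteger;

/** Hypothetical helper: number of explicit states for n bytes of 8-bit data memory. */
final class StateSpaceSize {
    static BigInteger states(int dataBytes) {
        // Each byte can take 256 values, so n bytes yield 256^n configurations.
        return BigInteger.valueOf(256).pow(dataBytes);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(n + " byte(s): " + states(n));
        }
        // For the 256-byte IRAM only the length of the number is printed.
        System.out.println("256 bytes: " + states(256).toString().length() + " decimal digits");
    }
}
```

Only the data memory is counted here; registers outside the IRAM and the program counter would enlarge the bound further.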


4.1.3 Nondeterministic Behavior in Assembly Code Model Checking

One of the biggest challenges in explicit (in our case assembly code) model checking is dealing with nondeterministic behavior. Nondeterminism is introduced by the environment in which the microcontroller is operating, e.g., by unknown values of I/O ports. This uncertainty requires a dedicated treatment by the model checker. It leads to the creation of multiple successor states by instantiation of nondeterministic values with every possible value combination, which in turn means a further expansion of the overall state space. Recapitulating the working scheme of [mc]square, a model of the particular microcontroller is responsible for state space building. In general, the microcontroller software model faces nondeterminism either by communicating with the external environment, e.g., reading a value over the I/O lines or undertaking serial communication, or by interrupts, which may occur in every system state as long as the corresponding interrupt source is enabled. For the Intel MCS-51 target, the sources of nondeterminism are:

(i) The four I/O ports
(ii) The four timer registers
(iii) The serial communication receive register
(iv) The five interrupt flags
    - Serial interrupt flag
    - Timer 0/1 overflow flag
    - External event 0/1 flag

To make the issue of introducing nondeterministic values clear, the assembly code snippet in Listing 4.1 is investigated. This assembly code instructs the microcontroller to read a single byte from each of the 8 bit wide I/O ports P0 and P1 and to store the fetched values within the internal Random Access Memory (RAM) at locations 0x20 and 0x21, respectively.

    MOV 0x20, P0
    MOV 0x21, P1

Listing 4.1: Assembly code excerpt.

Following the idea of explicit state space generation reveals that the two assembler instructions shown in Listing 4.1 generate altogether 256 · 256 = 65536 successor states. The considerable number of successors originates from the immediate instantiation of the nondeterministic values contained in the I/O ports. The two MOV instructions are stored successively in the program memory. The value of P0 is unknown, and therefore, the model checker creates 256 successor states to remove the uncertainty concerning the actual value of the port. Afterwards, the second MOV instruction is executed. Each of the 256 successors then creates a further 256 successors for the instantiation of P1, whose actual value is unknown too. Environment information is not present, hence, all 65536 successors are created in order to cover all conceivable situations.

Let us consider the immediate creation of successors from a different point of view. Suppose that the stated claim that is subject to verification includes statements neither over memory location 0x20 nor over 0x21. In this case, there is no need to create successor states. It is sufficient to find a mechanism to mark certain bit positions whose value is unknown and which, thus, can be read as ND.


To that end, a 3-valued logic approach for modeling the microcontroller memory is used. Whereas binary logic is composed of elements that are valued in the set {0, 1}, i.e., each value is either true or false, 3-valued logic or ternary logic [58] is defined as follows in [84]: Ternary logic is a system whose elements, called statements, are valued in the set {0, 1, 2}. If x is a statement³, the value of x can be interpreted as a mapping $\varphi$ into $\{0, 1, 2\}$ such that:

$$
\varphi(x) :=
\begin{cases}
0, & \text{if } x \text{ is perhaps true, perhaps false} \\
1, & \text{if } x \text{ is true} \\
2, & \text{if } x \text{ is false}
\end{cases}
\tag{4.1}
$$

In the remainder of this thesis, the term ND is used for the first line of the semantic representation stated in Equation 4.1. Ternary logic is well known in hardware description languages such as VHDL or Verilog to represent unknown values of, e.g., input circuit latches or uninitialized memory locations. Synthesis tools use this ND representation to reveal design errors, which the designer can correct before synthesis towards an actual circuit. From the state space view, the 3-valued memory representation introduces a certain type of states, namely lazy states. A lazy state combines both explicit and symbolic parts of the state space⁴. Any state including memory locations marked as ND is called a lazy state. Consequently, a single lazy state represents a set of explicit states. A lazy state and the corresponding nondeterministic state space representation are shown in Figure 4.2.

Figure 4.2: Nondeterministic state space representation. (The figure shows a lazy state and the set of explicit states it represents over the levels S(n), S(n+1), and S(n+2).)

³ Note that in our approach a statement refers to a single bit location within the IRAM of the microcontroller.
⁴ [mc]square still uses explicit model checking algorithms.
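One possible encoding of such a 3-valued memory location stores a concrete value together with an ND-mask, so that a single lazy byte stands for 2^k explicit bytes, where k is the number of ND bits. The sketch below is an assumption for illustration only; the class TernaryByte and its methods are hypothetical and do not reproduce the C51Simulator classes.

```java
/** Hypothetical 3-valued byte: a concrete value plus an ND-mask (1 = bit is ND). */
final class TernaryByte {
    final int value;   // concrete bits; only meaningful where the mask bit is 0
    final int ndMask;  // a 1 marks a nondeterministic (ND) bit

    TernaryByte(int value, int ndMask) {
        this.ndMask = ndMask & 0xFF;
        this.value = value & 0xFF & ~this.ndMask; // normalize ND positions to 0
    }

    /** Number of explicit bytes this lazy byte represents: 2^(number of ND bits). */
    int explicitCount() {
        return 1 << Integer.bitCount(ndMask);
    }

    /** Renders the byte MSB first, using '*' for ND bits. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 7; i >= 0; i--) {
            if (((ndMask >> i) & 1) == 1) {
                sb.append('*');
            } else {
                sb.append((value >> i) & 1);
            }
        }
        return sb.toString();
    }
}
```

A port that has just been read would be modeled as `new TernaryByte(0x00, 0xFF)`, which represents all 256 explicit values without creating any successor state.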


4.2 Implementation Abstraction Techniques for the C51Simulator

As mentioned above, abstraction is the main concept to overcome the state-explosion problem. In what follows, the abstraction techniques implemented for the C51Simulator are presented. The different concepts have different effects on the achievable state space reduction as well as on the maintained expressiveness. The stronger the applied abstraction, the higher the over-approximation. The actual results strongly depend on the source code structure, the number of I/O accesses, the number of used interrupts, etc. A rough estimation, based on empirical knowledge, of the effects of the three introduced concepts is given in Table 4.2.

Abstraction technique                        State space reduction    Maintained expressiveness
none                                         none                     full
Delayed Nondeterminism                       low                      high
Delayed Nondeterminism with Look Ahead       medium                   medium
Nondeterministic Program Status Word         high                     low

Table 4.2: Comparison of abstraction techniques for the C51Simulator.

4.2.1 Delayed Nondeterminism

Delayed Nondeterminism was first presented in [49] and is an approach to state space reduction. As the name implies, resolving nondeterministic values by the Splitter (cf. Section 3.6.4) is postponed as long as possible. To this end, [mc]square takes advantage of its 3-valued memory concept. Successor states are not created as soon as a nondeterministic value is read, but only when they are needed to prove a given system property or for a subsequent computation step. An example of such a computation step is a conditional branch instruction, which requires the actual value of a nondeterministic memory location to resolve the jump condition or to determine the target location of the branch.

                                   Location    Binary value    ND-mask       Ternary value
before executing MOV [0xA, 0xB]    @ 0xA       b 11110000      b 00000001    1111000*
                                   @ 0xB       b 00001111      b 11110000    ****1111
after executing MOV [0xA, 0xB]     @ 0xA       b 00001111      b 11110000    ****1111
                                   @ 0xB       b 00001111      b 11110000    ****1111

Table 4.3: Memory contents before and after the MOV instruction.

To illustrate the concept of Delayed Nondeterminism, the instruction MOV [0xA, 0xB] is considered. Under Delayed Nondeterminism, whenever the C51Simulator executes a MOV instruction, it does not only copy the value from 0xB to 0xA, as one would expect when reading the instruction semantics defined in the datasheet, but also the corresponding ND-mask. Hence, the generation of multiple successors is avoided by delaying the instantiation of the involved and perhaps nondeterministic memory location 0xB. This procedure is documented in Table 4.3 and illustrated in Figure 4.3.

Figure 4.3: The Delayed Nondeterminism approach of handling the MOV [0xA, 0xB] instruction. (Both the value and the ND-mask are copied from the source to the destination location.)

A case study by Noll and Schlich [49] revealed the effect of Delayed Nondeterminism for different program configurations, showing a possible state space reduction of 70% and above. Nevertheless, the actual savings due to Delayed Nondeterminism depend on various factors, such as source code structure and the targeted hardware.
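The MOV handling described above can be sketched as follows, assuming a memory model that keeps one value byte and one ND-mask byte per location. The class DelayedMovMemory and its methods are hypothetical names chosen for this sketch, not the C51Simulator API.

```java
/** Hypothetical memory model with one value byte and one ND-mask byte per location. */
final class DelayedMovMemory {
    private final int[] value = new int[256];
    private final int[] ndMask = new int[256]; // a 1 marks an ND bit

    void set(int addr, int val, int mask) {
        value[addr] = val & 0xFF;
        ndMask[addr] = mask & 0xFF;
    }

    /** MOV dst, src under Delayed Nondeterminism: copy value and ND-mask, create no successors. */
    void mov(int dst, int src) {
        value[dst] = value[src];
        ndMask[dst] = ndMask[src];
    }

    int valueAt(int addr) { return value[addr]; }

    int maskAt(int addr) { return ndMask[addr]; }
}
```

Initializing 0xA and 0xB with the "before" rows of Table 4.3 and calling `mov(0x0A, 0x0B)` yields the "after" rows: both locations then hold the value b 00001111 with the ND-mask b 11110000.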

4.2.2 Delayed Nondeterminism with Look Ahead

Delayed Nondeterminism with Look Ahead carries the idea of Delayed Nondeterminism a bit further. First results were presented to the scientific community in [13]. Even though modern microcontrollers come with a lot of different functionality, there is one thing they all have in common. A typical instruction set offers at least four different kinds of operations:

1. Arithmetic operations are used whenever the microcontroller has to perform arithmetic calculations. Typical examples are ADD, SUBB, INC, DEC, MUL, DIV, etc.
2. Logical operations are used whenever the microcontroller has to evaluate boolean equations. Typical examples are ANL, ORL, XRL, CPL, RLC, RRC, etc.
3. Data transfer operations are utilized whenever data/program memory is copied from a given location, i.e., the source, to a destination location. Those operations are commonly referred to as MOV instructions.
4. Program branching operations are utilized whenever conditional or unconditional branches are needed in order to change the flow of execution when stepping through the microcontroller program. Program branching leads to a modification of the PC, thus allowing subroutines, loops, and branches in general. Typical representatives are SJMP, LCALL, JZ, CJNE, DJNZ, etc.

Whereas Delayed Nondeterminism is only applicable to data transfer operations (3), the approach of Delayed Nondeterminism with Look Ahead focuses on logical operations (2), while still preserving all the advantages generated by Delayed Nondeterminism. Thus, Delayed Nondeterminism with Look Ahead can be seen as an extension of Delayed Nondeterminism. Delayed Nondeterminism offers no benefit when dealing with logical operations, since the straightforward approach of copying ND-masks, as described in Section 4.2.1, cannot be applied anymore. The main idea of the Delayed Nondeterminism with Look Ahead approach to state space reduction is to take the semantic relations of the instructions into account. Delayed Nondeterminism with Look Ahead centers around the coherence among the boolean operators ∨, ∧, and ¬ with particular regard to 3-valued logic. The relevant relations are summarized in Table 4.4.

A        B        A ∨ B    A ∧ B    ¬A
true     true     true     true     false
true     ND       true     ND       false
true     false    true     false    false
false    true     true     false    true
false    ND       ND       false    true
false    false    false    false    true
ND       true     true     ND       ND
ND       ND       ND       ND       ND
ND       false    ND       false    ND

Table 4.4: Truth table for 3-valued logic.

Embedded systems code is tightly coupled to the environment of the microcontroller. Analog and digital values are read from sensors, giving the application the possibility to react to changes in the environment. Therefore, reading data over the I/O ports of the microcontroller is essential for embedded applications. Since reading unknown values from the environment is one of the major contributors to the state-explosion problem, special care has to be taken to avoid the generation of needless successor states from the very beginning. Delayed Nondeterminism with Look Ahead tackles the problem right at the point where data is read from the I/O ports. Reading values over the microcontroller I/O often involves bitwise operations with bit masks, since either ports can be accessed bytewise only, or the application is only interested in a certain number of bits, e.g., the lower nibble of an 8 bit wide port. Bit masking, or bit twiddling, is a common way to perform operations on individual bits. A summary of the most common usages of bitmasks is given in Table 4.5. Compilers translate those bit-twiddling statements from the high level language into logic operations supported by the microcontroller's instruction set. In the following, a simple example is presented to explain the idea of Delayed Nondeterminism with Look Ahead.

Example

The C code in Listing 4.2 represents typical (low level) embedded code. In what follows, this code excerpt is used to discuss the concept of Delayed Nondeterminism with Look Ahead. The code reads the value of the 8 bit wide I/O port, termed Port1, and uses a bitmask to extract the upper two bits of the I/O port.


Operation              C code syntax             Operand     Mask        Result
Setting bits to 0      y &= ~(1 << pos);         01101110    11110111    01100110
Setting bits to 1      y |= (1 << pos);          10010101    00001000    10011101
Toggling a bit         y ^= (1 << pos);          10011101    00001000    10010101
Testing a bit          y = x & (1 << pos);       00011101    00001000    00001000
Extract low nibble     y = x & 0x0F;             10011101    00001111    00001101
Extract high nibble    y = x & 0xF0;             10011101    11110000    10010000

Table 4.5: How bitmasks are used in embedded software.
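The example columns of Table 4.5 can be checked with a few one-line helpers. The sketch below is written in Java for consistency with the implementation listings of this chapter; the class BitMasks and its method names are illustrative assumptions, and all results are truncated to 8 bits.

```java
/** Bit-mask idioms from Table 4.5, applied to 8-bit values (names are illustrative). */
final class BitMasks {
    static int clearBit(int y, int pos)  { return y & ~(1 << pos) & 0xFF; }  // y &= ~(1 << pos)
    static int setBit(int y, int pos)    { return (y | (1 << pos)) & 0xFF; } // y |= (1 << pos)
    static int toggleBit(int y, int pos) { return (y ^ (1 << pos)) & 0xFF; } // y ^= (1 << pos)
    static int testBit(int x, int pos)   { return x & (1 << pos); }          // y = x & (1 << pos)
    static int lowNibble(int x)          { return x & 0x0F; }                // y = x & 0x0F
    static int highNibble(int x)         { return x & 0xF0; }                // y = x & 0xF0
}
```

With pos = 3, the masks correspond to the values 11110111 and 00001000 used in the table.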

     1  unsigned char readValueFromIO(void) {
     2      /* read port value */
     3      value = Port1;
     4      /* mask the upper two bits */
     5      value &= 0xC0;
     6      return value;
     7  }
     8
     9  int main(void) {
    10      while (1) {
    11          readValueFromIO();
    12          /* do something */
    13      }
    14  }

Listing 4.2: Embedded C code example program for the Intel MCS-51 target.

Source code line 5 in Listing 4.2 is mapped by the compiler (Keil C51 Compiler V8.01) to the opcode 0x53, representing the instruction ANL direct,#immediate. The ANL instruction combines the bits of the internal memory location (0x12) with the immediate value (#0xC0) and sets a bit in the resulting byte only if that bit is set in both operands; otherwise the resulting bit is cleared.
    MOV value(0x12), Port1(0x90)
    ANL value(0x12), #0xC0

Listing 4.3: Translated assembly code for source code lines 4-5 of Listing 4.2.


In the following, the effects of the assembly code in Listing 4.3 under the abstraction techniques Delayed Nondeterminism and Delayed Nondeterminism with Look Ahead are discussed and compared. Delayed Nondeterminism helps to avoid generating successor states for the initial MOV instruction by simply copying the actual value as well as the ND-mask from memory location 0x90 to memory location 0x12 (cf. Figure 4.3). Reading from the environment always leads to a fully nondeterministic read, since environment information is not present. However, Delayed Nondeterminism forces us to determinize the involved memory locations (i.e., to create all possible successors) in preparation for the following ANL instruction. The variable value is unknown, thus, all 8 bits are marked as ND and the model checker invokes the simulator to generate all possible successors arising from this uncertainty. The number of successor states is easily calculated and results in 2^8 = 256 states. Consequently, Delayed Nondeterminism leads to a wide branch in the computation tree, which has a negative impact on the state space and makes the state-explosion problem even worse. This scenario is depicted in Figure 4.4, showing the total number of 256 successor states generated.

Figure 4.4: The state-explosion problem. (The state at level S(n) has 256 successors S0 to S255 at level S(n+1), one for each value 0x00 to 0xFF of the variable value.)

However, the described approach results in a valid over-approximation (cf. Section 4.1.1) by replacing the ND value of the memory location value with actual values (one at a time) and performing the ANL afterwards. Nevertheless, this approach does not consider the second operand of the operation, i.e., the constant value of the bitmask. As shown in the example code, the bitmask has the value 0xC0. On the binary level, the bitmask evaluates to b 11000000. Thus, the only two bits of interest in this calculation are the upper two, i.e., the two most significant bits. The remaining six bits will evaluate to false in any case, according to the relations defined in Table 4.4. Hence, the number of successor states can be reduced from 2^8 = 256 down to 2^2 = 4. The resulting values are 0x00, 0x40, 0x80, and 0xC0, as detailed in Table 4.6.

Bit    Value    Mask    Result (Value ∧ Mask)
MSB    *        1       *
6      *        1       *
5      *        0       0
4      *        0       0
3      *        0       0
2      *        0       0
1      *        0       0
LSB    *        0       0

Combinations: (i) 0x00 b 00000000, (ii) 0x40 b 01000000, (iii) 0x80 b 10000000, (iv) 0xC0 b 11000000

Table 4.6: Details on the Delayed Nondeterminism with Look Ahead approach.

Figure 4.5 presents the differences in the number of the resulting system states for the various abstraction techniques when executing the two assembler instructions of Listing 4.3. The Delayed Nondeterminism with Look Ahead approach helps to avoid over-approximation whenever logical operations are performed over ND memory locations. However, this approach to state space abstraction cannot be applied to all logical instructions of the microcontroller's instruction set. An example is the XOR instruction. On the bit level, neither the result of XOR [1, ND] nor that of XOR [0, ND] can be decided without knowing the actual value of the ND bit. The same applies to the negation, i.e., NOT [ND]. Nevertheless, considering the frequent I/O accesses and the common method of bit twiddling in typical embedded systems code, the presented abstraction technique can be seen as a promising contributor to state space reduction. In the C51Simulator implementation, Delayed Nondeterminism with Look Ahead is applied to 32 out of 256 instructions in total. In [13], a saving in overall state space of 99% is achieved by Delayed Nondeterminism with Look Ahead compared to plain explicit state space building. It should be noted that this result is only valid for the example chosen in [13]. The actual savings due to this method depend on the source code structure and the number of accesses to nondeterministic memory locations, i.e., for source code without any I/O accesses this concept will not contribute to state space reduction (but won't increase the state space either).

Implementation

The actual implementation in the C51Simulator component uses a visitor pattern. The visitor design pattern is a common way of separating an operation from the object structure upon which it operates. The major benefit lies in the ability to add new operations to existing object structures without modifying those structures. More on the visitor design pattern is found in [85]. For the C51Simulator, the whole instruction set implementation is built around a visitor pattern. Based on the actual abstraction technique, the corresponding instruction visitor is selected and used to apply the desired abstraction mechanism.
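How a visitor might be selected according to the configured abstraction technique can be sketched as follows. All class and method names are hypothetical, only a single instruction type is modeled, and the visit methods return a descriptive string instead of modifying simulator state; this is an illustration of the pattern, not the C51Simulator code.

```java
/** Hypothetical instruction type: ANL direct, #immediate. */
final class AnlDirectConst {
    final int address;
    final int constant;

    AnlDirectConst(int address, int constant) {
        this.address = address;
        this.constant = constant;
    }

    /** Double dispatch: the instruction hands itself to the visitor. */
    String accept(InstructionVisitor visitor) {
        return visitor.visit(this);
    }
}

/** One visit method per instruction type; only ANL is shown here. */
interface InstructionVisitor {
    String visit(AnlDirectConst instruction);
}

/** Plain semantics as given in the instruction set manual. */
class PlainVisitor implements InstructionVisitor {
    @Override
    public String visit(AnlDirectConst instruction) {
        return "plain ANL";
    }
}

/** Look-ahead variant: overrides only the instructions it can improve. */
class LookAheadVisitor extends PlainVisitor {
    @Override
    public String visit(AnlDirectConst instruction) {
        return "look-ahead ANL";
    }
}

final class VisitorSelection {
    /** Chooses the visitor for the configured abstraction technique. */
    static InstructionVisitor select(boolean lookAhead) {
        return lookAhead ? new LookAheadVisitor() : new PlainVisitor();
    }
}
```

Deriving the look-ahead visitor from the plain one means that every instruction without a dedicated look-ahead rule automatically falls back to the standard semantics.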
As an example, the Delayed Nondeterminism and Delayed Nondeterminism with Look Ahead instruction visitors for the ANL [direct, #immediate] instruction are presented in the following (the notation uses pseudo Java code). The Delayed Nondeterminism instruction visitor, shown in Listing 4.4, implements the ANL instruction as stated in the instruction set manual [46]. First, the involved memory location is read from the internal memory. Second, the logical AND operation is performed and the new value is written back to the destination register. Note that for this particular instruction, the Delayed Nondeterminism visitor is the same as for plain state space building without any abstractions applied. Recall that Delayed Nondeterminism is only applied to data transfer instructions.

Figure 4.5: Successor state generation and resulting system states with options: instantiate immediately, Delayed Nondeterminism, and Delayed Nondeterminism with Look Ahead for the assembly code presented in Listing 4.3. (Panels and resulting states: (a) Instantiate immediately (no abstraction): 513 = 256 + 256 + 1; (b) Delayed Nondeterminism: 258 = 256 + 1 + 1; (c) Delayed Nondeterminism with Look Ahead: 6 = 4 + 1 + 1.)
    public void visit(ANL_Direct_Const instruction) {
        int tmp2 = 0x00;
        /* Read direct address byte from memory */
        tmp2 = mcu.readRegisterByAddress(instruction.address);
        /* Perform AND and write back */
        mcu.writeRegisterByAddress(instruction.address,
                instruction.constant & tmp2);
    }

Listing 4.4: The Delayed Nondeterminism visitor pattern for the ANL [direct, #immediate] instruction.

In contrast, the visitor used by Delayed Nondeterminism with Look Ahead works in a different way, since it takes the relations defined in Table 4.4 into account. It works as follows. First, if the involved memory location is deterministic, i.e., none of its bits is marked as ND, the algorithm calls the standard visitor introduced in Listing 4.4 and returns. Second, the algorithm iterates over all bits of the involved memory location and extracts the bit values as well as the corresponding ND-mask values. Since the ANL [direct, #immediate] instruction involves a constant immediate value, the ND-mask of the constant value is always 0x00 (false). The gathered information is then evaluated according to Table 4.4 and written to a temporary register, termed resultReg. This procedure continues until all 8 bits of the operand are handled. Finally, the ND-mask and the actual value of resultReg are written to the destination register (cf. lines 24-25 in Listing 4.5). As mentioned above, Delayed Nondeterminism with Look Ahead can be applied to 32 out of 256 instructions. Although the individual realization of the Delayed Nondeterminism with Look Ahead approach for the remaining instructions may differ, the main idea remains the same. The interested reader is referred to the source code of the C51Simulator component for further details.
     1  public void visit(ANL_Direct_Const instruction) {
     2      if (mcu.isAddressDeterministic(instruction.address)) {
     3          super.visit(instruction);
     4          return;
     5      }
     6      C51Register resultReg = new C51Register("", 0);
     7      boolean bitA, tbdA, bitB, tbdB;
     8
     9      for (int i = 0; i < C51Utilities.STD_REG_LENGTH; i++) {
    10          bitA = mcu.getRegisterByAddress(instruction.address).bitGet(i);
    11          tbdA = mcu.getRegisterByAddress(instruction.address).bitGetTBD(i);
    12          bitB = C51Utilities.extractBitFromByte(instruction.constant, i);
    13          tbdB = false;
    14
    15          /* A=0, B=nd -> RES = 0 */
    16          if ((bitA == false && tbdA == false && tbdB == true) ||
    17          /* A=nd, B=0 -> RES = 0 */
    18              (tbdA == true && bitB == false && tbdB == false)) {
    19              resultReg.bitSetTo(i, false);
    20          } else {
    21              resultReg.bitSetTBD(i, true);
    22          }
    23      }
    24      mcu.getRegisterByAddress(instruction.address).setValueAndTBD(
    25              resultReg.getValue(), resultReg.getTBDMask());
    26  }

Listing 4.5: The Delayed Nondeterminism with Look Ahead visitor pattern for the ANL [direct, #immediate] instruction.

In summary, Delayed Nondeterminism with Look Ahead avoids the generation of successor states whenever a microcontroller executes logic operations by taking advantage of the 3-valued memory representation of the [mc]square model checker.
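The bitwise idea can also be illustrated outside the simulator. The following is a minimal sketch, not the C51Simulator code: the class TernaryByte and its methods are hypothetical names introduced here for illustration. It assumes the usual 3-valued AND, in which a deterministic 0 forces the result bit to 0 even if the other operand bit is ND, and it only covers the case of a deterministic constant operand.

```java
/** Hypothetical 8-bit register with a value and an ND mask (1 = nondeterministic bit). */
final class TernaryByte {
    final int value;   // concrete bit values; bits under the mask are normalized to 0
    final int ndMask;  // a set bit marks the corresponding register bit as ND

    TernaryByte(int value, int ndMask) {
        this.ndMask = ndMask & 0xFF;
        this.value = value & 0xFF & ~this.ndMask;
    }

    /** Three-valued AND with a deterministic constant (assumed semantics, see lead-in). */
    TernaryByte andConst(int constant) {
        int c = constant & 0xFF;
        // A bit stays ND only if it was ND and the constant bit is 1;
        // a constant 0 yields a deterministic 0 in the result.
        return new TernaryByte(value & c, ndMask & c);
    }

    /** Renders the register in ternary notation, '*' for ND bits, MSB first. */
    String toTernary() {
        StringBuilder sb = new StringBuilder();
        for (int i = 7; i >= 0; i--) {
            if (((ndMask >> i) & 1) == 1) {
                sb.append('*');
            } else {
                sb.append((value >> i) & 1);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TernaryByte r = new TernaryByte(0b00001111, 0b11110000); // ****1111
        System.out.println(r.andConst(0x0F).toTernary()); // 00001111
        System.out.println(r.andConst(0x3C).toTernary()); // 00**1100
    }
}
```

With the constant 0x0F all ND bits are masked out, so the result is fully deterministic and no successor states would be required for this operation.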


4.2.3 Nondeterministic Program Status Word

Another abstraction technique implemented by the C51Simulator is termed Nondeterministic Program Status Word. Nondeterministic Program Status Word was first presented in [86]. This approach moves model checking with [mc]square slightly towards abstract interpretation. In abstract interpretation, a single instruction is only partially executed, without performing all the calculations defined by the instruction set manual. Thus, the aim of abstract interpretation is to gather information about the semantics of the executed program rather than exploring the program in all its details. In direct comparison to Delayed Nondeterminism and Delayed Nondeterminism with Look Ahead, the abstraction technique Nondeterministic Program Status Word can also be applied to arithmetic operations. In this approach, the generation of successor states is avoided by introducing massive over-approximations. Again, the 3-valued memory model is the basis for the abstraction technique. In the following, an example is presented to explain the main idea of Nondeterministic Program Status Word.

Example

The instruction ADDC [A, R0] (Add register to Accumulator with carry) with the opcode 0x38 is defined as follows [46]:

"ADDC simultaneously adds the byte variable indicated, the carry flag and the Accumulator contents, leaving the result in the Accumulator. The carry and auxiliary-carry flags are set, respectively, if there is a carry-out from bit 7 or bit 3, and cleared otherwise. When adding unsigned integers, the carry flag indicates that an overflow has occurred. OV is set if there is a carry-out of bit 6 but not out of bit 7, or a carry-out of bit 7 but not out of bit 6; otherwise OV is cleared. When adding signed integers, OV indicates a negative number produced as the sum of two positive operands, or a positive sum from two negative operands."

As stated in the instruction set manual, the ADDC [A, R0] operation affects the following flags:

- C, the carry flag
- OV, the overflow flag
- AC, the auxiliary carry flag
- P, the parity flag (set implicitly whenever the Accumulator is written)

Furthermore, the following memory locations are involved:

- The Accumulator A, containing the first operand and serving as destination register after the calculation
- The working register R0, holding the second operand


Whenever one of the operands, i.e., A or R0, contains nondeterministic bits, [mc]square will force the C51Simulator to create successor states by replacing nondeterminism with actual values and performing the ADDC [A, R0] operation for each of them. However, the Nondeterministic Program Status Word approach avoids the generation of successor states in this case by setting the involved memory locations to nondeterministic. For the ADDC [A, R0] operation, Nondeterministic Program Status Word sets the Accumulator A and the flags C, OV, AC, and P to nondeterministic. The second operand R0 is not modified, since it is not actively written by the operation. This is detailed in Table 4.7.

    Location               Binary value   ND-mask      Ternary value
    ------------------------------------------------------------------
    before executing ADDC [A, R0]
    Accumulator A          b 11100000     b 00000000   11100000
    Working register R0    b 00001111     b 11110000   ****1111
    Flags C OV AC P        b 0001         b 0000       0001
    ------------------------------------------------------------------
    after executing ADDC [A, R0]
    Accumulator A          b 00000000     b 11111111   ********
    Working register R0    b 00001111     b 11110000   ****1111
    Flags C OV AC P        b 0000         b 1111       ****

Table 4.7: The ADDC [A, R0] example.

Thus, Nondeterministic Program Status Word avoids creating successor states even for arithmetic instructions, leading to additional savings in the state space. Nevertheless, whenever a program branching instruction such as JC (Jump if Carry flag set) is encountered and the carry flag itself is nondeterministic, two successors are generated to maintain a sound over-approximation. Even though the contribution of this particular abstraction technique to state space reduction is tremendous, additional behavior is added which might not be present when executing the program on the real target hardware (cf. Table 4.2 for a rough overview).

Implementation

Nondeterministic Program Status Word is again implemented using a visitor pattern. As an example, the corresponding visitor patterns for the ADDC [A, R0] operation are discussed in the following. As shown in Listing 4.6, the Delayed Nondeterminism visitor pattern executes the instruction as specified in the manual. First, the two operands are read and the addition is performed. Then, the corresponding flags are set (cf. source code lines 25-55). Finally, the result is written back to the Accumulator. Again, for this particular instruction, the Delayed Nondeterminism visitor pattern behaves exactly like plain state space building without any abstraction at all.
     1  public void visit(ADDC_A_Rn instruction) {
     2      int tmp0 = 0x00;
     3      int tmp1 = 0x00;
     4      int tmp2 = 0x00;
     5      int tmp3 = 0x00;
     6
     7      tmp1 = mcu.readAccumulator();
     8      tmp2 = mcu.readWorkingRegister(instruction.regNumber);
     9
    10      /* If carry flag set, then add 1 to the result */
    11      if (mcu.psw.bitGet(C51PSW.FLAG_CY)) {
    12          tmp0 = 1;
    13      }
    14
    15      /* Perform Addition */
    16      tmp3 = tmp0 + tmp1 + tmp2;
    17
    18      /* Set corresponding flags */
    19      setFlagsForADDC(tmp0, tmp1, tmp2, tmp3);
    20
    21      tmp3 &= C51Utilities.MASK_SCALE_TO_BYTE_VAL;
    22      mcu.writeAccumulator(tmp3);
    23  }
    24
    25  private void setFlagsForADDC(int tmp0, int tmp1, int tmp2, int tmp3) {
    26      boolean newCarry = false;
    27      boolean carryAtPos6 = false;
    28      boolean newACarry = false;
    29
    30      /* C: Check if there is a carry-out at bit 7 */
    31      newCarry = ((tmp3 & C51Utilities.MASK_CARRY_CHECK)
    32              == C51Utilities.MASK_CARRY_CHECK);
    33      if (newCarry) {
    34          mcu.psw.bitSetTo(C51PSW.FLAG_CY, true);
    35      } else {
    36          mcu.psw.bitSetTo(C51PSW.FLAG_CY, false);
    37      }
    38
    39      /* AC: Check if there is a carry-out at bit 3 */
    40      newACarry = (((tmp1 & C51Utilities.MASK_EXTRACT_LOWER_NIBBLE)
    41              + (tmp2 & C51Utilities.MASK_EXTRACT_LOWER_NIBBLE) + tmp0)
    42              > C51Utilities.MASK_EXTRACT_LOWER_NIBBLE);
    43      if (newACarry) {
    44          mcu.psw.bitSetTo(C51PSW.FLAG_AC, true);
    45      } else {
    46          mcu.psw.bitSetTo(C51PSW.FLAG_AC, false);
    47      }
    48      /* OV: Check if Overflow Bit should be set */
    49      carryAtPos6 = ((((tmp1 & 0x7F) + (tmp2 & 0x7F) + tmp0) & 0x80) == 0x80);
    50      if ((carryAtPos6) ^ (newCarry)) {
    51          mcu.psw.bitSetTo(C51PSW.FLAG_OV, true);
    52      } else {
    53          mcu.psw.bitSetTo(C51PSW.FLAG_OV, false);
    54      }
    55  }

Listing 4.6: The Delayed Nondeterminism visitor pattern for the ADDC [A, R0] instruction.

Listing 4.7 shows the Nondeterministic Program Status Word visitor pattern. First, the algorithm checks whether all involved memory locations are deterministic. If so, the instruction is executed as specified in the instruction set manual and the algorithm returns. If nondeterministic memory locations are involved, no calculation is performed at all. Instead, the visitor pattern marks the modified memory locations as ND, thus implementing the concept described in Table 4.7.
    public void visit(ADDC_A_Rn instruction) {
        if (mcu.isAccumulatorDeterministic()
                && mcu.isAddressDeterministic(
                        mcu.getAddressForWorkRegister(instruction.regNumber))
                && mcu.getPSW().isCarryDeterministic()) {
            super.visit(instruction);
            return;
        }

        /* Set involved registers to ND */
        mcu.setTBD(C51Utilities.REGISTER_ACC, 0xff);
        mcu.getPSW().bitSetTBD(C51PSW.FLAG_CY, true);
        mcu.getPSW().bitSetTBD(C51PSW.FLAG_AC, true);
        mcu.getPSW().bitSetTBD(C51PSW.FLAG_OV, true);
        mcu.getPSW().bitSetTBD(C51PSW.FLAG_P, true);
    }

Listing 4.7: The Nondeterministic Program Status Word visitor pattern for the ADDC [A, R0] instruction.

In summary, Nondeterministic Program Status Word shifts the model checking approach of [mc]square further towards the idea of abstract interpretation, i.e., not executing the instructions in all their details. The broad application of over-approximation, by marking involved memory locations as nondeterministic, helps to further shrink the resulting state space. This is an interesting observation: the massive use of over-approximation due to the Nondeterministic Program Status Word concept decreases the state space drastically, which is contrary to what one would expect without knowing the internals of the [mc]square approach and the underlying 3-valued memory model. Nevertheless, Nondeterministic Program Status Word introduces behavior which may not exist in reality and thus may yield false negatives during model checking, i.e., reported violations that cannot occur on the real target hardware.
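The saving can be made tangible with a small calculation. The following is a minimal sketch under stated assumptions: the class NpswSketch and both methods are hypothetical and do not mirror the C51Simulator API. The first method counts how many concrete states would be needed if every ND bit were split into 0 and 1; the second mimics the abstract ADDC step, which performs no calculation as soon as one input bit is ND (flag values for the deterministic case are omitted for brevity).

```java
final class NpswSketch {
    /** Concrete successors needed if every ND bit in the given masks is split into 0 and 1. */
    static long concreteSuccessors(int... ndMasks) {
        int ndBits = 0;
        for (int mask : ndMasks) {
            ndBits += Integer.bitCount(mask);
        }
        return 1L << ndBits;
    }

    /**
     * Abstract ADDC step. Returns {accValue, accNdMask, flagNdMask}, where the
     * flag mask covers C, OV, AC and P with one bit each.
     */
    static int[] abstractAddc(int acc, int accNd, int reg, int regNd, int carry, int carryNd) {
        if (accNd == 0 && regNd == 0 && carryNd == 0) {
            // Fully deterministic inputs: compute the sum as usual
            return new int[] { (acc + reg + carry) & 0xFF, 0x00, 0x0 };
        }
        // At least one input bit is ND: skip the calculation, mark A and the flags as ND
        return new int[] { 0x00, 0xFF, 0xF };
    }

    public static void main(String[] args) {
        // Values of Table 4.7: A = 11100000, R0 = ****1111, carry flag = 0
        System.out.println(concreteSuccessors(0x00, 0xF0, 0x0)); // 16
        int[] r = abstractAddc(0xE0, 0x00, 0x0F, 0xF0, 0, 0x0);
        System.out.printf("%02X %02X %X%n", r[0], r[1], r[2]); // 00 FF F
    }
}
```

For the values of Table 4.7, splitting the four ND bits of R0 would require 16 concrete successors, whereas the abstract step produces a single successor in which A and the four flags are ND.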


5 Static Analysis
"When I use a model checker, it runs and runs forever and never comes back ... when I use a static analysis tool, it comes back immediately and says 'I don't know'." (Patrick Cousot)

The following chapter focuses on static analysis of embedded systems assembly code. First, a brief introduction to Control Flow Graphs (CFGs) and data-flow analyses is given. Next, the [mc]square static analysis framework and relevant internals are presented. Then, the adaptation and implementation of regular data-flow analyses for the Intel MCS-51 microcontroller and an algorithm for CFG building are described. Later on, a novel data-flow analysis concerning the particular architectural feature of register bank swapping is discussed at length. Finally, remaining challenges in static analysis of Intel MCS-51 assembly code are pointed out and possible approaches to overcome them are stated.

5.1 Background Static Analysis of Embedded Systems Code

The classical use of static code analysis is to facilitate the construction of compilers generating optimal code. Compiler optimizations aim to minimize the run-time of a program, the amount of memory occupied, and, especially in the embedded systems domain, the power consumption. Similar concepts of analyzing source code without actually executing it are used by static code analysis for verification purposes. However, the intention is a rather different one. Traditional static source code analysis focuses on software inspection, checking the code against syntactical standards, and automated program analysis. Available static analysis tools aim at automatically revealing software flaws and at supporting the development team in obtaining reusable, structured and easily maintainable code. For that purpose, it is often sufficient to focus on low-tech static analysis such as automatic software inspection or checking for syntactical standards. In the presented approach, high-tech static analysis such as data-flow analysis and finite-state verification of CFGs is utilized. The role of static code analysis in [mc]square is to compute information about the source code under verification that helps the model checker to reduce the state space [9]. In the following, the underlying concepts needed for static code analysis are described.


5.1.1 Control Flow Graphs

A CFG is essential to most static code analyses. It is a representation of all paths that might be traversed through a program while it is executed. The CFG is a directed graph where the vertices represent basic blocks and the edges represent possible transfers of control flow from one basic block to another. For instance, a transfer of control flow is induced by program branching instructions. Every CFG has two designated nodes through which all control flow enters and leaves the graph, i.e., the entry and the exit node. A basic block is a straight-line sequence of code with a single entry point and only one exit point, i.e., instructions within a basic block are executed consecutively without interruption. In our approach, a single node in the CFG corresponds to a single instruction of the program memory. The process of generating a CFG out of program code is discussed using the example code in Listing 5.1. To simplify matters, the WHILE language is used, a rudimentary, imperative, Pascal-like language as defined in [81].
    [y:=x];
    [z:=1];
    while [y>0] do
        [z:=z*y];
        [y:=y-1]
    od;
    [y:=0];

Listing 5.1: Source code used for CFG building.

The source code uses three variables, a few assignments and a conditional program branch. The resulting CFG is shown in Figure 5.1. It is composed of six vertices, not counting the entry and exit nodes, and eight edges.

[Figure: CFG with the nodes entry, [y:=x], [z:=1], [y>0], [z:=z*y], [y:=y-1], [y:=0], and exit.]
Figure 5.1: The resulting CFG for Listing 5.1.

There is an edge from the entry to the first executable node of the CFG, that is, to the node coming from the first instruction of the program memory. There is an edge to the exit from any node that contains an instruction that could be the last executed instruction of the program. If the final instruction of the program is not an unconditional jump, then the node containing the final instruction is one predecessor of the exit node. The same applies to any node that includes a jump to code that is not inside the valid program memory range. CFGs and their importance for various compiler optimizations are discussed at length in [87]. In summary, a CFG is a representation of all paths that might be traversed through a microcontroller program.
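The structure of such a graph can be captured with a plain adjacency list. The following is a minimal sketch for the WHILE program of Listing 5.1; the class CfgSketch and its method names are hypothetical and serve illustration only, they are not part of [mc]square.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CfgSketch {
    // Maps every node to the list of its successors
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    int nodeCount() {
        return successors.size();
    }

    int edgeCount() {
        int n = 0;
        for (List<String> s : successors.values()) {
            n += s.size();
        }
        return n;
    }

    List<String> successorsOf(String node) {
        return successors.get(node);
    }

    /** Builds the CFG of Listing 5.1 by hand, one node per statement. */
    static CfgSketch forListing51() {
        CfgSketch g = new CfgSketch();
        g.addEdge("entry", "[y:=x]");
        g.addEdge("[y:=x]", "[z:=1]");
        g.addEdge("[z:=1]", "[y>0]");
        g.addEdge("[y>0]", "[z:=z*y]");    // loop condition holds
        g.addEdge("[z:=z*y]", "[y:=y-1]");
        g.addEdge("[y:=y-1]", "[y>0]");    // back edge to the loop condition
        g.addEdge("[y>0]", "[y:=0]");      // loop condition fails
        g.addEdge("[y:=0]", "exit");
        return g;
    }

    public static void main(String[] args) {
        CfgSketch g = forListing51();
        System.out.println(g.nodeCount() - 2); // 6 statement nodes without entry and exit
        System.out.println(g.edgeCount());     // 8
    }
}
```

The only node with two successors is the loop condition [y>0], which is where the conditional branch transfers control either into the loop body or to the statement following the loop.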

5.1.2 Data-flow Analysis

Data-flow analysis refers to a collection of techniques used to gather information about the flow of data along all possible execution paths of a program. Data-flow analysis uses the CFG in order to obtain knowledge about:

- Assignments that produced the value of a variable at a certain program point
- Variables that contain values that are no longer used in the remaining program
- The range of possible values of variables at a certain program point
- Run time values of variables and their dependencies among each other

Basically, depending on the kind of information that is in the focus of the analysis, two rudimentary data-flow concepts exist. They differ in the direction in which the analysis traverses the CFG.

Forward data-flow analysis propagates values forward through the CFG, following the flow of control. It starts at the entry node and follows all paths until it reaches the exit node. Each node in the CFG has a transfer function, i.e., the semantics of the instruction. We denote the value of a variable prior to a node as entry value, and the value of the variable after the node as its exit value. Values flow from program points after predecessor nodes to program points before successor nodes. At join points, values are combined using a join function.

Backward data-flow analysis propagates values backward through the CFG, against the flow of control. It starts at the exit node and follows all paths in the reverse direction until it reaches the entry node. Each node in the CFG has a transfer function, i.e., the semantics of the instruction. We denote the value of a variable prior to a node as exit value, and the value of the variable after the node as its entry value. Values flow from program points before successor nodes to program points after predecessor nodes. At split points, i.e., nodes with several successors, values are combined using a join function.

Data-flow analyses are the basis for classical intraprocedural analyses, such as Reaching Definition Analysis (RDA) and Live Variable Analysis (LVA). In the following, these analyses are described.

5.1.3 Forward Data-flow Analysis - RDA

The aim of RDA is to determine, for each program location, which assignments may have been made and not overwritten when program execution reaches this location along some path [81]. In other words, the reaching definitions for a given program location are those assignments that may have defined the current value of variables.

[Figure: the CFG of Listing 5.1 annotated with entry and exit values; panel (a) propagates values along the flow of control, panel (b) against it.]
(a) Forward data-flow analysis.
(b) Backward data-flow analysis.
Figure 5.2: Data-flow analysis.

RDA aims at answering the following questions [88]:

- Which definitions of variable x reach a given use of x in an expression?
- Is x used anywhere before it is defined?

A definition of a variable x is an operation that assigns, or may assign, an actual value to x. Furthermore, a definition R reaches a program location l if there is a path from the point immediately following R to l such that the definition R is not redefined along the path [89]. A variable is redefined between two program locations whenever there is an assignment that defines a new value of that variable. Considering the example code in Listing 5.2, the definition of code line 1 reaches line 2, but the definition made at code line 3 does not reach code line 5, since y is redefined by the assignment in line 4.
    1  [y:=3];
    2  [z:=y+1];
    3  [y:=3];
    4  [y:=6];
    5  [z:=y+1];

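Listing 5.2: RDA example code.

The aforementioned informal statements about reaching definitions can be expressed as data-flow equations [81, 9] (note that l denotes the current program location and l' one of its predecessors):

\[
RD_{entry}(l) =
\begin{cases}
\{(x, ?) \mid x \text{ is a variable of the program}\} & \text{if } l \text{ is the initial location,} \\
\bigcup \{ RD_{exit}(l') \mid l' \text{ predecessor of } l \} & \text{otherwise.}
\end{cases}
\]

\[
RD_{exit}(l) = (RD_{entry}(l) \setminus kill_{RD}(l)) \cup gen_{RD}(l)
\]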


The presented data-flow equations use two auxiliary functions, kill_RD(l) and gen_RD(l). Whereas gen_RD(l) represents the set of definitions created by the operation at program location l, kill_RD(l) represents the set of definitions destroyed by that operation. RD_entry(l) contains the set of definitions that reach the entry of program location l. The set of definitions that reach the exit of program location l is contained in RD_exit(l). For the example code shown in Listing 5.3, kill_RD(l) and gen_RD(l) as well as RD_entry(l) and RD_exit(l) are evaluated step by step; the results are listed in Table 5.1. The presented example is based on an example given in [81]. The statement at program location 3 (l = 3) simply checks whether the variable x is greater than a constant value, thus both kill_RD(3) and gen_RD(3) are empty. Most importantly, the evaluation of RD_entry(3) reveals that immediately before entering location 3, variable x was defined either at program location 1, denoted by (x, 1), or at location 5, denoted by (x, 5). Location 5 is included for the case that the body of the while loop was previously executed. The result of the RDA is an over-approximation of the definitions reaching this location, i.e., a mapping indicating for each variable where it was possibly written the last time.
    1  [x:=5];
    2  [y:=1];
    3  while [x>1] do
    4      [y:=x*y];
    5      [x:=x-1]
    6  od;

Listing 5.3: RDA example code.

    l   kill_RD(l)              gen_RD(l)   RD_entry(l)                     RD_exit(l)
    ----------------------------------------------------------------------------------------------------
    1   (x,?), (x,1), (x,5)     (x,1)       (x,?), (y,?)                    (y,?), (x,1)
    2   (y,?), (y,2), (y,4)     (y,2)       (y,?), (x,1)                    (x,1), (y,2)
    3   ∅                       ∅           (x,1), (y,2), (y,4), (x,5)      (x,1), (y,2), (y,4), (x,5)
    4   (y,?), (y,2), (y,4)     (y,4)       (x,1), (y,2), (y,4), (x,5)      (x,1), (y,4), (x,5)
    5   (x,?), (x,1), (x,5)     (x,5)       (x,1), (y,4), (x,5)             (y,4), (x,5)

Table 5.1: Results after solving data-flow equations for source code Listing 5.3.

In the case of [mc]square, RDA is used to obtain a set of possible values of memory locations within the microcontroller simulator. This information is used as input data for further analyses, such as the global Interrupt Flag Analysis (IFA) [9] and the Register Bank Analysis (RBA) (see Sections 5.2.6 and 5.2.8).
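The RDA equations for Listing 5.3 can be solved mechanically by iterating until nothing changes. The following is a minimal sketch, not the [mc]square implementation: the class RdaSketch, the string encoding of definitions such as "(x,1)", and the hand-written predecessor, kill and gen tables are assumptions made for illustration. The initial location is seeded with (x,?) and (y,?), matching the first row of Table 5.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RdaSketch {
    static final int N = 5; // program locations 1..5 of Listing 5.3
    // PRED[l] lists the predecessors of location l (index 0 unused)
    static final int[][] PRED = { {}, {}, {1}, {2, 5}, {3}, {4} };

    static Set<String> set(String... defs) {
        return new TreeSet<>(Arrays.asList(defs));
    }

    static final List<Set<String>> KILL = Arrays.asList(
        set(),
        set("(x,?)", "(x,1)", "(x,5)"),
        set("(y,?)", "(y,2)", "(y,4)"),
        set(),
        set("(y,?)", "(y,2)", "(y,4)"),
        set("(x,?)", "(x,1)", "(x,5)"));
    static final List<Set<String>> GEN = Arrays.asList(
        set(), set("(x,1)"), set("(y,2)"), set(), set("(y,4)"), set("(x,5)"));

    /** Returns the list {entry, exit}; each element is indexed by location 1..N. */
    static List<List<Set<String>>> solve() {
        List<Set<String>> entry = new ArrayList<>();
        List<Set<String>> exit = new ArrayList<>();
        for (int l = 0; l <= N; l++) {
            entry.add(set());
            exit.add(set());
        }
        boolean changed = true;
        while (changed) {                       // iterate until a fixed point is reached
            changed = false;
            for (int l = 1; l <= N; l++) {
                Set<String> in = set();
                if (l == 1) {                   // initial location: all variables undefined
                    in.add("(x,?)");
                    in.add("(y,?)");
                }
                for (int p : PRED[l]) {
                    in.addAll(exit.get(p));     // union over all predecessors
                }
                Set<String> out = new TreeSet<>(in);
                out.removeAll(KILL.get(l));
                out.addAll(GEN.get(l));
                if (!out.equals(exit.get(l))) {
                    changed = true;
                }
                entry.set(l, in);
                exit.set(l, out);
            }
        }
        return Arrays.asList(entry, exit);
    }

    public static void main(String[] args) {
        List<List<Set<String>>> result = solve();
        for (int l = 1; l <= N; l++) {
            System.out.println(l + ": entry=" + result.get(0).get(l)
                + " exit=" + result.get(1).get(l));
        }
    }
}
```

The back edge from location 5 to location 3 is the reason a second pass over the nodes is needed: only then do (y,4) and (x,5) arrive at the entry of the loop condition.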

5.1.4 Backward Data-flow Analysis - LVA

The aim of LVA is to determine, for each program location, which variables may be live at the exit from that location [81]. A variable is considered live at a program location if its current value may be read during the remaining execution of the program. In compiler optimizations, LVA is important for register allocation within basic blocks. After a value is computed in a register, and presumably used within a block, it is not necessary to permanently store that value if it is known to be dead at the end of the block [87]. It is important to note that LVA is used within [mc]square with a different intention. The results obtained from the LVA are used by the model checker to combine states that differ only in the values of dead memory locations. Dead memory locations can be reset, and states that differ only in dead memory locations can be merged into single states. Thus, this analysis contributes to state space reduction and helps to contain over-approximation by the model checker. As aforementioned, information flow for LVA travels backwards through the CFG, opposite to the flow of control, since the analysis aims to prove that the use of a variable at program location l is propagated to all points prior to l along an execution path, so that one may know at a prior point l' that the variable will have its value used. Similar to RDA, data-flow equations are used to express the LVA problem [81, 9] (l denotes the current program location and l' one of its successors):

\[
LV_{exit}(l) =
\begin{cases}
\emptyset & \text{if } l \text{ is the final state,} \\
\bigcup \{ LV_{entry}(l') \mid l' \text{ successor of } l \} & \text{otherwise.}
\end{cases}
\]

\[
LV_{entry}(l) = (LV_{exit}(l) \setminus kill_{LV}(l)) \cup gen_{LV}(l)
\]

Within the LVA, kill_LV(l) represents the set of variables defined by the operation at program location l, whereas gen_LV(l) represents the set of variables that are consumed by that operation. LV_entry(l) contains the set of variables that are live at the entry of program location l. The set of variables that are live at the exit of program location l is contained in LV_exit(l).
    1  [x:=2];
    2  [y:=4];
    3  [x:=1];
    4  (if [y>x] then
    5      [z:=y]
    6   else [z:=y*y]);
    7  [x:=z]

Listing 5.4: LVA example code (cf. [81]).

Next, the functions kill_LV(l) and gen_LV(l) are evaluated for each location of the program shown in Listing 5.4. The results are applied to the data-flow equations, resulting in the following statements.

\begin{align*}
LV_{entry}(1) &= (LV_{exit}(1) \setminus kill_{LV}(1)) \cup gen_{LV}(1) = LV_{exit}(1) \setminus \{x\} = \emptyset \\
LV_{entry}(2) &= (LV_{exit}(2) \setminus kill_{LV}(2)) \cup gen_{LV}(2) = LV_{exit}(2) \setminus \{y\} = \emptyset \\
LV_{entry}(3) &= (LV_{exit}(3) \setminus kill_{LV}(3)) \cup gen_{LV}(3) = LV_{exit}(3) \setminus \{x\} = \{y\} \\
LV_{entry}(4) &= (LV_{exit}(4) \setminus kill_{LV}(4)) \cup gen_{LV}(4) = LV_{exit}(4) \cup \{x, y\} = \{x, y\} \\
LV_{entry}(5) &= (LV_{exit}(5) \setminus kill_{LV}(5)) \cup gen_{LV}(5) = (LV_{exit}(5) \setminus \{z\}) \cup \{y\} = \{y\} \\
LV_{entry}(6) &= (LV_{exit}(6) \setminus kill_{LV}(6)) \cup gen_{LV}(6) = (LV_{exit}(6) \setminus \{z\}) \cup \{y\} = \{y\} \\
LV_{entry}(7) &= (LV_{exit}(7) \setminus kill_{LV}(7)) \cup gen_{LV}(7) = \{z\} \\
LV_{exit}(1) &= LV_{entry}(2) = \emptyset \\
LV_{exit}(2) &= LV_{entry}(3) = \{y\} \\
LV_{exit}(3) &= LV_{entry}(4) = \{x, y\} \\
LV_{exit}(4) &= LV_{entry}(5) \cup LV_{entry}(6) = \{y\} \\
LV_{exit}(5) &= LV_{entry}(7) = \{z\} \\
LV_{exit}(6) &= LV_{entry}(7) = \{z\} \\
LV_{exit}(7) &= \emptyset
\end{align*}

For example, the term LV_entry(4) corresponds to the statement [y>x] found at source code line 4 (cf. Listing 5.4). LV_entry(4) evaluates to {x, y}, revealing that at the entry point of that particular program location the only two live variables are x and y. Furthermore, LV_entry(5) evaluates to {y}, indicating that y is still alive immediately before the statement [z:=y]. For the chosen example, LV_exit(l) and LV_entry(l') of each successor l' yield the same results, arising from the fact that the presented example code, for the sake of simplicity, lacks any kind of program loops.

    l   kill_LV(l)   gen_LV(l)   LV_entry(l)   LV_exit(l)
    --------------------------------------------------------
    1   x            ∅           ∅             ∅
    2   y            ∅           ∅             y
    3   x            ∅           y             x, y
    4   ∅            x, y        x, y          y
    5   z            y           y             z
    6   z            y           y             z
    7   x            z           z             ∅

Table 5.2: Results after solving LVA data-flow equations for source code Listing 5.4.

The annotated CFG of the example is shown in Figure 5.3. Here it becomes obvious that no variable is marked as live at the first program location, i.e., the statement [x:=2]. As a result, the first assignment of value 2 to variable x is superfluous and can be omitted. The resulting minimized CFG is shown in Figure 5.3(c). In compiler theory, such a reduced CFG allows the compiler backend to generate more efficient code.

[Figure: three panels showing (a) the CFG for Listing 5.4, (b) the same CFG annotated with the evaluated LV_entry and LV_exit sets, and (c) the minimized CFG without the node [x:=2].]
(a) CFG for Listing 5.4.
(b) Evaluated LVA equations.
(c) Minimized CFG.
Figure 5.3: LVA example.


However, [mc]square does not use this data-flow information to generate any code. Instead, it marks the memory location where variable x is stored as dead and resets that memory location to its initial value. Resetting a memory location to its initial value increases the probability of finding equal states within the generated state space. Equal states do not contribute to the growth of the state space, since the model checker simply adds an additional edge to the state space graph.
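The effect of resetting dead locations on the number of distinct states can be shown with a toy example. The following is a minimal sketch under stated assumptions: a state is modeled as a plain int array, the initial value of every memory location is assumed to be 0, and the class DeadResetSketch with its methods is hypothetical rather than part of [mc]square.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeadResetSketch {
    /** Returns a copy of the state in which every dead location is reset to 0. */
    static int[] resetDead(int[] memory, boolean[] live) {
        int[] copy = memory.clone();
        for (int i = 0; i < copy.length; i++) {
            if (!live[i]) {
                copy[i] = 0; // assumed initial value
            }
        }
        return copy;
    }

    /** Counts the distinct states that remain after resetting dead locations. */
    static int distinctStates(int[][] states, boolean[] live) {
        Set<List<Integer>> seen = new HashSet<>();
        for (int[] state : states) {
            List<Integer> key = new ArrayList<>();
            for (int v : resetDead(state, live)) {
                key.add(v);
            }
            seen.add(key);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int[][] states = { {2, 4}, {7, 4}, {2, 5} }; // memory layout: {x, y}
        System.out.println(distinctStates(states, new boolean[] { true, true }));  // 3
        System.out.println(distinctStates(states, new boolean[] { false, true })); // 2
    }
}
```

Once x is known to be dead, the first two states coincide after the reset, so the model checker would store one state instead of two.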

5.1.5 Solving Data-flow Equations

Having discussed the principles of data-flow based static analysis, the issue of solving those equations is still not addressed. In general, algorithms are used that result in the least fixed point of the equations. A least fixed point is the solution of the equations whose assigned values of, e.g., RD_exit(l) and RD_entry(l) are contained in any other solution to the equations. A set S' ⊆ S is a fixed point of a function f: P(S) → P(S) if f(S') = S' [28]. In the presented approach, a fixed point iteration algorithm, similar to the one given in [87], is used for solving the RDA data-flow equations. It works as follows. First, it starts with the estimate RD_exit(l) = ∅ for all nodes l. Then, it iterates until the RD_exit(l) converge, i.e., until an iteration yields no new results for any RD_exit(l). A boolean value is used to track changes of the RD_exit sets in every iteration. The algorithm is sketched in Algorithm 1. Lines 1-4 initialize the data-flow values, lines 5-10 contain the loop responsible for iterating until convergence, and lines 7-8 apply the data-flow equations.

Algorithm 1: A fixed point iteration algorithm to solve data-flow equations for the RDA problem [87].
Input:  A CFG with kill_RD(l) and gen_RD(l) resolved for each node.
Result: RD_entry(l) and RD_exit(l), the sets of definitions reaching the entry and exit of each node l ∈ CFG.

     1  RD_exit(entry) = ∅;
     2  foreach node l other than entry do
     3      RD_exit(l) = ∅;
     4  end
     5  while any RD_exit(l) changes do
     6      foreach node l other than entry do
     7          RD_entry(l) = ∪ { RD_exit(l') | l' predecessor of l };
     8          RD_exit(l) = (RD_entry(l) \ kill_RD(l)) ∪ gen_RD(l);
     9      end
    10  end

For the LVA a similar algorithm is used. As aforementioned, information flow travels backwards through the control flow in the CFG. Thus, the LVA algorithm starts by initializing LV_entry(exit) = ∅, and the sets LV_entry and LV_exit have their roles interchanged, as shown in Algorithm 2. For more details on the theoretical background of data-flow analysis, the interested reader is referred to the relevant literature, such as [87, 81].


Algorithm 2: A fixed point iteration algorithm to solve data-flow equations for the LVA problem [87].
Input:  A CFG with kill_LV(l) and gen_LV(l) resolved for each node.
Result: LV_entry(l) and LV_exit(l), the sets of variables live at the entry and exit of each node l ∈ CFG.

     1  LV_entry(exit) = ∅;
     2  foreach node l other than exit do
     3      LV_entry(l) = ∅;
     4  end
     5  while any LV_entry(l) changes do
     6      foreach node l other than exit do
     7          LV_exit(l) = ∪ { LV_entry(l') | l' successor of l };
     8          LV_entry(l) = (LV_exit(l) \ kill_LV(l)) ∪ gen_LV(l);
     9      end
    10  end
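The backward iteration can be tried out on the program of Listing 5.4. The following is a minimal sketch and not the [mc]square code: the class LvaSketch, the hand-written successor, kill and gen tables, and the choice to visit the nodes from the last location to the first are assumptions made for illustration. Visiting the nodes against the flow of control merely speeds up convergence; any order reaches the same fixed point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class LvaSketch {
    static final int N = 7; // program locations 1..7 of Listing 5.4
    // SUCC[l] lists the successors of location l (index 0 unused)
    static final int[][] SUCC = { {}, {2}, {3}, {4}, {5, 6}, {7}, {7}, {} };

    static Set<String> set(String... vars) {
        return new TreeSet<>(Arrays.asList(vars));
    }

    static final List<Set<String>> KILL = Arrays.asList(
        set(), set("x"), set("y"), set("x"), set(), set("z"), set("z"), set("x"));
    static final List<Set<String>> GEN = Arrays.asList(
        set(), set(), set(), set(), set("x", "y"), set("y"), set("y"), set("z"));

    /** Returns the list {entry, exit}; each element is indexed by location 1..N. */
    static List<List<Set<String>>> solve() {
        List<Set<String>> entry = new ArrayList<>();
        List<Set<String>> exit = new ArrayList<>();
        for (int l = 0; l <= N; l++) {
            entry.add(set());
            exit.add(set());
        }
        boolean changed = true;
        while (changed) {                        // iterate until a fixed point is reached
            changed = false;
            for (int l = N; l >= 1; l--) {       // visit nodes against the flow of control
                Set<String> out = set();
                for (int s : SUCC[l]) {
                    out.addAll(entry.get(s));    // union over all successors
                }
                Set<String> in = new TreeSet<>(out);
                in.removeAll(KILL.get(l));
                in.addAll(GEN.get(l));
                if (!in.equals(entry.get(l))) {
                    changed = true;
                }
                entry.set(l, in);
                exit.set(l, out);
            }
        }
        return Arrays.asList(entry, exit);
    }

    public static void main(String[] args) {
        List<List<Set<String>>> result = solve();
        for (int l = 1; l <= N; l++) {
            System.out.println(l + ": entry=" + result.get(0).get(l)
                + " exit=" + result.get(1).get(l));
        }
    }
}
```

Since the example program contains no loop, the first pass already yields the final sets and the second pass only confirms that nothing changes.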

5.2 Implementation Static Analysis for the C51Simulator

Having discussed the theoretical foundations of static analysis, the following section focuses on relevant implementation details. The aim of the static analysis in [mc]square is to support model checking by providing information that can be statically extracted from the source code. The generated annotations are used to reduce state spaces by limiting the over-approximation during model checking. One can think of this as making the model checker more intelligent by means of the additional knowledge extracted by the static analysis. Basically, [mc]square implements two kinds of analyses, namely (i) data-flow analyses and (ii) abstraction techniques. A rather high-level sketch of the [mc]square static analysis framework is given in Figure 5.4.
[Figure: block diagram with the stages parser & preparation, data-flow analyses (RBA, ..., CFA, SA, RDA, LVA, IFA), abstraction techniques (DVR, PR), and model checking (model checker).]

Figure 5.4: The [mc]square static analysis framework for the Intel MCS-51 target.

The [mc]square static analysis framework is composed of:

- Parser and preparation handles the interaction with the user. It accepts a compiled and linked *.hex file and a specification given in CTL. Furthermore, it parses common debug formats in order to preserve the connections between the analyzed assembler code and the source code file, which may be written in C, C++, Java, or any other high-level language that can be compiled to machine code for the Intel MCS-51 microcontroller. As aforementioned, a complete and precise CFG is the basis for all further data-flow analyses. Consequently, the parser & preparation component is responsible for preparing the analyses and building the CFG.
- Data-flow analyses performs forward and backward oriented data-flow analyses, such as RDA and LVA. It uses the CFG to execute those analyses. The extracted reaching definitions are further used by the particular abstraction techniques in order to gain a better comprehension of the program. Further, it includes the novel RBA, a Stack Analysis (SA), and an Interrupt Flag Analysis (IFA).
- Abstraction techniques use the information gathered by the data-flow analyses to apply state space reductions. A technique called Dead Variable Reduction (DVR) is used to mark dead memory locations (footnote 1), prompting the model checker to reset certain memory locations during model checking. Another concept is Path Reduction (PR), which aims at combining single successor chains, e.g., of an ISR, into a single state.
- Model checking uses the additional information about the verified program in order to reduce the overall number of system states.

In the following, a rather conceptual description of the actual implementation of the various analyses in the [mc]square framework is given. For a more detailed insight, the interested reader is referred to the respective source code.

5.2.1 Overview
Currently, [mc]square is capable of conducting the following static analyses:

• Control Flow Analysis (CFA) [81]
• Stack Analysis (SA) [9]
• Reaching Definition Analysis (RDA) [81]
• Interrupt Flag Analysis (IFA) [9]
• Live Variable Analysis (LVA) [81]
• Dead Variable Reduction (DVR) [81, 90]
• Path Reduction (PR) [91, 90]

The execution order is depicted in Figure 5.4. The SA is used to track dependencies between values pushed onto and popped from the stack. For instance, the PSW is frequently pushed onto the stack at the beginning of a function and then read from the stack at the end. The status of the interrupt registers is extracted from the reaching definitions, which then influences the RDA in the next iteration. The RBA described in Section 5.2.6 interacts with the RDA and, in consequence, increases the precision of the RDA and the IFA. All analyses are designed as interprocedural analyses due to the peculiarities of assembly code. For instance, all memory locations can be accessed globally. Data-flow analyses in [mc]square consist of the following steps:
¹ Recapitulating, a dead memory location is a memory location that is no longer used in the further progression of the input program.


(i) The static behavior of a function is determined, where the effects of function calls are ignored.
(ii) The static behavior of a called function is propagated from the return statement of the callee into the call site.
(iii) Data-flow information is propagated from a call site into the called function.

All these steps run as fixed point iterations² to support recursive function calls. More details of this approach are described in [90, 9]. In what follows, the adaptation of these existing static analysis techniques to the Intel MCS-51 target architecture is described.

5.2.2 Control Flow Graph Building

Static analysis starts with CFG building. Even at this early stage of the analysis, one has to take care of architectural peculiarities of the Intel MCS-51 target microcontroller. Since the Intel MCS-51 is a CISC-based architecture, instructions can be of different length. Some are one byte long, others are two bytes, and a few are three bytes long. Thus, the program memory content cannot be divided into parts of equal length where each of these parts represents a single instruction. This is, in contrast, possible for (most) Reduced Instruction Set Computer (RISC) architectures. Consequently, the naive approach of building the CFG by linearly stepping through the program memory may not cover all instructions executed by the CISC-based target microcontroller and calls for a tailored treatment. Interestingly, modern compilers make use of this fact. A common technique, applied especially when "favor size" optimization options are enabled, is to reuse the same bytes of program memory for distinct purposes. For example, the lower two bytes of a three-byte instruction may be equal to another instruction of the microcontroller. In order to save program memory, the compiler may insert a JUMP instruction to the entry point of the lower two bytes of the three-byte instruction. This is especially effective when the shared program memory bytes are used at multiple points of the program.
    C:0x0800    78D3    MOV     R0,#0xD3
    C:0x0802    E8      MOV     A,R0
    C:0x0803    3520    ADDC    A,0x20
    C:0x0805    6411    XRL     A,#0x11
    C:0x0807    F4      CPL     A

    C:0x0900    D3      SETB    C
    C:0x0902    E8      MOV     A,R0
    C:0x0903    3520    ADDC    A,0x20
    C:0x0905    6411    XRL     A,#0x11
    C:0x0907    F4      CPL     A

    C:0x0800    78 D3 E8 35 20 64 11 F4
    C:0x0900    D3 E8 35 20 64 11 F4 XX

Listing 5.5: Code sharing within the program memory.

² A fixed point iteration is the common approach to solve data-flow equations. Usually, the analyses are repeated until no change can be detected. Most of the time, only the differences between iterations are considered in order to avoid redundant steps. The interested reader is referred to [81, 87].


To illustrate this behavior, consider Listing 5.5. Here, it is assumed that the compiler has already translated the high-level code to assembly instructions. The program memory ranges 0x0800 - 0x0807 and 0x0900 - 0x0907 contain almost identical code. The only difference is that the carry flag is set at location 0x0900 before entering the calculations starting at locations 0x0802 and 0x0902, respectively. Thus, in order to save program memory space, the compiler might combine those similar blocks by replacing the five instructions located from 0x0900 to 0x0907 by an unconditional jump, e.g., an AJMP [0x0801]. This leads to a dynamic disassembly of location 0x0801: the operand byte 0xD3 of MOV [R0, #0xD3] is decoded as the opcode of SETB [C], which yields the same instruction sequence as executing sequentially from 0x0900 to 0x0907. Thereby, the compiler saves five bytes of program memory, since the AJMP instruction itself is two bytes long. Given this compiler optimization of sharing equal program memory bytes, it is not possible to decode the program memory sequentially, i.e., by iterating over the program memory and advancing the index pointer by the instruction length. In the present case, a more elaborate approach to CFG building is needed. The implemented algorithm for building the CFG out of a given Intel MCS-51 program memory is given in Algorithm 3.

Algorithm 3: CFG building algorithm for the Intel MCS-51 target.
Input: A disassembled program memory content P.
Result: An equivalent CFG representation for P.

    initialize CFG;
    foreach instruction I in P do
        if I is an indirect branching statement then
            quit CFG building;
        end
        add new node l to CFG;
        label node l with detail info of I;
        add new edge E from l to all its successors l′;
    end
    while unknown target addresses exist do
        foreach edge E in CFG do
            obtain target addresses A of E;
            if A does not point to an existing l in CFG then
                dynamically disassemble P at address A;
                obtain new instruction I = P(A);
                if I is an indirect branching statement then
                    quit CFG building;
                end
                add new node l to CFG;
                label node l with detail info of I;
                add new edge E from l to all its successors l′;
            end
        end
    end

First, an object capable of storing a CFG is initialized. Then, sequential decoding of the program memory starts. For each instruction a node is added to the CFG.


Moreover, edges are added from the current node to all its successors. For example, for a non-branching instruction one edge is added from the current program counter location l to l′, where l′ is the location with the program counter value l + l.length. If an indirect branching statement is detected during decoding, CFG building is stopped, since it can no longer be guaranteed that the resulting CFG will be complete. Afterwards, the iteration starts over as long as branching targets are found that do not point to an existing node in the CFG. For each such target, the address is extracted and the program memory is re-decoded at this location. Finally, the dynamically decoded instructions are added to the CFG. The loop is left when all target addresses are resolved and map to a node in the CFG, thus ensuring that the resulting CFG is complete.
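The two phases described above (linear sweep, then dynamic disassembly of unresolved targets) can be illustrated with a minimal Java sketch. It is not the [mc]square implementation: the class name CfgSketch, the map-based graph representation, and the tiny opcode subset handled by length() are assumptions made purely for illustration, and aborting with an exception stands in for "quit CFG building".

```java
import java.util.*;

/** Sketch of two-phase CFG building over a tiny, hypothetical opcode subset. */
final class CfgSketch {
    /** Successor addresses, keyed by the address of each decoded instruction. */
    final Map<Integer, List<Integer>> edges = new TreeMap<Integer, List<Integer>>();
    private final int[] mem;

    CfgSketch(int[] mem) { this.mem = mem; }

    /** Instruction length for the few opcodes this sketch knows about. */
    static int length(int op) {
        switch (op) {
            case 0x02: return 3;                        // LJMP addr16
            case 0x01: case 0x35: case 0x64:
            case 0x78: case 0x80: return 2;             // AJMP, ADDC, XRL, MOV R0,#imm, SJMP
            default:   return 1;                        // e.g. E8, F4, D3
        }
    }

    /** Decodes the instruction at pc and records its successors. */
    private void addNode(int pc) {
        int op = mem[pc];
        if (op == 0x73) {                               // JMP @A+DPTR: target unknown statically
            throw new IllegalStateException("indirect branch at " + pc);
        }
        int next = pc + length(op);
        List<Integer> succ = new ArrayList<Integer>();
        if (op == 0x02) succ.add((mem[pc + 1] << 8) | mem[pc + 2]);      // LJMP
        else if (op == 0x01) succ.add((next & 0xF800) | mem[pc + 1]);    // AJMP, upper address bits 000
        else if (op == 0x80) succ.add(next + (byte) mem[pc + 1]);        // SJMP, signed offset
        else succ.add(next);                                             // fall through
        edges.put(pc, succ);
    }

    void build() {
        // Phase 1: linear sweep, advancing by the instruction length.
        for (int pc = 0; pc < mem.length; pc += length(mem[pc])) addNode(pc);
        // Phase 2: re-decode at every target that is not yet a node.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<Integer> succ : new ArrayList<List<Integer>>(edges.values())) {
                for (int target : succ) {
                    if (target >= 0 && target < mem.length && !edges.containsKey(target)) {
                        addNode(target);                // dynamic disassembly
                        changed = true;
                    }
                }
            }
        }
    }
}
```

For the byte sequence 78 D3 E8 F4 80 FE 01 01, the linear sweep decodes instructions at addresses 0, 2, 3, 4, and 6. The AJMP at address 6 targets address 1, i.e., the operand byte 0xD3, which the second phase decodes as SETB C and adds as an additional node.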

5.2.3 Action List Building

The presented algorithms for solving data-flow equations expect as input a CFG in which kill(l) and gen(l) are resolved for each program location l. Thus, the very first action in the analysis is to evaluate the corresponding kill(l) and gen(l) sets for each node (instruction) in the CFG. This process is termed Action List Building. Later on, this information is used by all subsequent data-flow analyses. From the implementation point of view, a visitor pattern is used with a visitor for each instruction defined in the instruction set manual [47] of the Intel MCS-51 microcontroller. For an instruction at program location l we define:

• kill(l) involves any piece of memory written by the instruction. That can involve single bits, bytes, or whole memory areas.
• gen(l) involves any piece of memory read by the instruction. Again, that can be any bit inside the microcontroller, byte locations, or whole memory areas.

In the following, a few examples of Action List Building for the Intel MCS-51 target are discussed. The behavior of the instructions below is defined in the datasheet as follows:

• CLR [C] Clear carry. The indicated bit is cleared (i.e., reset to zero). No other flags are affected [47].
• MOV [dest, src] Move src byte to dest. The byte variable indicated by the second operand is copied into the location specified by the first operand. The source byte is not affected. No other register or flag is affected [47].
• ADDC [A, direct] Add direct byte to Accumulator with carry. Simultaneously adds the indicated byte variable, the carry flag, and the Accumulator contents, leaving the result in the Accumulator. The carry and auxiliary-carry flags are set, respectively, if there is a carry-out from bit 7 or bit 3, and cleared otherwise. When adding unsigned integers, the carry flag indicates an overflow occurred. OV is set if there is a carry-out of bit 6 but not out of bit 7, or a carry-out of bit 7 but not out of bit 6, otherwise OV is cleared. When adding signed integers, OV indicates a negative number produced as the sum of two positive operands, or a positive sum from two negative operands [47].


Table 5.3 shows the evaluated gen(l) and kill(l) sets for the instructions above. ADDC [A, direct] reads the Accumulator, the specified direct memory location, and the carry flag; thus, gen(l) evaluates to {A, direct, C}. Similarly, the instruction writes the Accumulator and the flags carry, auxiliary-carry, overflow, and parity; thus, kill(l) evaluates to {A, C, AC, OV, P}.

    Instruction         gen(l)            kill(l)
    CLR [C]             ∅                 C
    MOV [dest, src]     src               dest
    ADDC [A, direct]    {A, direct, C}    {A, C, AC, OV, P}

Table 5.3: Action List Building: a few examples.

The implementation of the instruction visitors is straightforward. Listing 5.6 shows the corresponding visitor for the mnemonic ADDC [A, direct]. The instruction visitors for the remaining instructions work in a similar way, and the interested reader is referred to the actual source code of [mc]square for details.
    public void visit(ADDC_A_Direct instruction) {
        // gen(l): the Accumulator, the direct operand, and the carry flag are read
        addSingleRead(currentVertex.actionList, true, C51Utilities.REGISTER_ACC);
        addSingleRead(currentVertex.actionList, true, instruction.address);
        addPSWBitRead(currentVertex.actionList, true, PSW.CY);

        // kill(l): the Accumulator and the flags CY, AC, OV, and P are written
        addSingleWrite(currentVertex.actionList, true, C51Utilities.REGISTER_ACC);
        addPSWBitWrite(currentVertex.actionList, true, PSW.CY);
        addPSWBitWrite(currentVertex.actionList, true, PSW.AC);
        addPSWBitWrite(currentVertex.actionList, true, PSW.OV);
        addPSWBitWrite(currentVertex.actionList, true, PSW.P);
    }

Listing 5.6: The Action List Builder visitor pattern for the ADDC [A, direct] instruction.

5.2.4 Live Variable Analysis

For the implementation, a dedicated C51LVALatticeElement serves to describe the memory locations that are live at a given node in the CFG. Within the C51LVALatticeElement, memory locations are defined that are watched and modeled either at byte level or at bit level. The generic LVALatticeElement is extended by bitwise modeling of the PSW and the Interrupt Enable (IE) register. Bitwise modeling of the PSW is needed by the RBA, and the bitwise model of the IE register is needed by the IFA.

    Property
        LVALatticeElement
            C51LVALatticeElement (architectural dependencies)

Figure 5.5: The type hierarchy of the C51LVALatticeElement.

Every node in the CFG has a C51LVALatticeElement attached. Its type hierarchy is given in Figure 5.5. The main class for LVA is the C51LVABuilder, which is responsible for the microcontroller-specific extensions of the generic LVA as used in [mc]square. The C51LVABuilder determines the set of live variables for each node in the CFG by iterating backwards over all nodes in the CFG. First, the behavior of a single node (i.e., an instruction) is determined, and in a later step the behavior of functions and interrupts is propagated through their callers in the CFG. The C51LVABuilder applies an LVA to single nodes in the CFG. Usually, a CFG consists of several functions that are called by, e.g., the main routine. Moreover, especially in embedded systems software the use of interrupts is common; thus, a program may respond to various sources of interrupts. In principle, there is no difference between interrupts and functions, except that interrupts can occur at every location along the CFG, whereas a function is always called explicitly. Thus, it is not sufficient to consider the set of live variables at the node level; a broader approach is needed to obtain usable LVA results. This broader approach is realized by the base classes of C51LVABuilder, namely LVABuilder and BackwardProceduralAnalysis. The corresponding type hierarchy is given in Figure 5.6.

    Analysis
        BackwardProceduralAnalysis
            LVABuilder
                C51LVABuilder (architectural dependencies)

Figure 5.6: The type hierarchy of the C51LVABuilder.

Within those classes, [mc]square implements the propagation of LVA-relevant behavior from single nodes in the CFG to their predecessors and successors, from functions back to their callers, and from ISRs to all those locations within the CFG where interrupts are enabled.
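The node-level part of the LVA can be sketched as a backward fixed point iteration over the gen and kill sets from Action List Building. The following Java fragment is only an illustration under simplifying assumptions: it is intraprocedural, ignores interrupts, and the names LvaSketch, Node, and example() are hypothetical rather than taken from [mc]square. The example program (CLR C; MOV 0x20,0x30; ADDC A,0x20; MOV P1,A) is likewise made up, with gen and kill sets following Table 5.3.

```java
import java.util.*;

/** Sketch of a live variable analysis as a backward fixed point iteration. */
final class LvaSketch {
    /** One CFG node with its action list (gen = read, kill = written) and successors. */
    static final class Node {
        final Set<String> gen;
        final Set<String> kill;
        final int[] succ;
        Set<String> liveIn = new TreeSet<String>();

        Node(String[] gen, String[] kill, int... succ) {
            this.gen = new TreeSet<String>(Arrays.asList(gen));
            this.kill = new TreeSet<String>(Arrays.asList(kill));
            this.succ = succ;
        }
    }

    /** Repeats the backward sweep until no live-in set changes any more. */
    static void solve(Node[] cfg) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = cfg.length - 1; i >= 0; i--) {
                Set<String> live = new TreeSet<String>();
                for (int s : cfg[i].succ) live.addAll(cfg[s].liveIn); // live at exit
                live.removeAll(cfg[i].kill);                          // minus kill(l)
                live.addAll(cfg[i].gen);                              // plus gen(l)
                if (!live.equals(cfg[i].liveIn)) {
                    cfg[i].liveIn = live;
                    changed = true;
                }
            }
        }
    }

    /** Hypothetical program: CLR C; MOV 0x20,0x30; ADDC A,0x20; MOV P1,A. */
    static Node[] example() {
        return new Node[] {
            new Node(new String[] {}, new String[] {"C"}, 1),
            new Node(new String[] {"0x30"}, new String[] {"0x20"}, 2),
            new Node(new String[] {"A", "0x20", "C"},
                     new String[] {"A", "C", "AC", "OV", "P"}, 3),
            new Node(new String[] {"A"}, new String[] {"P1"})
        };
    }
}
```

After calling LvaSketch.solve(LvaSketch.example()), the carry flag is not live before CLR C and location 0x20 is not live before the MOV, since both are overwritten before they are read. These are the kind of memory locations that DVR can mark as dead.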

5.2.5 Reaching Definitions Analysis

Similar to the LVA, a dedicated C51RDALatticeElement serves to store the definitions made at individual nodes of the CFG. Again, bitwise modeling of the PSW and the IE register is performed. The motivation is the same as for LVA. The reaching definitions are mapped onto single nodes of the CFG by attaching a C51RDALatticeElement object. The type hierarchy of C51RDALatticeElement is given in Figure 5.7.

    Property
        RDALatticeElement
            C51RDALatticeElement (architectural dependencies)

Figure 5.7: The type hierarchy of the C51RDALatticeElement.

In order to respect architectural features of the Intel MCS-51 microcontroller, the generic framework for RDA of [mc]square is extended by C51RDABuilder, as shown in Figure 5.8.

    Analysis
        ForwardProceduralAnalysis
            RDABuilder
                C51RDABuilder (architectural dependencies)

Figure 5.8: The type hierarchy of the C51RDABuilder.

More details on the conceptual approach are given in [9]; for implementation details, the reader is referred to the actual source code of [mc]square.

5.2.6 Register Bank Analysis

Most of the aforementioned analysis techniques can be applied to the Intel MCS-51 target without major modifications. Particular architectural features, however, need to be considered explicitly in order to obtain usable analysis results. In the following, a concept termed Register Bank Analysis (RBA) is introduced. This concept tackles the problem of additional over-approximation in the existing data-flow analyses, which originates from the architectural feature of register bank switching available on the Intel MCS-51 microcontroller. The following is mainly based on our work published in [14].

The internal RAM area of the Intel MCS-51 is separated into three main sections, i.e., (a) the register bank area, (b) the bit-addressable area, and (c) the general user RAM area. The register bank area is located within the bottom 32 bytes of internal data memory, resulting in four register banks of 8 bytes each. A certain register bank may be selected by the application software by modifying the register bank selection bits. These bits are termed RS0 and RS1, and both are located in the PSW of the microcontroller. More details on the architectural features of the Intel MCS-51 are given in its documentation [46]. The bits RS0 and RS1 act as a register bank pointer indicating the active register bank. Register banks and possible register bank pointer configurations are detailed in Table 5.4.

    IRAM Addresses     Register Bank Memory            Bank   RS0   RS1
    0x00 ... 0x07      R_0{0, 1, 2, 3, 4, 5, 6, 7}     0      0     0
    0x08 ... 0x0F      R_1{0, 1, 2, 3, 4, 5, 6, 7}     1      0     1
    0x10 ... 0x17      R_2{0, 1, 2, 3, 4, 5, 6, 7}     2      1     0
    0x18 ... 0x1F      R_3{0, 1, 2, 3, 4, 5, 6, 7}     3      1     1

Table 5.4: Register bank configurations of the Intel MCS-51.

Register bank swapping is frequently used by the compiler for passing data to functions or for saving status information before entering ISRs. Performing a register bank swap instead of pushing the values of memory locations onto the stack before entering an ISR minimizes interrupt latency and is therefore the favored approach for time-critical interrupts. Programs with embedded assembly code, however, can change the register bank at any program location by bit-wise writing the register bank pointer. Knowing the actual value of the register bank pointer is a decisive criterion for the precision and usefulness of further analyses, such as LVA and RDA. For example, if a variable resides within memory area (a) of the microcontroller, the analysis results can be significantly sharpened if precise values of the register bank pointer are determined. Consider the Intel MCS-51 instruction MOV [R0, #const], which copies an immediate value to the working register R0 of the currently active register bank. MOV [R0, #const] reads the immediate #const and writes the working register R0. Hence, kill_LV(ℓ) evaluates to R0 and gen_LV(ℓ) to #const. In order to assign R0 to a certain register bank, however, special treatment of the register bank pointer composed of the control bits RS0 and RS1 is needed.

Motivation

Missing prior information about the actual values of RS0 and RS1 is problematic. Any subsequent data-flow analysis suffers from the over-approximation caused by the unknown value of the register bank pointer. This effect is detailed in Table 5.5. For example, if only a single bit of the register bank pointer is ambiguous (either ⊤ or ⊥, see Section 5.2.6 for the definition), none of the working registers can be marked as killed, since the active register bank is unknown. The actual working register may be located in register bank 0, 1, 2, or 3. As mentioned in Section 5.1.4, dead variables are reset during state space building, thus leading to a greater number of equal states that can be merged. Therefore, a bit-wise analysis of the bank selection pointer is worthwhile and actively contributes to smaller state spaces.

    Banks           RS0    RS1    kill_LV(ℓ)                          Safe?
    0               0      0      R_0{0}                              yes
    1               0      1      R_1{0}                              yes
    2               1      0      R_2{0}                              yes
    3               1      1      R_3{0}                              yes
    {0, 2}          ⊤/⊥    0      R_0{0}, R_2{0}                      no
    {1, 3}          ⊤/⊥    1      R_1{0}, R_3{0}                      no
    {0, 1}          0      ⊤/⊥    R_0{0}, R_1{0}                      no
    {2, 3}          1      ⊤/⊥    R_2{0}, R_3{0}                      no
    {0, 1, 2, 3}    ⊤/⊥    ⊤/⊥    R_0{0}, R_1{0}, R_2{0}, R_3{0}      no

Table 5.5: Evaluating kill_LV(ℓ) for MOV [R0, #const].
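To make the addressing behind Tables 5.4 and 5.5 concrete, the following Java sketch computes the IRAM address of a working register for a known PSW value. It relies on the MCS-51 layout in which the bank number is held in bits 4 and 3 of the PSW and each bank spans 8 bytes. The class and method names are hypothetical and chosen for illustration only.

```java
/** Sketch: mapping a working register to its IRAM address for a concrete PSW value. */
final class RegisterBankAddress {
    /** Bank number (0..3) encoded in bits 4..3 of the PSW. */
    static int activeBank(int psw) {
        return (psw >> 3) & 0x03;
    }

    /** IRAM address of working register Rk (0 <= k <= 7) on the active bank. */
    static int address(int psw, int k) {
        return 8 * activeBank(psw) + k;
    }
}
```

With PSW = 0x00, register R3 resolves to IRAM address 0x03 (bank 0). After MOV [PSW, #0x08], the same mnemonic operand R3 resolves to address 0x0B (bank 1). An analysis that does not know the PSW value has to assume all four candidate addresses 0x03, 0x0B, 0x13, and 0x1B.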

Bit-wise Modeling

The register bank pointer is modeled at bit-level granularity to capture the effects of bit-wise operations on the PSW. For most of the other analyses, registers are modeled at byte-level granularity, which turned out to be accurate enough. Based on ideas from abstract interpretation [92], a single bit is represented using a complete lattice as shown in Figure 5.9. In the following, the lattice for a single bit depicted on the right-hand side is denoted by L1.

Figure 5.9: Bit-wise modeling of the register bank selection pointer: (a) for RS0 and RS1, (b) for a single bit.

The lattice L1 is composed of the values 0 (false), 1 (true), a top element ⊤ (all), and a bottom element ⊥ (unknown). The top element represents a bit that may have the value 0 or 1, and the bottom element states that no information is available at all. A 4-valued approach to bit-wise modeling is required, since merging different paths in the CFG forces the analysis to generate a safe over-approximation. Branches in the CFG originate from conditional branching instructions, which change the flow of program execution. Examples are JZ (jump if accumulator zero), CJNE (compare and jump if not equal), and DJNZ (decrement and jump if not zero). Merging multiple predecessors in the CFG and combining their individual contributions at confluence points is performed by a join-operation, as illustrated in Figure 5.10.

    ⊔    ⊥    0    1    ⊤
    ⊥    ⊥    0    1    ⊤
    0    0    0    ⊤    ⊤
    1    1    ⊤    1    ⊤
    ⊤    ⊤    ⊤    ⊤    ⊤

Figure 5.10: The join-operator and a simple CFG (nodes DJNZ, SETB, join, and MOV).
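The join-operator on the single-bit lattice can be written down compactly. The following Java enum is a minimal sketch and not the [mc]square data structure. The name Bit and the method join are assumptions for illustration. It computes the least upper bound in a lattice where ⊥ lies below 0 and 1, which in turn lie below ⊤.

```java
/** Sketch of the 4-valued bit lattice L1 and its join operator. */
enum Bit {
    BOTTOM, ZERO, ONE, TOP;

    /** Least upper bound of two lattice elements. */
    Bit join(Bit other) {
        if (this == other || other == BOTTOM) return this;
        if (this == BOTTOM) return other;
        return TOP; // 0 joined with 1, or any element joined with TOP
    }
}
```

For a confluence point that is reached by one path on which a bit was set (ONE) and one path on which it was cleared (ZERO), Bit.ONE.join(Bit.ZERO) yields TOP, i.e., the safe over-approximation that the bit may hold either value.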

Formal Description

In the presented approach, the existing RDA is extended to analyze the register bank pointer at bit-level granularity. In a first run of the analysis, the reaching definitions for the bits RS0 and RS1 are gathered by an RDA at bit level. In this first pass, all register banks are assumed to be active in each program location. Then, in further iterations, the application of the join-operator introduced in Figure 5.10 leads to more precise results. The join-operator is implicitly encoded in the equations explained in the remainder of this section. In the following, the notation of Nielson et al. [81] is used for the definition of the functions $gen_{RBA}$ and $kill_{RBA}$, which form the basis of the extension of the RDA. The function $\gamma : L_1 \to 2^{\{0,1\}}$ is used to project lattice elements representing register bank configurations to the domain of values they represent.

\[
\gamma(r) =
\begin{cases}
\{0, 1\} & \text{if } r = \top, \\
\emptyset & \text{if } r = \bot, \\
\{r\} & \text{otherwise.}
\end{cases}
\]

The function $\gamma$ is used in the function $\beta : L_1 \times L_1 \to 2^{\{0,1,2,3\}}$, which computes integer representations of possible register bank configurations. Here, $\beta(r_0, r_1)$ returns the set of all register banks that may be active due to the values of $r_0$ and $r_1$.

\[
\beta(r_0, r_1) = \{\, 2 \cdot x_0 + x_1 \mid x_0 \in \gamma(r_0),\ x_1 \in \gamma(r_1) \,\}
\]

A reaching definition is a pair $(v, \ell)$, where $v$ represents a memory location or a register and $\ell$ represents an instruction. Reaching definitions with register bank analysis are computed in several iterations. In the following, the values of RS0 and RS1 in program location $\ell$ after the $i$-th iteration are denoted by $R^{i}_{RS0}(\ell)$ and $R^{i}_{RS1}(\ell)$. They can be extracted from the results of the $i$-th iteration of the analysis. Initially, $R^{0}_{RS0} = R^{0}_{RS1} = \top$, which means that all register banks are assumed to be active. This leads to a conservative over-approximation of reaching definitions in the first iteration. For an assignment to $R_?\{k\}$, for instance through an instruction MOV [R0, #0x80], a reaching definition for register R0 on register bank $b$, denoted by $R_b\{k\}$, is generated in program location $\ell$ using $gen_{RBA}$ if there exists a register bank configuration $b$. The notation $R_?\{k\}$ denotes that no knowledge about the active register bank is available from the instruction itself.

\[
gen^{i+1}_{RBA}(\ell) = \{\, (R_b\{k\}, \ell) \mid R_?\{k\} \text{ is assigned a value in } \ell \;\wedge\; \exists r_0 \in R^{i}_{RS0}(\ell)\ \exists r_1 \in R^{i}_{RS1}(\ell) : b \in \beta(r_0, r_1) \,\}
\]

In case the register bank configuration is ambiguous, an over-approximation of the real behavior is generated, because a reaching definition is generated for each possible register bank configuration. In like manner, a reaching definition is deleted by $kill_{RBA}$ only if the register bank configuration is unambiguous. This means that a definition can only be overwritten if only a single register bank configuration is possible. Otherwise, no reaching definitions can be killed in order to guarantee an over-approximation.

\[
kill^{i+1}_{RBA}(\ell) = \{\, (R_b\{k\}, \ell') \mid R_?\{k\} \text{ is assigned a value in } \ell \;\wedge\; \forall r_0 \in R^{i}_{RS0}(\ell),\ r_1 \in R^{i}_{RS1}(\ell) : b \in \beta(r_0, r_1) \wedge |\beta(r_0, r_1)| = 1 \,\}
\]

In case no assignment to a memory location addressable using register banks is found, the common equations for RDA are used.

\[
kill^{i}_{RDA}(\ell) = \{\, (R\{k\}, \ell') \mid R\{k\} \text{ is assigned a value in } \ell \;\wedge\; k \geq 8 \;\wedge\; (R\{k\}, \ell') \in RDA^{i-1}_{entry}(\ell) \,\}
\]

\[
gen^{i}_{RDA}(\ell) = \{\, (R\{k\}, \ell) \mid R\{k\} \text{ is assigned a value in } \ell \;\wedge\; k \geq 8 \,\}
\]

The entry- and exit-functions for RDA using RBA are then expressed in such a way that the specific equations $gen^{i}_{RBA}$ and $kill^{i}_{RBA}$ are only used for those memory locations addressable through register banks. That is, these functions are only used for $R_b\{k\}$ with $0 \leq b \leq 3$ and $0 \leq k \leq 7$. Hence, RBA is used for absolute memory addresses from 0x00 to 0x1F. For all other memory locations, $gen^{i}_{RDA}$ and $kill^{i}_{RDA}$ are used.

\[
RDA^{i}_{exit}(\ell) = \bigl( RDA^{i}_{entry}(\ell) \setminus ( kill^{i}_{RDA}(\ell) \cup kill^{i}_{RBA}(\ell) ) \bigr) \cup gen^{i}_{RDA}(\ell) \cup gen^{i}_{RBA}(\ell)
\]

\[
RDA^{i}_{entry}(\ell) = \bigcup \{\, RDA^{i}_{exit}(\ell') \mid (\ell', \ell) \in \mathrm{CFG} \,\}
\]
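The interplay of the bank projection with gen and kill can be sketched in Java as follows. This is a simplified illustration, not the [mc]square code: each selection bit is represented by a single lattice value instead of a set of reaching values, definitions are represented only by the register name (the instruction label is omitted), and all identifiers (RbaSketch, values, banks, gen, kill) are assumptions. As in the equations, the first argument is treated as the high bit of the bank number.

```java
import java.util.*;

/** Sketch of the bank projection and of gen/kill for a write to working register k. */
final class RbaSketch {
    enum Bit { BOTTOM, ZERO, ONE, TOP }

    /** Concrete values a lattice element stands for: TOP -> {0, 1}, BOTTOM -> {}. */
    static Set<Integer> values(Bit r) {
        switch (r) {
            case TOP:  return new TreeSet<Integer>(Arrays.asList(0, 1));
            case ZERO: return new TreeSet<Integer>(Arrays.asList(0));
            case ONE:  return new TreeSet<Integer>(Arrays.asList(1));
            default:   return new TreeSet<Integer>();
        }
    }

    /** All banks 2*x0 + x1 that may be active for the given bit values. */
    static Set<Integer> banks(Bit r0, Bit r1) {
        Set<Integer> result = new TreeSet<Integer>();
        for (int x0 : values(r0)) {
            for (int x1 : values(r1)) {
                result.add(2 * x0 + x1);
            }
        }
        return result;
    }

    /** Registers R_b{k} that receive a new definition: one per possible bank. */
    static Set<String> gen(Bit r0, Bit r1, int k) {
        Set<String> defs = new TreeSet<String>();
        for (int b : banks(r0, r1)) defs.add("R" + b + "{" + k + "}");
        return defs;
    }

    /** Registers whose old definitions may be removed: only if the bank is unambiguous. */
    static Set<String> kill(Bit r0, Bit r1, int k) {
        return banks(r0, r1).size() == 1 ? gen(r0, r1, k) : new TreeSet<String>();
    }
}
```

With both bits at TOP, gen yields a definition on all four banks and kill is empty, which corresponds to the conservative first iteration. Once both bits are known, for example ZERO and ONE, exactly one register is generated and killed.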

The results are refined in further iterations. Due to monotonicity³, the results become smaller after each iteration and eventually stabilize after a finite number of iterations. In practice, a fixed point was reached after the second iteration for all programs checked, but it is possible to construct programs where more iterations are required. After each iteration, concrete values for RS0 and RS1 are extracted from the RDA if possible and used in the next iteration. Consequently, RBA is conducted at least twice: the first time to collect reaching definitions for RS0 and RS1, and further times to refine the analysis results by actively using the previously extracted values of the register bank pointer for read and write accesses on working registers. For example, $R^{i}_{RS0}(\ell)$ is a reaching definition for the bit RS0 at a certain program location $\ell$, containing the set of all definitions $gen^{i}_{RDA}(\ell)$ detected for RS0 throughout the program. The definitions originate from the predecessors of $\ell$ in the CFG. In like manner, RBA also contributes to the precision of LVA by allowing more precise reasoning about which values are read in which locations. This enhanced precision is demonstrated using an example in the following section.

Example

To highlight the effectiveness of the introduced RBA, the analysis of an assembly program that alters the register bank pointer is described. In particular, the contribution of RBA to the precision of LVA is evaluated. The program is given in Listing 5.7.
³ A function $f : L_1 \to L_2$ between partially ordered sets $L_1 = (L_1, \sqsubseteq_1)$ and $L_2 = (L_2, \sqsubseteq_2)$ is monotone if $\forall l, l' \in L_1 : l \sqsubseteq_1 l' \Rightarrow f(l) \sqsubseteq_2 f(l')$. $\sqsubseteq$ denotes partial ordering [81].

     1  C:0x0000    020100              LJMP    STARTUP
     2  C:0x0100              STARTUP:
     3  C:0x0100    75D000              MOV     PSW,#0x00
     4  C:0x0103    7880                MOV     R0,#0x80
     5  C:0x0105              RAM_CLR:
     6  C:0x0105    18                  DEC     R0
     7  C:0x0106    7600                MOV     @R0,#0x00
     8  C:0x0108    E8                  MOV     A,R0
     9  C:0x0109    70FA                JNZ     RAM_CLR
    10  C:0x010B    E590                MOV     A,P1
    11  C:0x010D    7006                JNZ     READ_P3
    12  C:0x010F    ABA0                MOV     R3,P2
    13  C:0x0111    0B                  INC     R3
    14  C:0x0112    02011E              LJMP    CONT
    15  C:0x0115              READ_P3:
    16  C:0x0115    75D008              MOV     PSW,#0x08
    17  C:0x0118    ABB0                MOV     R3,P3
    18  C:0x011A    0B                  INC     R3
    19  C:0x011B    75D000              MOV     PSW,#0x00
    20  C:0x011E              CONT:
    21  C:0x011E    E50B                MOV     A,0x0B
    22  C:0x0120    2B                  ADD     A,R3
    23  C:0x0121    80FE                SJMP    $
    24                                  END

Listing 5.7: Example assembly code.

During the start-up code, the PSW is initialized with 0x00 (see source code lines 3-9); thus, the initial register bank is register bank 0 (RS0=0, RS1=0). In source code line 16, however, the register bank pointer is altered, and for lines 16-19 the active register bank is register bank 1 (RS0=0, RS1=1). For the remainder of the program, register bank 0 is active again. The results of the LVA are listed in Table 5.6. To keep the example simple and focused on the main idea of the analysis, ISRs are not considered. Functions and ISRs require the propagation of local analysis results to call sites, which makes the intermediate steps and results difficult to follow.

Figure 5.11: The corresponding CFG as generated with [mc]square for the assembly code in Listing 5.7.

The results are evaluated in two ways. First, the results with the support of RBA are described. These results are then compared to the original analysis without the additional information gathered by RBA. The new analysis successfully generates the desired over-approximation and narrows the analysis results. For example, instead of adding registers R_0{3}, R_1{3}, R_2{3}, R_3{3}, and P3 to the set of live variables at line 17 (program location 0x118), the RBA reveals that the exact set of live variables at this location is only composed of R_0{3} (working register 3 on bank 0) and P3. Hence, the number of live variables is reduced from 5 to 2, which is a significant improvement. Consequently, reducing the number of live variables increases the number of variables that can be marked as dead. The same applies to program location 0x10f, where the RBA successfully determines register bank 0 as active. Thus, instead of setting registers R_0{3}, R_1{3}, R_2{3}, R_3{3}, and P2 live, RBA reduces the set of live variables to R_1{3} and P2. Moreover, for the program locations 0x115, 0x118, 0x11a, and 0x11b, the analysis recognizes the change of the register bank pointer from 0 to 1, which is performed by the instruction MOV [PSW, #0x08] at program location 0x115.

Evaluating the resulting sets of live variables in this example reveals that, without the new RBA, no precise statements about the register bank pointer configuration can be made. In this case, the highest degree of over-approximation for working registers has to be applied, i.e., all four register bank combinations are added to the set of live variables (see Table 5.6). The RBA, however, significantly improves the LVA results. The actual contribution to state space reduction is difficult to quantify in general due to the strong interdependence of the analyses. It is simple to construct example code where an enabled RBA leads to significant state space reductions. On the other hand, examples exist where RBA fails to further shrink the state space. To give an estimate for the example code at hand, the overall state space without static analysis consists of 263,683 states. With static analysis supported by RBA activated, the state space shrinks to 196,740 states, a reduction of approximately 25% for this specific example. Note that the large number of states results from the fact that the application reads three I/O ports.

    PC       with RBA                           without RBA
    0x000    R_0{3}, R_1{3}, P1, P2, P3         R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x100    R_0{3}, R_1{3}, P1, P2, P3         R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x103    R_0{3}, R_1{3}, P1, P2, P3         R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x105    R_0{0,3}, R_1{3}, P1, P2, P3       R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x106    R_0{0,3}, R_1{3}, P1, P2, P3       R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x108    R_0{0,3}, R_1{3}, P1, P2, P3       R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3
    0x109    R_0{0,3}, R_1{3}, P1, P2, P3, A    R_0{0,3}, R_1{0,3}, R_2{0,3}, R_3{0,3}, P1, P2, P3, A
    0x10b    R_0{3}, R_1{3}, P1, P2, P3         R_0{3}, R_1{3}, R_2{3}, R_3{3}, P1, P2, P3
    0x10d    R_0{3}, R_1{3}, P2, P3, A          R_0{3}, R_1{3}, R_2{3}, R_3{3}, P2, P3, A
    0x10f    R_1{3}, P2                         R_0{3}, R_1{3}, R_2{3}, R_3{3}, P2
    0x111    R_0{3}, R_1{3}                     R_0{3}, R_1{3}, R_2{3}, R_3{3}
    0x112    R_0{3}, R_1{3}                     R_0{3}, R_1{3}, R_2{3}, R_3{3}
    0x115    R_0{3}, P3                         R_0{3}, R_1{3}, R_2{3}, R_3{3}, P3
    0x118    R_0{3}, P3                         R_0{3}, R_1{3}, R_2{3}, R_3{3}, P3
    0x11a    R_0{3}, R_1{3}                     R_0{3}, R_1{3}, R_2{3}, R_3{3}
    0x11b    R_0{3}, R_1{3}                     R_0{3}, R_1{3}, R_2{3}, R_3{3}
    0x11e    R_0{3}, R_1{3}, A                  R_0{3}, R_1{3}, R_2{3}, R_3{3}
    0x120    R_0{3}, A                          R_0{3}, R_1{3}, R_2{3}, R_3{3}, A
    0x121    ∅                                  ∅

Table 5.6: Comparison of resulting live variables.

In summary, the described RBA is a powerful contribution to narrowing data-flow analysis results for the Intel MCS-51 architecture. It can be applied to a variety of programs and shows its precision whenever the compiler makes use of instructions involving register banks. For the Intel MCS-51 microcontroller, the RBA handles all of the 160 (out of 256) instructions that use register banks.

5.2.7 Stack Analysis

Stack overflows are a common source of software failures within embedded systems code. In order to detect possible stack corruptions, [mc]square implements a simple check to verify that all those locations pushed onto the stack are later on popped from the stack in the correct order. For the SA [9], the adaptations needed for the Intel MCS-51 target were minor and are therefore not elaborated in this thesis.
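The push/pop matching check described above can be pictured with the following minimal sketch. It is not the [mc]square implementation: the encoding of instructions as (mnemonic, operand) pairs and the function name `check_stack_balance` are assumptions made for this illustration, and only a straight-line instruction sequence is considered.

```python
def check_stack_balance(instructions):
    """Return True if every PUSH is matched by a POP of the same
    location in reverse (LIFO) order along a straight-line sequence.

    `instructions` is a list of (mnemonic, operand) pairs; this encoding
    is hypothetical and chosen only for this sketch.
    """
    stack = []
    for mnemonic, operand in instructions:
        if mnemonic == "PUSH":
            stack.append(operand)
        elif mnemonic == "POP":
            # A POP must restore the most recently pushed location.
            if not stack or stack.pop() != operand:
                return False
    # Anything left on the model stack was pushed but never popped.
    return not stack


# Typical ISR prologue/epilogue: balanced and correctly ordered.
isr_ok = [("PUSH", "ACC"), ("PUSH", "PSW"), ("MOV", "A,#0x01"),
          ("POP", "PSW"), ("POP", "ACC")]
# Same locations, but popped in the wrong order.
isr_swapped = [("PUSH", "ACC"), ("PUSH", "PSW"),
               ("POP", "ACC"), ("POP", "PSW")]
```

A real check would have to follow all paths of the CFG; the sketch only shows the ordering condition itself.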

5.2.8 Interrupt Flag Analysis

As aforementioned, control-flow behavior is propagated from ISRs to all nodes of the CFG with interrupts enabled. In order to alleviate over-approximation it is particularly important to determine the actual value of the IE bit for each location in the CFG. For locations with interrupts disabled there is no need to consider the additional behavior originating from ISRs. Similar to SA, the existing concepts for the IFA [9] fit well for the Intel MCS-51 target, thus, only minor extensions were made.
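The idea of determining the interrupt enable state per CFG location can be sketched as a small forward data-flow analysis over a three-valued domain (disabled, enabled, unknown). The sketch is not the IFA of [mc]square. It assumes that the global enable bit (EA in the IE register on the MCS-51) is changed only by `SETB EA` and `CLR EA`, that interrupts are disabled at the entry node, and it ignores byte-wise writes to IE; the dictionary-based CFG encoding and all names are hypothetical.

```python
UNKNOWN = "?"  # third value: interrupts may be enabled or disabled


def join(a, b):
    """Merge two abstract flag values (None means 'not reached yet')."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else UNKNOWN


def transfer(instr, flag_in):
    """Effect of a single instruction on the abstract flag value."""
    if instr == "SETB EA":
        return 1
    if instr == "CLR EA":
        return 0
    return flag_in


def interrupt_flag_analysis(instrs, succs, entry=0, initial=0):
    """instrs: node -> instruction text; succs: node -> successor nodes.
    Returns the abstract flag value on entry to each node."""
    flag_in = {node: None for node in instrs}
    flag_in[entry] = initial
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        out = transfer(instrs[node], flag_in[node])
        for succ in succs.get(node, []):
            merged = join(flag_in[succ], out)
            if merged != flag_in[succ]:
                flag_in[succ] = merged
                worklist.append(succ)
    return flag_in


# Node 2 branches; one path clears EA, the other does not.
instrs = {0: "MOV A,#0x00", 1: "SETB EA", 2: "JZ", 3: "CLR EA",
          4: "NOP", 5: "NOP"}
succs = {0: [1], 1: [2], 2: [3, 4], 3: [4], 4: [5]}
result = interrupt_flag_analysis(instrs, succs)
```

Only nodes whose value is 1 or unknown need the additional ISR behavior; nodes with value 0 (here nodes 0 and 1) can be left without ISR edges.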

5.2.9 Path Reduction

PR [91, 90] is an abstraction technique that is used to compress single successor chains, i.e., paths of states that have only single successors, into single states. Hence, only the first and the last state of these chains are stored by this abstraction technique. As usual, this abstraction technique is performed in order to reduce the overall state space. Figure 5.12 illustrates the principle.

Figure 5.12: The principle of PR.

In microcontroller code such single successor chains are, for example, found in ISRs. Although this abstraction contributes greatly to state space reductions, a minor drawback still exists. The validity of the CTL neXt operator is not preserved due to the compression of successor chains into single states. Thus, using path reduction leads to a restriction of applicable CTL statements and may lead to incomplete counterexamples that are difficult to understand. The subset of statements which can be used in combination with PR is called CTL-X, as described by Yorav and Grumberg in [91]. For the process of collapsing multiple successors to a single state, a set of rules for path reduction for the Intel MCS-51 target was established:


1. PR cannot be applied if the given CTL specification makes use of the neXt operator.
2. PR cannot be applied if one of the successors is an ISR. Similar to rule #4.
3. Any state in which a register is written that is part of the CTL specification cannot be collapsed, since the model checking algorithm has to evaluate the value of this register at this state in order to prove or falsify the specification.
4. Any branch in the CFG determines an end point such as S4 in Figure 5.12. Thus, path reduction must preserve the full control flow of the program.

The idea of PR is implemented in two components of [mc]square, i.e., the C51Simulator and the Path Compressor. The C51Simulator part is called the dynamic and the Path Compressor part is called the static component of PR. The Path Compressor iterates over the nodes in the CFG and tracks whether the node writes a memory location involved in the CTL formula, thus, covering rule #3. The obtained results are back-annotated into the CFG. Indirect control flow and successors branching to ISRs are detected on-the-fly during state space building by the C51Simulator, covering rule #2 and rule #4. Finally, rule #1 is checked by the input parser.
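The collapsing of single successor chains under rules #2 to #4 can be illustrated on an explicit state graph. The following sketch is not the Path Compressor of [mc]square; the graph encoding, the function name `compress_chains`, and the `protected` set (standing in for states excluded by rules #2 and #3) are assumptions made for this example. As a simplification, a state is only collapsed if it has exactly one predecessor and exactly one successor.

```python
def compress_chains(succs, protected=frozenset()):
    """Collapse single-successor chains of a state graph.

    succs: state -> list of successor states.
    protected: states that must stay visible, e.g. because they write a
    location used in the CTL formula or because a successor is an ISR.
    Returns a new successor map without the collapsed states.
    """
    preds = {}
    for state, targets in succs.items():
        for target in targets:
            preds.setdefault(target, set()).add(state)

    def removable(state):
        # Branches (rule #4) have more than one successor and are kept.
        return (state not in protected
                and len(succs.get(state, [])) == 1
                and len(preds.get(state, ())) == 1
                and succs[state][0] != state)

    reduced = {}
    for state, targets in succs.items():
        if removable(state):
            continue
        new_targets = []
        for target in targets:
            seen = set()
            # Skip over the inner states of a chain.
            while removable(target) and target not in seen:
                seen.add(target)
                target = succs[target][0]
            new_targets.append(target)
        reduced[state] = new_targets
    return reduced


# A chain S1 -> S2 -> S3 -> S4, where S4 branches to S5 and S6.
example = {"S1": ["S2"], "S2": ["S3"], "S3": ["S4"],
           "S4": ["S5", "S6"], "S5": [], "S6": []}
reduced = compress_chains(example)
reduced_keep_s3 = compress_chains(example, protected={"S3"})
```

In the example, S2 and S3 disappear and S1 is connected directly to the branch state S4; protecting S3 keeps it as an intermediate state.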

5.2.10 Implementation Summary

In this section, a novel data-flow analysis termed Register Bank Analysis was presented to master the architectural feature of register bank swapping for the Intel MCS-51 microcontroller. This new analysis supports static assembly code analysis within [mc]square. In particular, the approach leads to more precise RDA and LVA results, which allows the detection of additional dead variables. Hence, the number of overall system states is reduced during model checking. The effectiveness of this approach was shown by an example. Typical data-flow analyses for high-level languages cannot be applied to assembly code one to one, thus, it is necessary to take architectural peculiarities into account during the analysis to achieve precise results. Analyses such as RDA or LVA have to be adapted to be applicable to assembly code [90]. Especially the concept of Action List Building proved to be a major contributor for the portability of existing static analyses within [mc]square for future microcontroller families.

5.3 Remaining Challenges in Static Analysis of (Intel MCS-51) Assembly Code

Although the introduced RBA is an advantageous concept to limit the over-approximation on architectures featuring register bank swapping, there are still some obstacles to overcome. In what follows, four major challenges are outlined and possible approaches to overcome them are highlighted. These challenges are mostly concerned with indirect control, indirect memory access, and loops.

5.3.1 Indirect Addressing

Data structures such as tables, lists, or arrays are often accessed by changing the address of an operand on-the-fly, i.e., during the execution of a program. In high-level programming languages, these addresses are known as pointers. Indirect addressing is a powerful addressing mode, which provides flexibility for the compiler. For the Intel MCS-51 the working registers R0 or R1 may serve as base registers for indirect addressing. Again, when executing instructions such as MOV A, @R0, it is the register bank pointer that selects the corresponding base register from one of the four register banks. Thus, for resolving the actual base register the aforementioned RBA is used. Without actually executing the code prior to the instruction that uses indirect addressing, it is in most cases not trivial to predict the content of the base register. This uncertainty forces our analysis to generate a conservative over-approximation. Consider the assembly snippet depicted in Listing 5.8.
1   CLR  A
2   MOV  A, 0x25
3   ADD  A, 0x26
4   MOV  R0, A
5   MOV  0x44, @R0

Listing 5.8: Intel MCS-51 assembly snippet.

In fact, resolving the indirectly addressed memory location is a rather challenging problem for the static analysis. Its address is the value held by R0 in line 5 in Listing 5.8. To resolve the actual value one has to consider:

(i) The actual value of memory locations 0x25 and 0x26 needs to be detected. In order to do so, one has to trace back the operations performed on these memory locations until one can reason about the circumstances under which the actual values are generated.

(ii) The exact semantics of the involved instructions is required. Although guessing the effect of certain instructions on the memory content of the microcontroller seems to be obvious for instructions such as CLR and MOV, it is a challenging task for complex instructions such as ADD, at least without explicitly executing the instruction, for instance, by a target platform simulator.

(iii) Embedded systems communicate actively with their environment, thus, various interrupt sources are likely to interfere with execution of the main process. Special care has to be taken in this case. Interrupt handlers may alter the values of memory locations 0x25, 0x26, or even the value of the working register R0. This situation becomes more challenging on target architectures supporting nested interrupts.

(iv) Even though the presented assembly code does not contain any program branching instructions, the remaining program memory may contain direct and indirect jumps targeting any of the program locations stated in Listing 5.8. For instance, a branching instruction may target the program location holding MOV R0, A with an entirely different register configuration compared to the sequential execution of the program fragment.
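The conservative over-approximation for Listing 5.8 can be made concrete with a small value-set interpretation of lines 1 to 4. This is only an illustrative sketch under simplifying assumptions: values are modeled as sets of possible bytes, memory cells without known content are treated as "any byte", the carry flag, interrupts (item iii), and foreign jump targets (item iv) are ignored, and all function names are hypothetical.

```python
TOP = frozenset(range(256))  # "any byte value": nothing is known


def load(memory, address):
    """Abstract read: cells without known content yield TOP."""
    return memory.get(address, TOP)


def add8(xs, ys):
    """Abstract 8-bit addition (carry flag ignored in this sketch)."""
    return frozenset((x + y) & 0xFF for x in xs for y in ys)


def possible_r0_values(memory):
    """Abstractly execute lines 1 to 4 of Listing 5.8 and return the
    set of addresses R0 may hold when line 5 is reached."""
    acc = frozenset({0})                  # line 1: CLR A
    acc = load(memory, 0x25)              # line 2: MOV A, 0x25
    acc = add8(acc, load(memory, 0x26))   # line 3: ADD A, 0x26
    r0 = acc                              # line 4: MOV R0, A
    return r0


# Nothing known about 0x25 and 0x26: every address is possible.
unknown_case = possible_r0_values({})
# Partial knowledge narrows the set of candidate addresses.
narrow_case = possible_r0_values({0x25: frozenset({0x10}),
                                  0x26: frozenset({0x20, 0x21})})
```

With no information about 0x25 and 0x26 the indirect access in line 5 may touch any of the 256 byte addresses, whereas known value sets for both cells reduce the candidates to two addresses.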

5.3.2 Indirect Control Flow

A precise CFG is the foundation of any kind of data-flow analysis. Building a complete CFG is challenging in presence of indirect control. A precise CFG requires all control branches in the input program to be mapped into the resulting CFG. Due to information that is not statically computable, such as target addresses of indirect branches, the analysis framework is forced to generate a conservative over-approximation. For example, the unconditional indirect jump statement JMP @A+DPTR would add edges to all possible program locations reachable by the JMP @A+DPTR instruction. Indirect branches to dynamically calculated targets are fragments commonly used by the compiler in order to generate optimized code. As a matter of fact, in the embedded systems domain highly optimizing compilers are used due to prevailing resource constraints. An interesting aspect when dealing with indirect control flow is the fact that the target addresses of branch instructions can originate either (i) from the environment or (ii) from lookup tables stored in the program memory. The latter is the more common one, since reading branch target addresses from the environment is rarely found in real life applications.
 1  void main() {
 2    ...
 3    switch (var) {
 4      case 0xAA: foo1(); break;
 5      case 0xBB: foo2(); break;
 6      case 0xCC: foo3(); break;
 7      case 0xDD: foo4(); break;
 8      case 0xEE: foo5(); break;
 9      case 0xFF: foo6(); break;
10      case 0xC0: foo7(); break;
11      default: break;
12    }
13    while (1);
14  }

Listing 5.9: C source code containing switch statement.

Lookup tables, among others, are used by the compiler to realize switch-case statements as shown in Listing 5.9. The program in Listing 5.10 shows the resulting assembler code generated by the Keil C51 Compiler v8.01, without any optimizations enabled. The assembler routine C?CCASE is called from the main method to achieve the behavior needed for the switch statement (cf. Listing 5.9, line 3). The call of the subroutine (implicitly) pushes the current program counter value (0x0808) on the stack. Thereafter, the new data pointer value of 0x0808 is loaded from the stack by two consecutive POP statements. Next, the Accumulator holding the value of variable var1 is copied into working register R0 and the Accumulator is cleared afterwards. Line 7 fetches a byte from program memory at address C:0x0808. Listing 5.11 presents a memory dump of the particular program memory section, revealing that the instruction in line 7 actually reads the byte 0x08. Since this byte is nonzero, the conditional jump to address C:0x0860 is taken. The instructions at source lines 22 and 23 load the comparison value 0xAA for the first case branch (cf. Listing 5.9, line 4). The comparison value in the Accumulator and the value residing in R0 are XORed, thus, carrying out a compare of the two values. In case the two values are equal, the Accumulator is zero after the comparison. For the first run the comparison value (0xAA) does not match the value of variable var1 (0xCC), therefore, program flow reaches lines 26 to 29, incrementing the data pointer by three bytes. The aforementioned sequence is repeated for the comparison values 0xBB and 0xC0, respectively. Both do not match the value of variable var1. Next, the program code is executed with the comparison value of 0xCC that matches the actual value of variable var1. The comparison of the two values in line 24 leaves a cleared Accumulator, so that the conditional jump in line 25 branches to program location C:0x0855 (cf. Listing 5.10, line 14). Lines 14 to 17 load the address of the corresponding function (C:0x082B) and the following two instructions write it to the data pointer (DPTR). Finally, the indirect jump in line 21 branches to the selected function void foo3(void), which resides at program address C:0x082B (cf. Listing 5.12, line 3).
 1  C:0x0805  120845  LCALL  C?CCASE(C:0845)
 2  C?CCASE:
 3  C:0x0845  D083    POP    DPH(0x83)
 4  C:0x0847  D082    POP    DPL(0x82)
 5  C:0x0849  F8      MOV    R0,A
 6  C:0x084A  E4      CLR    A
 7  C:0x084B  93      MOVC   A,@A+DPTR
 8  C:0x084C  7012    JNZ    C:0860
 9  C:0x084E  7401    MOV    A,#0x01
10  C:0x0850  93      MOVC   A,@A+DPTR
11  C:0x0851  700D    JNZ    C:0860
12  C:0x0853  A3      INC    DPTR
13  C:0x0854  A3      INC    DPTR
14  C:0x0855  93      MOVC   A,@A+DPTR
15  C:0x0856  F8      MOV    R0,A
16  C:0x0857  7401    MOV    A,#0x01
17  C:0x0859  93      MOVC   A,@A+DPTR
18  C:0x085A  F582    MOV    DPL(0x82),A
19  C:0x085C  8883    MOV    DPH(0x83),R0
20  C:0x085E  E4      CLR    A
21  C:0x085F  73      JMP    @A+DPTR
22  C:0x0860  7402    MOV    A,#0x02
23  C:0x0862  93      MOVC   A,@A+DPTR
24  C:0x0863  68      XRL    A,R0
25  C:0x0864  60EF    JZ     C:0855
26  C:0x0866  A3      INC    DPTR
27  C:0x0867  A3      INC    DPTR
28  C:0x0868  A3      INC    DPTR
29  C:0x0869  80DF    SJMP   C:084A

Listing 5.10: Switch statement assembler code snippet.

The corresponding lookup table assembled by the compiler contains the comparison values AA, BB, C0, CC, DD, EE, and FF (see Listing 5.11) for the case statements as well as the entry addresses of the called functions (see Listing 5.11 and Listing 5.12).
C:0x0808  08 21 AA 08 26 BB 08 3F
C:0x0810  C0 08 2B CC 08 30 DD 08
C:0x0818  35 EE 08 3A FF 00 00 08
C:0x0820  42 xx xx xx xx xx xx xx

Listing 5.11: Program memory content.

Again, at least for this particular case, it seems feasible to reason about actual target locations by searching for the pattern of comparison values in the program memory. Such naive approaches, however, may work for a very limited number of configurations, but they heavily depend on the compiler version, optimization levels, etc. Most importantly, they are only applicable to a certain target architecture. Consequently, a rather holistic approach needs to be found to overcome the problem of generating a precise CFG for programs containing indirect control flow.
1  C:0x0821  void foo1(void);
2  C:0x0826  void foo2(void);
3  C:0x082B  void foo3(void);
4  C:0x0830  void foo4(void);
5  C:0x0835  void foo5(void);
6  C:0x083A  void foo6(void);
7  C:0x083F  void foo7(void);
8  C:0x0842  while(1);

Listing 5.12: Entry addresses for called functions.
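To make the table walk of Listings 5.10 to 5.12 easier to follow, the lookup can be emulated on the bytes of Listing 5.11. The sketch below mirrors the behavior read off the assembly (entries of the form address high byte, address low byte, comparison value; a 00 00 pair terminates the table and is followed by the default address). It is exactly the kind of compiler-specific pattern recognition the text warns about and is meant as an illustration only; the function names are hypothetical.

```python
# Program memory bytes from Listing 5.11, starting at C:0x0808.
TABLE = bytes([
    0x08, 0x21, 0xAA, 0x08, 0x26, 0xBB, 0x08, 0x3F,
    0xC0, 0x08, 0x2B, 0xCC, 0x08, 0x30, 0xDD, 0x08,
    0x35, 0xEE, 0x08, 0x3A, 0xFF, 0x00, 0x00, 0x08,
    0x42,
])


def ccase_target(value, table=TABLE):
    """Return the jump target selected for `value` by the table walk."""
    i = 0
    while True:
        hi, lo = table[i], table[i + 1]
        if hi == 0 and lo == 0:
            # End marker reached: the default address follows.
            return (table[i + 2] << 8) | table[i + 3]
        if table[i + 2] == value:
            return (hi << 8) | lo
        i += 3  # corresponds to the three INC DPTR instructions


def all_targets(table=TABLE):
    """All addresses the indirect jump can reach via this table."""
    targets = set()
    i = 0
    while not (table[i] == 0 and table[i + 1] == 0):
        targets.add((table[i] << 8) | table[i + 1])
        i += 3
    targets.add((table[i + 2] << 8) | table[i + 3])
    return targets
```

For the value 0xCC the walk yields 0x082B, the entry address of foo3 in Listing 5.12, and `all_targets()` gives the eight edges a CFG builder would need for the JMP @A+DPTR in line 21 instead of edges to every program location.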

5.3.3 Self-Modifying Code

Self-modifying code is error-prone as well as difficult to read, understand, test, and maintain, and it is hard to port to different target microcontrollers. In general, generating self-modifying code is not supported by compilers, thus, it is a design pattern introduced by sloppy application engineering. Fortunately, it is rarely seen in real life applications and widely considered bad programming style. As [mc]square aims toward a universal static analysis framework for assembly code, we have to consider the possibility of self-modifying code. Determining the exact dependencies and behavior of self-modifying code is a challenging task.

The Intel MCS-51 target realizes a traditional Harvard architecture with program and data memory strictly separated. Hence, the problem of self-modifying code can be ruled out on architectural grounds. Nevertheless, modified Harvard architectures and von Neumann based targets are open for self-modifying code. Fortunately, those architectures targeted by [mc]square use dedicated instructions to actively alter the program memory content at runtime. For the ATMEL ATmega16 target, modifying program code is only possible by a single instruction, namely SPM (store to program memory). Consequently, a rather straightforward approach is applied to deal with self-modifying code. Whenever these instructions are detected during CFG building, the data-flow analyses are aborted since self-modifying code will show a behavior that cannot be analyzed statically. Thus, in the presented approach, the analyses are limited to constant, non self-modifying code. Other approaches exist, for instance, Anckaert et al. [93] introduced state-enhanced CFGs as a new program representation in presence of self-modifying code.
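The abort-on-detection strategy can be sketched as a simple guard that runs before the data-flow analyses. The table of program-memory-writing mnemonics, the exception type, and the function name are assumptions made for this illustration; only the SPM instruction of the ATmega16 is taken from the text.

```python
# Mnemonics that write program memory, per target (illustrative table).
PROGRAM_MEMORY_WRITERS = {
    "ATmega16": {"SPM"},
    "MCS-51": set(),  # Harvard target: no such instruction
}


class SelfModifyingCodeError(Exception):
    """Raised when the program may alter its own code at runtime."""


def ensure_static_code(target, mnemonics):
    """Return True if none of the instructions writes program memory;
    otherwise abort by raising SelfModifyingCodeError."""
    writers = PROGRAM_MEMORY_WRITERS[target]
    for index, mnemonic in enumerate(mnemonics):
        if mnemonic.upper() in writers:
            raise SelfModifyingCodeError(
                f"{mnemonic} at instruction {index}")
    return True


def analysis_allowed(target, mnemonics):
    """Convenience wrapper returning a boolean instead of raising."""
    try:
        return ensure_static_code(target, mnemonics)
    except SelfModifyingCodeError:
        return False
```

A caller would invoke `analysis_allowed` once per program and skip the static analyses when it returns False.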

5.3.4 Loop Bounds

A precise data-flow analysis requires upper bounds of loop iterations. Respective approaches are referred to as loop bound analysis in literature. Existing approaches aim at determining loop bounds largely automatically, but approximating loop bounds for all conceivable loop constructs, such as non-constant increment and decrement of counter variables or multiple nested loops depending on each other, remains a challenge. Some of the existing tools [94, 95] avoid these pitfalls by requiring the user to annotate certain program locations prior to the analysis, or by directly specifying upper loop bounds for constructs that cannot be analyzed automatically. Considering the enormous amount of research already performed on detecting loop bounds, we are eager to reuse the existing knowledge. For our particular purpose, we are interested in the following cases:

(i) For a precise pointer analysis, a narrow approximation of loop bounds is required. This necessitates taking the effects of complete sequences of instructions into account, as the conditions of branching instructions are frequently computed in a sequence of instructions.

(ii) Other techniques such as program slicing [96, 77] require the detection of loop termination. During program slicing for model checking, instructions that have no influence on the validity of a specification are removed from the program. Divergent behavior, i.e., non-termination, of the original program must remain visible in the sliced program. Hence, loops for which termination cannot be proven cannot be sliced. Moreover, statements that influence loop conditions cannot be sliced, which strongly affects sizes of program slices.

In summary, a combination of detecting loop bounds and loop termination is required in order to compute precise results for slicing.
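As a small example of an automatically derivable loop bound, consider the common MCS-51 counting loop that ends in DJNZ Rn (decrement and jump if not zero). The sketch below assumes that neither the loop body nor an ISR modifies the counter register and that the set of possible initial values is already known, e.g., from a preceding analysis; the function names are hypothetical.

```python
def djnz_iterations(initial):
    """Number of times the body of `loop: ...; DJNZ Rn, loop` runs
    when Rn holds `initial` (0..255) on loop entry."""
    if not 0 <= initial <= 0xFF:
        raise ValueError("counter must be an 8-bit value")
    # An initial value of 0 wraps around to 0xFF on the first decrement.
    return initial if initial != 0 else 256


def djnz_bound(possible_initials):
    """Upper loop bound for a set of possible initial counter values."""
    return max(djnz_iterations(v) for v in possible_initials)
```

If nothing is known about the counter, `djnz_bound(range(256))` yields the worst-case bound of 256 iterations; loops with a non-constant decrement or with a counter written inside the body are not covered by this scheme.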

5.3.5 Summary
Even though application tailored analysis methods, such as RBA, allow a significant narrowing of data-flow analyses, there are still difficulties in static analysis for assembly code to overcome. A scalable and precise pointer analysis would be a major boost for analysis precision within [mc]square. In consequence, future research incentives are required to develop a widely generic approach for resolving indirect read and write accesses on assembly code level. One requirement for such an approach is that only minor modifications to the analysis are required to take peculiarities of different target architectures into account. Such a framework could as well be used for predicting actual target addresses on the assembly code level in order to accomplish CFG building of programs featuring indirect control.


6 Real Life Case Study

Program testing can be used to show the presence of bugs, but never to show their absence. We . . . take the position that it is not only the programmer's task to produce a correct program but also to demonstrate its correctness in a convincing manner. (Edsger W. Dijkstra)

In what follows, a real life industry case study is conducted with the [mc]square model checker. The case study focuses on (i) assessing the feasibility of [mc]square when applied to real life embedded applications, (ii) evaluating the effects of the implemented abstraction techniques on the resulting state space size, and (iii) identifying future research directions. First, an introduction and a motivation for the case study is given. Next, hardware and software components of the application are presented. Later on, the used communication protocol is sketched. Then, temporal logic properties are postulated that correspond to the given textual specification. Finally, results and findings are presented.

6.1 Introduction and Motivation

In the following, a real life embedded systems application is introduced and its software is model checked with [mc]square. The model checking process is supported by the previously described abstraction and static analysis techniques for the Intel MCS-51 target. We use this case study to evaluate strengths and weaknesses of our approach. Most of the previously conducted case studies (cf. [49, 12, 13]) were of smaller code size or less complexity. On the other hand, the results of the case study are used to assess the individual contribution to state space reduction of the aforementioned abstraction techniques. Furthermore, we expect the case study to be a significant indicator for defining further research incentives in order to make [mc]square a mature embedded software verification tool that can be used by embedded software designers within their day-to-day software engineering routine. Besides all the technical concepts implemented in [mc]square, in our belief, it is of utmost importance that a certain degree of usability is preserved throughout the development process of a verification tool. A few existing tools, especially those with strong academic background, are, from the technical point of view, highly professional but, from the user point of view, quite hard and challenging to operate and manage. Not surprisingly, the long term vision of [mc]square is a fully automatic, push-button verification tool.

The analyzed source code is provided by Texion Software Solutions, a company located in Aachen, Germany. Texion Software Solutions' core business is the development of individual, application tailored embedded hardware and software solutions. Their product portfolio includes the software ProFab, a software system for production data acquisition. Their customers use the software system for networking of textile knitting machines. A textile knitting machine produces various types of knitted fabrics of varying degrees of complexity. Modern knitting machines usually contain highly complex electronics controlling the needles and the yarn. Figure 6.1(a) shows such a machine. As in every industrial application, software reliability is a major concern. Thus, formal verification of the knitting machine's software is worthwhile, since failures caused by software faults are costly in terms of production losses and the associated additional maintenance effort. The target application of the case study is the software for a knitting machine monitoring device, as shown in Fig. 6.1(b). The source code for the knitting machine monitoring device was selected for the case study based on the following considerations:

Good conformance of the application with the applications we are aiming at. The application targets the Intel MCS-51 microcontroller, uses several on-chip peripheral modules, interacts with its environment, and makes use of various interrupt sources.

Commonality of interests with the developers and their willingness to cooperate. Our industry partner supported us by providing the full source code and a sample device. Furthermore, we can access all accompanying documents such as the software specification and the hardware schematics.

Criticality and the need of high reliability. As aforementioned, flawless software is crucial, since every malfunction is costly in industrial practice.

Complexity. The number of source code lines is within our reach, i.e., from the conceptual point of view, [mc]square is able to handle applications of this size.

It should be noted that the source code line count is a rather unsuited indicator of whether an application can be successfully model checked or whether the model checker will run out of resources (state-explosion problem) while examining the code. It is almost solely the source code complexity that is crucial.

6.2 The Knitting Machine Monitoring Device Hardware Overview

The knitting machine monitoring device is composed of the following hardware modules (cf. Figure 6.2):

Knitting machine: the machine that is observed and monitored by the knitting machine monitoring device.

Input module: connects the knitting machine monitoring device with the knitting machine. It features eight input lines. Each input is decoupled and an inverting Schmitt trigger (see footnote 1) acts as a pulse shaper. The input module connects the input lines with the corresponding I/O pins of the microcontroller (four pins of Port 3 and four pins of Port 1).

Microcontroller: executes the software subject to verification.

Serial interface: provides the physical link to the host application through an RS232 interface.

Host application: uses the data gathered by the monitoring device for further processing.

Miscellaneous (not depicted in Figure 6.2):

Watchdog module: a hardware timing module that triggers the reset input of the microcontroller upon a fault condition. The fault condition is reached if the watchdog hardware timer overflows. The timer overflow can be avoided if the microcontroller application resets the watchdog module periodically.

Power and clock generation: provides an inverse-polarity protection and a 5 V filtered and stabilized power supply. Furthermore, this module contains a quartz-controlled clock generation.

Light Emitting Diode (LED) module: operates three LEDs, signaling serial communication traffic and the liveness of the application.

Potential separation: uses a photo-coupler for electrical isolation and performs the needed voltage level adjustment.

Footnote 1: A Schmitt trigger is a comparator circuit that incorporates positive feedback. When the input is higher than a certain threshold, the output is high. When the input is below another (lower) threshold, the output is low. When the input is between the two, the output retains its value [97].

Figure 6.1: The target application. (a) A modern knitting machine (image property of Texion Software Solutions). (b) Knitting machine monitoring device.
Figure 6.2: The knitting machine monitoring device. (Blocks shown: knitting machine; knitting machine monitoring device with input module, microcontroller 87C51 at 12 MHz, watchdog, serial interface, power supply, and LED module; host application.)

6.3 The Knitting Machine Monitoring Device Software Overview

The application is of small/medium size and is designed following the foreground/background design pattern (cf. [98, 99]). The system consists of a super-loop, i.e., an infinite loop that calls individual modules (functions) to perform the desired operations (the background part). Asynchronous events (the foreground part) are handled through ISRs. Thus, time critical operations are performed by the ISRs to ensure that they are dealt with within the given timing constraints. Timing correctness is met by interrupting the background part of the software at predefined points in time, e.g., when a timer expires or a character is received over the serial interface. The used foreground/background design pattern is sketched in Figure 6.3.

Background:

    void main (void){
      InitBoard();
      SendTxt(STX,'R',ETX);
      SetTime();
      while(1){
        Watchdog();
        Liveness();
        UpdateInputs();
        EvaluateRPM();
        RSM(readCmd());
      }
    }

Foreground: External ISR(); Timer ISR(); Serial ISR(); (further labels in the figure: Circ. Buf., Loop)

Figure 6.3: The foreground/background design pattern.

6.3.1 The Main Building Blocks

Figure 6.4: The software components. (Blocks shown: RPM module with external ISR and pulse counting; Timer module with timer ISR and time management; State machine with communication controller and receiver state machine; Serial interface with serial ISR, receive and transmit characters; Input/Output module with read environment, reset watchdog, and liveness LED.)

From the conceptual point of view, the source code can be divided into five building blocks (cf. Figure 6.4):

Revolutions Per Minute (RPM) module: manages two external interrupts to count pulses from external rotary encoders. It initializes the two interrupt sources and defines their interrupt priority. The pulse count is internally mapped to a 16 bit wide unsigned data type.

Timer module: uses the Timer 0 peripheral module of the microcontroller to provide a system tick and four software timers. Furthermore, it provides trivial functions for time management like reset(), set(), and get_time().

State machine: implements the serial communication protocol and performs the needed housekeeping.

Serial interface module: initializes the serial communication device of the microcontroller to 9600 Baud and uses dedicated circular buffers for managing receive and transmit queues. It provides methods for sending and receiving characters.

Input/Output module: reads the eight input ports for the monitoring function and handles the watchdog reset. Furthermore, it toggles the liveness LED.


The full source code of the case study consists of about 600 lines of C-code (i.e., 1400 lines of assembly code).

6.3.2 Serial Receive and Transmit Ringbuffer

The application uses software circular buffers to compensate for the lack of a hardware First In First Out (FIFO) memory. The circular buffer manages buffering of characters received from and sent to the serial port. A circular buffer is a common data structure that uses a single, fixed-size buffer as if it were connected end-to-end (cf. Figure 6.5).

Figure 6.5: A software circular buffer model. (Labels in the figure: read pointer, write pointer, occupied, free.)

The read pointer indicates the element that is read next and the write pointer determines the location which will be filled with the next character. Altogether, the case study uses two dedicated circular buffer structures, i.e., one for receiving and one for sending characters. The C code macros and the initialization calls are given in Listing 6.1.
    /* Header macro */
    #define RingBuffer(Name, DataType, IndexType, Exp, Attribute) \
    struct { \
      IndexType ReadIndex; \
      IndexType WriteIndex; \
      IndexType Mask; \
      DataType Buffer[1<<Exp]; \
    } Attribute Name = { 0, 0, (1 << (Exp)) - 1 }

    /* Ring buffer initialization */
    RingBuffer(RxBuffer, char, word, 2, );  /* 4 char RingBuffer for receiver */
    RingBuffer(TxBuffer, char, word, 2, );  /* 4 char RingBuffer for transmitter */

Listing 6.1: Ringbuffer C code macro.

Considering the initialization code in Listing 6.1, it is easily seen that both buffers have a capacity of four bytes. According to Table 6.1, a single RingBuffer element consists of 10 bytes altogether. As [mc]square reads and parses relevant debug info, it allows C-code variable names to be included into the temporal specification, i.e., CTL formulas. Thus, the column Formula name in Table 6.1 refers to the actual expression that is used within the CTL formulas.

Element                   C code type   Length [byte]   Formula name
IndexType ReadIndex       word          2               Tx|RxBuffer_{0, 1}
IndexType WriteIndex      word          2               Tx|RxBuffer_{2, 3}
IndexType Mask            word          2               Tx|RxBuffer_{4, 5}
DataType Buffer[1<<Exp]   char          4               Tx|RxBuffer_{6...9}
Sum                       RingBuffer    10              Tx|RxBuffer_{0...9}

Table 6.1: Ringbuffer elements and their size.
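The structure defined by the macro can be mirrored in a small executable model, which also reproduces the byte count of Table 6.1. The field names follow Listing 6.1; the `put` and `get` operations are not shown in the source, so the scheme used here (free-running 16-bit indices that are masked on every access) is only an assumption made for this sketch and may differ from the actual application code.

```python
class RingBuffer:
    """Model of the structure generated by the RingBuffer macro."""

    def __init__(self, exp):
        self.read_index = 0               # ReadIndex  (word, 2 bytes)
        self.write_index = 0              # WriteIndex (word, 2 bytes)
        self.mask = (1 << exp) - 1        # Mask       (word, 2 bytes)
        self.buffer = [0] * (1 << exp)    # Buffer[1<<Exp] (char each)

    def count(self):
        """Number of occupied elements (indices wrap at 16 bit)."""
        return (self.write_index - self.read_index) & 0xFFFF

    def put(self, byte):
        """Store one byte; return False if the buffer is full."""
        if self.count() > self.mask:
            return False
        self.buffer[self.write_index & self.mask] = byte
        self.write_index = (self.write_index + 1) & 0xFFFF
        return True

    def get(self):
        """Return the oldest byte, or None if the buffer is empty."""
        if self.count() == 0:
            return None
        byte = self.buffer[self.read_index & self.mask]
        self.read_index = (self.read_index + 1) & 0xFFFF
        return byte

    def size_in_bytes(self):
        """Three 2-byte index fields plus one byte per buffer slot."""
        return 3 * 2 + len(self.buffer)


rx = RingBuffer(2)  # corresponds to RingBuffer(RxBuffer, char, word, 2, )
```

With Exp = 2 the model has a mask of 3, four buffer slots, and a total size of 10 bytes, matching the sum row of Table 6.1.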

6.3.3 The Communication Protocol

The communication between the knitting machine monitoring device and the host application follows a well defined protocol. It is a straightforward master-slave approach where the host application operates as master. Thus, every communication is initiated by the host application, with one exception: the knitting machine monitoring device sends a status message to the master after powerup. The communication protocol does not include any data integrity checks such as checksums. Figure 6.6 shows the corresponding communication sequence chart and Table 6.2 states the specified commands and the expected reply. RPM1 denotes the first byte of the Revolutions variable, RPM2 refers to the second byte, respectively. The same applies to CNT1, CNT2, VER1, VER2, and VER3.

6.4 Extracting CTL Properties Out of the Textual Specication

One of the most crucial steps in model checking, as in any formal verication method, is the process of creating a formal specication (as interpreted by the model checker, e.g., CTL) that relate to a given textual specication. Again, it is important to realize that any formal verication is only as good as the stated claims. The remainder of this section reveals that nding a formal CTL counterpart for a textual representation of the systems behavior is non-trivial and sometimes challenging. The precise meaning of the used variables and symbols within the properties is given in Table 6.3.

6.4.1 The Given Textual Specification

In our case, an initial specification is part of the project. The initial specification is given in German. For the sake of clarity, it was first translated into English. This was done to the best of the author's knowledge, and special care was taken to preserve the original meaning of the specification. Strictly speaking, this translation might already introduce some inconsistency and misinterpretation. As the given specification is rather informal and in textual form, we had to identify relevant properties first and then translate them to CTL.

6.4.2 CTL Properties

In the following, CTL properties are presented that originate from the given textual specification. These properties will be model checked by [mc]square later on. Each property is given in the form of:


[Figure 6.6: sequence chart between Master and Slave. Messages, in order: $ D #; $ D RPM1 RPM2 #; $ R #; $ Z #; $ Z CNT1 CNT2 #; $ E #; $ E INP #; $ V #; $ V VER1 VER2 VER3 #.]

Figure 6.6: Communication sequence chart.

Master command   # Bytes   Slave response           # Bytes   Payload                            Comment
--------------   -------   ----------------------   -------   --------------------------------   ------------------------------------------------
$ D #            3         $ D [RPM1 RPM2] #        5         revolutions per minute [2 bytes]   returns the current revolutions per minute
$ R #            3         $ R #                    3         system reset [0 bytes]             resets the device
$ Z #            3         $ Z [CNT1 CNT2] #        5         counter value [2 bytes]            returns the current pulse counter value
$ E #            3         $ E [INP] #              4         input representation [1 byte]      returns the current input representation
$ V #            3         $ V [VER1 VER2 VER3] #   6         version string [3 bytes]           returns the software version number as string

Table 6.2: The master-slave communication protocol.
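The framing in Table 6.2 can be sketched in a few lines of C: every reply consists of the start byte $, the echoed command byte, the payload, and the end byte #. The function names and the error convention (returning 0 or -1) are assumptions made for this illustration and are not taken from the target application.

```c
#include <stddef.h>
#include <stdint.h>

/* Payload size per command as listed in Table 6.2; -1 for an unknown command. */
int payload_length(char command) {
    switch (command) {
    case 'D': return 2;  /* RPM1 RPM2 */
    case 'R': return 0;  /* reset, no payload */
    case 'Z': return 2;  /* CNT1 CNT2 */
    case 'E': return 1;  /* INP */
    case 'V': return 3;  /* VER1 VER2 VER3 */
    default:  return -1;
    }
}

/* Writes "$ <command> <payload...> #" to out.
   Returns the frame length, or 0 if the command is unknown or out is too small. */
size_t build_reply(char command, const uint8_t *payload,
                   uint8_t *out, size_t out_size) {
    int n = payload_length(command);
    if (n < 0 || out_size < (size_t)n + 3) return 0;
    size_t pos = 0;
    out[pos++] = '$';
    out[pos++] = (uint8_t)command;
    for (int i = 0; i < n; i++) out[pos++] = payload[i];
    out[pos++] = '#';
    return pos;
}
```

For the command D with a two-byte payload, `build_reply` yields a 5-byte frame; for R it yields the 3-byte frame $ R #.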

Variable              Scope    Initial   Length   Comment
-------------------   ------   -------   ------   --------------------------------------------------
Variables used by the target application (excerpt)
Revolutions           global   0xffff    2 byte   Holds the current RPM
RxBuffer_i            global   0x00      1 byte   i-th byte of receive buffer memory area
RxBuffer_1            global   0x00      1 byte   The receive circular buffer read pointer
RxBuffer_4            global   0x00      1 byte   The receive circular buffer write pointer
TxBuffer_j            global   0x00      1 byte   j-th byte of transmit buffer memory area
TxBuffer_1            global   0x00      1 byte   The transmit circular buffer read pointer
TxBuffer_4            global   0x00      1 byte   The transmit circular buffer write pointer
Command_state         local    0x00      1 byte   Holds the actual state of the state machine
Supplementary variables inserted for model checking
startUpCodeFinished   global   0x00      1 byte   Set to 1 when main() is entered
mark                  global   0x00      1 byte   Serves as marker when entering certain PC locations

Table 6.3: Case study variables and their meaning.


Property #: Explanation
Textual representation: As stated in the textual specification of the knitting machine monitoring device application.
CTL representation ([mc]square notation): The postulated CTL formula. The formula is given in the exact same notation as it is used as input to [mc]square.
Comment: Additional information and explanation of the CTL formula.

Property #1
Textual representation: The variable Revolutions is initialized to 0xffff.
CTL representation ([mc]square notation): (AG (startUpCodeFinished=0 & Revolutions=0x0000 → A Revolutions=0x0000 U Revolutions=0xff00) & AG (startUpCodeFinished=0 & Revolutions=0xff00 → A Revolutions=0xff00 U Revolutions=0xffff) & EF Revolutions=0x0000 & EF Revolutions=0xff00)
Comment: If variable Revolutions is 0x0000, then it remains 0x0000 until it becomes 0xff00. If variable Revolutions is 0xff00, then it remains 0xff00 until it becomes 0xffff. There is a path where variable Revolutions is 0x0000 and there is a path where Revolutions is 0xff00. The initialization must be completed before the startup code is left.

Property #1a
Textual representation: The variable Revolutions is initialized to 0xffff.
CTL representation ([mc]square notation): AF Revolutions=0xffff & startUpCodeFinished=1
Comment: On all paths the variable Revolutions has the value 0xffff when the startup code is left.

Property #1b
Textual representation: The variable Revolutions is initialized to 0xffff.
CTL representation ([mc]square notation): AF Revolutions=0xffff
Comment: On all paths there is a state where variable Revolutions is set to 0xffff.


Property #2
Textual representation: Initialization of the receive circular buffer.
CTL representation ([mc]square notation): AF(RxBuffer_0=0x00 & RxBuffer_1=0x00 & RxBuffer_2=0x00 & RxBuffer_3=0x00 & RxBuffer_4=0x00 & RxBuffer_5=0x03 & RxBuffer_6=0x00 & RxBuffer_7=0x00 & RxBuffer_8=0x00 & RxBuffer_9=0x00 & startUpCodeFinished=1)
Comment: On all paths within the startup code there is finally a state where all bytes of RxBuffer_i where i = {0 . . . 9} are initialized to 0x00, except RxBuffer_5, which is initialized to 0x03, since it acts as the circular buffer mask. See Listing 6.1 for details.

Property #3
Textual representation: Initialization of the transmit circular buffer.
CTL representation ([mc]square notation): AF(TxBuffer_0=0x00 & TxBuffer_1=0x00 & TxBuffer_2=0x00 & TxBuffer_3=0x00 & TxBuffer_4=0x00 & TxBuffer_5=0x03 & TxBuffer_6=0x00 & TxBuffer_7=0x00 & TxBuffer_8=0x00 & TxBuffer_9=0x00 & startUpCodeFinished=1)
Comment: On all paths within the startup code there is finally a state where all bytes of TxBuffer_j where j = {0 . . . 9} are initialized to 0x00, except TxBuffer_5, which is initialized to 0x03, since it acts as the circular buffer mask. See Listing 6.1 for details.
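The state predicate inside the AF operator of properties #2 and #3 can be read as a plain check on a 10-byte memory image. The following sketch expresses that predicate in C; the function name and the idea of passing the buffer as a byte array are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 10-byte ring buffer image matches the initial values demanded by
   properties #2 and #3: every byte is 0x00 except byte 5 (the mask), which is 0x03. */
bool ring_buffer_initialized(const uint8_t image[10]) {
    for (int i = 0; i < 10; i++) {
        uint8_t expected = (i == 5) ? 0x03 : 0x00;
        if (image[i] != expected) return false;
    }
    return true;
}
```

The model checker evaluates the same conjunction on every state; the sketch only shows what a single satisfying state looks like.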

Property #4
Textual representation: It is possible to reach the sleep state of the application where the application idles in an endless loop.
CTL representation ([mc]square notation): EF mark=MARK_SLEEP
Comment: There is a state where the application reaches the sleep state. Note that this formula can also be expressed by involving the PC in the property, such as AG (EF PC=0xc0ffee). For the sake of clarity, however, the variable mark is introduced to allow self-explanatory CTL expressions.


Property #5
Textual representation: It is possible to reach the send version state of the application where the application sends its version string to the host application.
CTL representation ([mc]square notation): EF mark=MARK_SENDVERSION
Comment: There is a state where the application reaches the send version state.

Property #6
Textual representation: It is possible to reach the send inputs state of the application where the application sends the actual value of the digital input lines to the host application.
CTL representation ([mc]square notation): EF mark=MARK_SENDINPUTS
Comment: There is a state where the application reaches the send inputs state.

Property #7
Textual representation: It is possible to reach the send pulse count state of the application where the application sends the actual value of the pulse counter to the host application.
CTL representation ([mc]square notation): EF mark=MARK_SENDPULSCNT
Comment: There is a state where the application reaches the send pulse count state.

Property #8
Textual representation: It is possible to reach the send RPM state of the application where the application sends the actual RPM value to the host application.
CTL representation ([mc]square notation): EF mark=MARK_SENDRPM
Comment: There is a state where the application reaches the send RPM state.


Property #9
Textual representation: The default path of the receiver state machine in function readCommand() (cf. Listing 6.2) is executed at least once.
CTL representation ([mc]square notation): EF mark=MARK_DEFAULT
Comment: There is a state where the default path of the switch statement in function readCommand() is executed.

Property #10
Textual representation: The receiver state machine may only reside in states 0, 1, or 2. All other states are invalid.
CTL representation ([mc]square notation): Inv:(Command_state=0 | Command_state=1 | Command_state=2)
Comment: On all paths the actual state of the state machine is either 0, 1, or 2. The term Inv stands for invariant model checking. An equivalent expression of this formula is AG (Command_state=0 | Command_state=1 | Command_state=2).

Property #11
Textual representation: Changes in the receiver state machine can only follow the following patterns: 0 → 1 → 2 → 0 or 0 → 1 → 0. All other transitions are invalid.
CTL representation ([mc]square notation): (AG (Command_state=0 → A Command_state=0 U Command_state=1 | Command_state=0) & AG (Command_state=1 → A Command_state=1 U Command_state=0 | Command_state=2 | Command_state=1) & AG (Command_state=2 → A Command_state=2 U Command_state=0 | Command_state=2) & AF Command_state=0)
Comment: If Command_state = 0, Command_state remains 0 until it changes to 1; if Command_state = 1, Command_state remains 1 until it changes to 0 or 2; if Command_state = 2, Command_state remains 2 until it changes to 0. There is a path where Command_state initially becomes 0, and the receiver state machine may always remain in its current state.
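The transition patterns demanded by properties #10 and #11 can be restated as a check on a single observed trace of Command_state values. The sketch below is such a trace-level check under the assumption that staying in the current state is always allowed; it is not a CTL evaluation and it is not the code of readCommand(). All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Allowed steps: stay in place, 0->1, 1->2, 1->0, 2->0. */
bool step_allowed(uint8_t from, uint8_t to) {
    if (from > 2 || to > 2) return false;   /* range demanded by property #10 */
    if (from == to) return true;            /* remaining in a state is permitted */
    switch (from) {
    case 0:  return to == 1;
    case 1:  return to == 0 || to == 2;
    default: return to == 0;                /* from == 2 */
    }
}

/* True if every state is in range and every consecutive pair is an allowed step. */
bool trace_allowed(const uint8_t *states, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (states[i] > 2) return false;
    for (size_t i = 1; i < n; i++)
        if (!step_allowed(states[i - 1], states[i])) return false;
    return true;
}
```

A trace such as 0, 1, 2, 0 passes, whereas a direct step from 0 to 2 is rejected. The model checker establishes the corresponding claim for all paths rather than for one trace.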


Property #12
Textual representation: The serial receive circular buffer read and write pointer may never exceed the circular buffer bounds.
CTL representation ([mc]square notation): AG (RxBuffer_1<4 & RxBuffer_0=0 & RxBuffer_3<4 & RxBuffer_4=0)
Comment: RxBuffer_1 (the read pointer low byte) and RxBuffer_3 (the write pointer low byte) are on all paths lower than the circular buffer bound, i.e., 4 bytes. Moreover, the high bytes of the read and write pointer (RxBuffer_0 and RxBuffer_4) remain 0.

Property #13
Textual representation: The serial transmit circular buffer read and write pointer may never exceed the circular buffer bounds.
CTL representation ([mc]square notation): AG (TxBuffer_1<4 & TxBuffer_0=0 & TxBuffer_3<4 & TxBuffer_4=0)
Comment: TxBuffer_1 (the read pointer low byte) and TxBuffer_3 (the write pointer low byte) are on all paths lower than the circular buffer bound, i.e., 4 bytes. Moreover, the high bytes of the read and write pointer (TxBuffer_0 and TxBuffer_4) remain 0.
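Why a mask of 0x03 keeps the pointers below 4 can be seen from the usual masking idiom for power-of-two ring buffers. The sketch below assumes that the application advances its indices in this way; this is a common technique and an assumption here, since the actual macro is given in Listing 6.1 and not reproduced in this section.

```c
#include <stdint.h>

/* Advances a ring buffer index and wraps it with the mask.
   A mask of 0x0003 corresponds to a buffer of 4 bytes. */
uint16_t advance_index(uint16_t index, uint16_t mask) {
    return (uint16_t)((index + 1u) & mask);
}
```

Under this assumption, the low byte of the index cycles through 0, 1, 2, 3 and the high byte stays 0, which is the invariant stated by properties #12 and #13.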

Property #14
Textual representation: The microcontroller application sends $ R # to the host application after power-up.
CTL representation ([mc]square notation): EF (TxBuffer_6=$ & TxBuffer_7=R & TxBuffer_8=# & TxBuffer_9=0)
Comment: There is a path where the transmit circular buffer is filled with the sequence $ R #.

6.4.3 Comments
It is notable that property #1 reveals one of the major strengths of our assembly code model checking approach. As the verification process is based on machine instructions, it is even possible to verify the exact initialization sequence of the 16-bit wide variable Revolutions. Property #1 requires that the high byte (located at the higher address) is initialized first and the low byte second, as is usual on little-endian processor architectures. In contrast, a model checker targeting C code is usually unable to make assumptions at byte or memory-location granularity, because details about the target platform are missing. Property #1a is a property suited for C code model checkers. However, as the property AF Revolutions=0xffff & startUpCodeFinished=1 only verifies that variable Revolutions will eventually reach the value 0xffff within the startup sequence, it excludes the details of how the initialization of Revolutions is accomplished.


For example, it is possible that the variable Revolutions is, for various reasons, first set to 0xfc0f and later set to 0xffff. As a result, property #1a will evaluate to true, since the initialization sequence is not sufficiently specified. However, as [mc]square allows CTL properties to include single memory locations, property #1 will evaluate to false, showing the erroneous behavior within the startup code as a counterexample. The same applies to properties #12 and #13.
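The difference between the two kinds of claims can be illustrated on a single sequence of observed values of Revolutions. The sketch below contrasts a strict ordering check in the spirit of property #1 with a weak "eventually 0xffff" check in the spirit of property #1a. Both functions are hypothetical illustrations that operate on one trace; they are not how [mc]square evaluates CTL formulas.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Strict check: the values may only move 0x0000 -> 0xff00 -> 0xffff
   (repetitions allowed) and the sequence must end at 0xffff. */
bool init_sequence_ok(const uint16_t *values, size_t n) {
    static const uint16_t order[3] = {0x0000, 0xff00, 0xffff};
    size_t stage = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] == order[stage]) continue;
        if (stage < 2 && values[i] == order[stage + 1]) { stage++; continue; }
        return false;   /* unexpected intermediate value */
    }
    return n > 0 && stage == 2;
}

/* Weak check: some observed value equals 0xffff. */
bool eventually_ffff(const uint16_t *values, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (values[i] == 0xffff) return true;
    return false;
}
```

For the sequence 0x0000, 0xfc0f, 0xffff the weak check succeeds while the strict check fails, which mirrors the contrast described above.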

6.4.4 Reviewing Properties #4a to #8a

Properties #4 to #8 claim that there is a single path on which the application will finally reach the specified state, e.g., EF mark=MARK_SLEEP. We might tighten this property in such a way that, globally, from every state in the program, it must be possible to finally reach the sleep state. Properties of this form are termed resetability in the literature [28, 29]. As a result, we extend the properties #4 to #8:

Property #4a
Textual representation: It is always possible to reach the sleep state of the application where the application idles in an endless loop.
CTL representation ([mc]square notation): AG(EF mark=MARK_SLEEP)
Comment: From every reachable state there is a path on which the application finally reaches the sleep state.

Property #5a
Textual representation: It is always possible to reach the send version state of the application where the application sends its version string to the host application.
CTL representation ([mc]square notation): AG(EF mark=MARK_SENDVERSION)
Comment: From every reachable state there is a path on which the application finally reaches the send version state.

Property #6a
Textual representation: It is always possible to reach the send inputs state of the application where the application sends the actual value of the digital input lines to the host application.
CTL representation ([mc]square notation): AG(EF mark=MARK_SENDINPUTS)
Comment: From every reachable state there is a path on which the application finally reaches the send inputs state.

Property #7a
Textual representation: It is always possible to reach the send pulse count state of the application where the application sends the actual value of the pulse counter to the host application.
CTL representation ([mc]square notation): AG(EF mark=MARK_SENDPULSCNT)
Comment: From every reachable state there is a path on which the application finally reaches the send pulse count state.

Property #8a
Textual representation: It is always possible to reach the send RPM state of the application where the application sends the actual RPM value to the host application.
CTL representation ([mc]square notation): AG(EF mark=MARK_SENDRPM)
Comment: From every reachable state there is a path on which the application finally reaches the send RPM state.
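The gap between EF p and AG(EF p) can be shown on a toy transition system with the standard fixpoint characterization of EF. The sketch below encodes state sets as bit masks; the four-state model (idle, marked state, reset received, endless loop) and all names are invented for illustration and do not describe the state space of the case study.

```c
#include <stdint.h>

#define N_STATES 4

/* Toy model (hypothetical): 0 = idle, 1 = marked state, 2 = reset received,
   3 = endless loop. succ[s] is the bit mask of the successors of state s. */
const uint32_t toy_succ[N_STATES] = {
    0x6u,  /* 0 -> 1, 2 */
    0x1u,  /* 1 -> 0    */
    0x8u,  /* 2 -> 3    */
    0x8u   /* 3 -> 3    */
};

/* States with at least one successor inside target. */
static uint32_t pre_exists(const uint32_t succ[N_STATES], uint32_t target) {
    uint32_t result = 0;
    for (int s = 0; s < N_STATES; s++)
        if (succ[s] & target) result |= 1u << s;
    return result;
}

/* EF p as the least fixpoint of Z = p | pre_exists(Z). */
uint32_t ctl_ef(const uint32_t succ[N_STATES], uint32_t p) {
    uint32_t z = p, prev;
    do { prev = z; z = p | pre_exists(succ, z); } while (z != prev);
    return z;
}

/* AG p = not EF not p, restricted to the existing states. */
uint32_t ctl_ag(const uint32_t succ[N_STATES], uint32_t p) {
    uint32_t all = (1u << N_STATES) - 1u;
    return all & ~ctl_ef(succ, all & ~p);
}
```

In this toy model, EF holds for the marked state 1 when starting in state 0, but AG(EF ...) does not, because the endless loop in state 3 can never be left. For the loop state itself both formulas hold in every state.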

6.4.5 Communication Protocol Verification

Setting up valid CTL formulas for properties such as "In case $ Z # is received, the microcontroller must answer with $ Z CNT1 CNT2 #" is a rather challenging task, due to (i) fairness issues (see Section 6.4.5 for details) and (ii) the prevalent circular buffer implementation of the application. However, consider the following (humble) specification:

Property #invalid
Textual representation: After receiving $ Z #, answer with $ Z CNT1 CNT2 #.
CTL representation ([mc]square notation): AG (RxBuffer_6=0x24 & RxBuffer_7=0x5A & RxBuffer_8=0x23) → AF (TxBuffer_5=0x24 & TxBuffer_6=CNT1 & TxBuffer_7=CNT2 & TxBuffer_8=0x23)
Comment: Whenever the receive circular buffer is filled with $ Z #, there is always a path where the transmit circular buffer finally carries the values $ CNT1 CNT2 #.

The above stated property is invalid and incomplete for several reasons:

- The property does not consider the value of the read and write pointers of the circular buffer. What if the read pointers Tx|RxBuffer_{0, 1} do not point to the first element in the circular buffer? What if the transmit write pointer TxBuffer_{2, 3} does not point to the first element in the circular buffer?
- What if the model checker encounters a path where the serial interrupt is never fired?
- The property does not consider the value of the circular buffer mask, i.e., Tx|RxBuffer_{4, 5}.
- The property does not consider the value of the fourth byte of the receive buffer, i.e., RxBuffer_9.
- How to make sure that, after once decoding $ Z #, the receive circular buffer is not altered anymore?
- Once the receive circular buffer is filled with $ Z #, how to make sure that the application reads the circular buffer content in a sequential way? What if, e.g., the read pointer is incremented twice, or it is not incremented at all?

It is obvious that creating valid CTL expressions for the communication protocol verification, at least without additional knowledge of software internals, is quite challenging. It might be possible for some corner cases; however, the resulting formulas are of unwieldy length and hard to understand in full detail. In any case, there is no way to express fairness in plain CTL model checking.

The Unfair Path

To recapitulate, [mc]square abstracts from time, i.e., it can be categorized as a timeless, pure CTL model checker (cf. Section 3.5). Consider the property AG (AF mark=MARK_SENDRPM), which claims that on every path it must always be possible to finally reach the desired mark, e.g., mark=MARK_SENDRPM. As the target application uses interrupts and the Intel MCS-51 allows interrupt nesting, the model checker may get stuck within an interrupt loop, where immediately after an interrupt is executed, the ISR is re-entered again. Such an interrupt loop is depicted in Figure 6.7 by the path {I1, I2, I3, I4, I1, I2, ...}, which we call an unfair path. The model checker may now find a counterexample where the property AG (AF mark=MARK_SENDRPM) is disproved due to this unfair path by getting stuck inside the interrupt loop. Clearly, such unfair paths are very unlikely when executing the code on the actual target hardware, albeit theoretically possible.
As a matter of fact, the serial interrupt may only occur at discrete time instants determined by the selected serial baud rate, and thus is very unlikely to produce unfair paths. As [mc]square follows a timeless model checking approach, we cannot use timing constraints to eliminate this behavior. In fact, we have to overcome the lack of fairness in CTL in order to obtain meaningful results for the present case study.

[Figure 6.7: diagram of program paths with the labels "enter ISR", "re-enter ISR", and mark=MARK_SENDRPM.]

Figure 6.7: The unfair path.

The Lack of Fairness in CTL

When using formal verification tools, one is often only interested in proving a property over fair paths. Thus, certain paths that are considered to be unrealistic for the actual target hardware, in our case the Intel MCS-51 microcontroller, need to be ruled out. In the literature [29, 40], an unfair computation is described as an unreasonable computation that ignores certain transition alternatives forever; all other computations are described as fair. In order to express fairness, fairness constraints [29] are used that operate on a path level and replace the standard meaning of "for all paths" with "for all fair paths" and of "there exists a path" with "there exists a fair path". Unfortunately, such fairness properties cannot be expressed directly in CTL [100, 36, 101], but they can be expressed in CTL*. In contrast, fairness assumptions can easily be added as a premise to an LTL formula. In LTL, a fairness assumption can be stated in the form (fairness) → (property), e.g., (GF enabled) → (GF occurs). However, Clarke et al. [28] show how a Kripke structure² can be enriched by fairness constraints in order to enable fairness in CTL. In their approach, a path is fair if it contains an element of each fairness constraint infinitely often, i.e., if each constraint is true infinitely often along the path. Consequently, they restrict path quantifiers in the logic to those fair paths. From a conceptual point of view, we use a similar approach to introduce fairness into CTL model checking with [mc]square. In the following, we present the introduction of fairness through a model of the microcontroller environment.

Introducing Fairness through Environment Modeling

As [mc]square implements CTL model checking algorithms, we are limited to plain CTL without fairness constraints; thus, fairness must be introduced via an additional concept. To that end, we make use of environment modeling. Within [mc]square this particular feature is termed User Defined Environment (UDE) modeling [102, 103]. The UDE constrains the inputs read from the environment to a manually specified set of values and allows controlling the occurrence of interrupt sources. An automaton is used to define input values as well as interrupt and value transitions. The use of UDE leads to the adapted model checking workflow shown in Figure 6.8.
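² In their approach, a fair Kripke structure is a 4-tuple \(M = (S, R, L, F)\), where \(S\), \(R\), and \(L\) are defined as in Section 3.4.2 and \(F \subseteq 2^S\) is a set of fairness constraints. \(\pi\) is a path in \(M\) and \(\operatorname{inf}(\pi)\) is defined as \(\operatorname{inf}(\pi) = \{\, s \mid s = s_i \text{ for infinitely many } i \,\}\). A path \(\pi\) is fair iff for every \(P \in F\), \(\operatorname{inf}(\pi) \cap P \neq \emptyset\).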
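For an ultimately periodic path, the set of states visited infinitely often is exactly the set of states on its loop, so the fairness condition of Clarke et al. reduces to an intersection test per constraint. The following sketch applies this test to state sets encoded as bit masks. The encoding and the function name are assumptions for illustration; [mc]square does not implement fairness constraints in this form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* loop_states: bit mask of the states on the loop of an ultimately periodic path,
   i.e., the states visited infinitely often.
   constraints: array of bit masks, one per fairness constraint P in F.
   The path is fair iff the loop intersects every constraint. */
bool lasso_is_fair(uint32_t loop_states, const uint32_t *constraints, size_t count) {
    for (size_t i = 0; i < count; i++)
        if ((loop_states & constraints[i]) == 0) return false;
    return true;
}
```

With a single constraint "background code makes progress", a loop that consists only of ISR states is classified as unfair, whereas a loop that also passes through a progress state is fair.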


The model checking workflow is now enriched by a third input, namely the environment automaton U. A UDE is realized by a communicating finite state machine [104], which interacts with a representation of the microcontroller, i.e., the C51Simulator component (see Section 3.6). During model checking, this automaton represents the environment and thus influences the behavior of the C51Simulator. From the user's point of view, UDE automata are created with a graphical editor inside the GUI of [mc]square. It works like any other automata drawing tool, e.g., the user adds states and transitions through toolbars onto a canvas. Alternatively, UDEs can be defined by using an environment description language [102]. Both approaches have the same expressiveness [103]. For the present case study, a UDE is used to define an exact sequence of values read from the serial port. Moreover, we block or fire the serial ISR at certain points to ensure fairness.

[Figure 6.8: workflow diagram. Inputs: UDE automaton U, assembly source code or hex file (system model M), and a system property φ. The [mc]square model checker decides M |= φ? and outputs either a notification (yes) or a counterexample.]

Figure 6.8: The model checking workflow of [mc]square with UDE (cf. Figure 3.3).

An Environment Automaton for Fairness in the Communication Protocol Verification

Having discussed the principles of UDEs in [mc]square, the remainder of this section is dedicated to the procedure of finding a suitable UDE automaton for fairness in verifying the communication protocol of the case study. Two requirements for the desired UDE automaton are derived:

(i) Values are read from the serial interface according to the communication protocol specification (cf. Figure 6.6).
(ii) The unfair path of interrupt loops (see Section 6.4.5) is avoided by blocking the corresponding ISRs for at least a single instruction after executing a RETI instruction. In other words, progress in the background part of the application (cf. Figure 6.3) is assured.

With respect to these requirements, we can generate the environment automaton U1, as shown in Figure 6.9. State changes among {S0, S1, S2, S3} are triggered whenever the application reads a value from the serial communication interface. For example, the transition label SBUF 35! indicates that the UDE forces the simulator to determinize the serial receive register SBUF to the value of 35, i.e., ASCII # (cf. Table 6.4). The states {B0, B1, B2, B3} are responsible for blocking the serial interrupt after the serial ISR is left (transition ISR leave). The execution of the next instruction triggers the transition (Instr leave) back to one of the states of {S0, S1, S2, S3}.
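The effect of requirement (ii) can be reproduced on a toy model with the standard fixpoint characterization of AF. The sketch below compares two hypothetical four-state models: one where the ISR may be re-entered immediately, and one where the ISR is followed by a state in which it is blocked and one background instruction executes. The models, state numbering, and names are invented for illustration and are far smaller than the real state space.

```c
#include <stdint.h>

#define N_STATES 4
/* States (hypothetical): 0 = background code, interrupt may fire,
   1 = ISR, 2 = background code with ISR blocked, 3 = marked location. */

/* ISR may be re-entered (1 -> 1) or return to the interruptible background. */
const uint32_t unfair_model[N_STATES]  = { 0xAu, 0x3u, 0x8u, 0x1u };
/* After the ISR, one instruction runs with the ISR blocked (1 -> 2 -> 3). */
const uint32_t blocked_model[N_STATES] = { 0xAu, 0x4u, 0x8u, 0x1u };

/* States that have successors and all of whose successors lie in target. */
static uint32_t pre_forall(const uint32_t succ[N_STATES], uint32_t target) {
    uint32_t result = 0;
    for (int s = 0; s < N_STATES; s++)
        if (succ[s] != 0 && (succ[s] & ~target) == 0) result |= 1u << s;
    return result;
}

/* AF p as the least fixpoint of Z = p | pre_forall(Z). */
uint32_t ctl_af(const uint32_t succ[N_STATES], uint32_t p) {
    uint32_t z = p, prev;
    do { prev = z; z = p | pre_forall(succ, z); } while (z != prev);
    return z;
}
```

In the first model, AF of the marked location fails in state 0 because a path may cycle through the ISR forever. In the second model, AF holds in every state, since each visit to the ISR is followed by guaranteed progress.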
[Figure 6.9: automaton with the states init, S1 to S3, and B0 to B3 (block ISR). Transition labels: SBUF 35!, SBUF 82!, SBUF 36!, SBUF {35,82,36}!, ISR leave, Instr leave.]

Figure 6.9: A first UDE automaton proposal (U1).

Item   Description
----   -------------------------------------
SBUF   The serial transmit/receive register
35d    Decimal equivalent of ASCII #
36d    Decimal equivalent of ASCII $
68d    Decimal equivalent of ASCII D
69d    Decimal equivalent of ASCII E
86d    Decimal equivalent of ASCII V
82d    Decimal equivalent of ASCII R
90d    Decimal equivalent of ASCII Z

Table 6.4: Definitions for UDE modeling.

Nevertheless, the automaton U1 is still not sufficient for our protocol verification venture, due to the following consideration: What happens if the serial (receive) interrupt does not occur at all? In fact, this configuration is possible on the real target hardware. If the host application does not initiate any serial communication at all, the knitting machine monitoring device will not send any answer to the host. This can be seen as the idle state of the application.


However, as we are interested in the communication sequences rather than the idle state, the path where no serial (receive) interrupt occurs is unfair, too. As a result, when model checking the communication protocol with the UDE automaton U1, [mc]square disproves the properties by presenting counterexamples where the serial interrupt is never activated. In order to eliminate those paths, we extend the automaton U1 to U2, as shown in Figure 6.10.

Considering automaton U2, the states {F0, F1, F2, F3} actively trigger the execution of the serial interrupt. Again, {B0, B1, B2, B3} are states where the serial interrupt is blocked. Basically, there are four sequences that are equivalent, i.e., {S0, B0, F0}, {S1, B1, F1}, {S2, B2, F2}, and {S3, B3, F3}. Transitions among those sequences ({F0, S1}, {F1, S2}, {F2, S3}, and {F3, S0}) are used to determinize the serial receive register SBUF to the values of the protocol as defined in Table 6.2. The transitions labeled with ISR leave from {F0, F1, F2, F3} back to {B0, B1, B2, B3} are needed due to an implementation detail of the application. In case the circular buffer is full, the application skips incoming serial data bytes; thus, it may happen that a serial receive interrupt occurs, but the application does not read the value of the SBUF register. Hence, the transition is needed to prevent the automaton U2 from being stuck in the states where the receive circular buffer is full and the serial interrupt is fired again, i.e., one of {F0, F1, F2, F3}.

Note that the transition SBUF {35,36,82}! is used to send any of these three bytes between two communication sequences. Thus, in our UDE model possible sequences are {35,36,82,35} (#,$,R,#), {35,36,82,36} (#,$,R,$), {35,36,82,82} (#,$,R,R), {35,35,36,82} (#,#,$,R), . . . We can easily extend this claim to a fully nondeterministic read between two communication sequences, e.g., by changing the transition to SBUF {0 . . . 255}!; however, as it will turn out in the remainder of this section, this is not needed for our verification process.

It should be noted that without detailed knowledge of the application's software structure it is almost impossible to obtain a proper UDE automaton, at least for the example code at hand. As we are aiming towards a formal verification tool that can be used as early as during the development phase, the software insight is contributed by the software development team.

Having demonstrated that a UDE automaton is capable of introducing fairness into the model checking process, we state in the following the properties for the communication protocol verification that are model checked with support of the automaton U2. The extended [mc]square workflow is used as shown in Figure 6.8. Note that automaton U2 exactly corresponds to property #Comm2. For the remaining properties, we adapt the transitions with the actual values of the command, i.e., we replace the transitions SBUF 82! and SBUF {35,36,82}! of property #Comm2 with SBUF 68! and SBUF {35,36,68}! to obtain the UDE automaton for property #Comm1.
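The structure of U2 described above can be sketched as a small step function. The sketch assumes the transitions S_k to B_k on ISR leave, B_k to F_k on Instr leave, F_k to S_(k+1) on an SBUF read, and F_k back to B_k on ISR leave. It further assumes that the reads leaving F0, F1, and F2 force 36 ($), 82 (R), and 35 (#), and that the read leaving F3 is the wildcard. These assignments and all identifiers are assumptions, since the exact edge labels of Figure 6.10 are not reproduced in the text.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PHASE_S, PHASE_B, PHASE_F } Phase;            /* S_k, B_k, F_k */
typedef enum { EV_ISR_LEAVE, EV_INSTR_LEAVE, EV_SBUF_READ } Event;
typedef struct { Phase phase; uint8_t k; } UdeState;          /* k in 0..3 */

/* One step of the simplified automaton; events without a matching edge are ignored. */
UdeState ude_step(UdeState s, Event ev) {
    switch (s.phase) {
    case PHASE_S:
        if (ev == EV_ISR_LEAVE) s.phase = PHASE_B;            /* block ISR */
        break;
    case PHASE_B:
        if (ev == EV_INSTR_LEAVE) s.phase = PHASE_F;          /* fire ISR */
        break;
    case PHASE_F:
        if (ev == EV_SBUF_READ) {                             /* next protocol byte */
            s.phase = PHASE_S;
            s.k = (uint8_t)((s.k + 1u) % 4u);
        } else if (ev == EV_ISR_LEAVE) {                      /* byte was skipped */
            s.phase = PHASE_B;
        }
        break;
    }
    return s;
}

bool isr_blocked(UdeState s) { return s.phase == PHASE_B; }
bool isr_forced(UdeState s)  { return s.phase == PHASE_F; }

/* Value forced into SBUF by the read leaving F_k (assumed mapping for "$ R #");
   -1 stands for the wildcard read {35,36,82}. */
int forced_sbuf(uint8_t k) {
    static const int values[4] = { 36, 82, 35, -1 };
    return values[k % 4u];
}
```

Starting in S0, the event sequence ISR leave, Instr leave, SBUF read leads through B0 and F0 to S1. If the application leaves the ISR without reading SBUF, the automaton returns from F0 to B0 instead of getting stuck.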

[Figure 6.10: automaton with the states init, S, B (block ISR), and F (fire ISR). Transition labels: SBUF 35!, SBUF 36!, SBUF 82!, SBUF {35,36,82}!, ISR leave, Instr leave.]

Figure 6.10: The final UDE automaton (U2).

Property #Comm1
Textual representation: After receiving $ D # the knitting monitoring device answers with $ D RPM1 RPM2 #, i.e., sends the variable Revolutions.
CTL representation ([mc]square notation): AG(AF mark=MARK_SENDRPM)
Comment: If the application reads $ D # from the serial port, the application always reaches the state MARK_SENDRPM.

Property #Comm2
Textual representation: After receiving $ R # the knitting monitoring device answers with $ R #, i.e., enters the reset state.
CTL representation ([mc]square notation): AG(AF mark=MARK_RESET)
Comment: If the application reads $ R # from the serial port, the application always reaches the state MARK_RESET.

Property #Comm3
Textual representation: After receiving $ Z # the knitting monitoring device answers with $ Z CNT1 CNT2 #, i.e., sends the current pulse counter value.
CTL representation ([mc]square notation): AG(AF mark=MARK_SENDCOUNTER)
Comment: If the application reads $ Z # from the serial port, the application always reaches the state MARK_SENDCOUNTER.

Property #Comm4
Textual representation: After receiving $ E # the knitting monitoring device answers with $ E INP #, i.e., sends the current input lines value.
CTL representation ([mc]square notation): AG(AF mark=MARK_SENDINPUTS)
Comment: If the application reads $ E # from the serial port, the application always reaches the state MARK_SENDINPUTS.

Property #Comm5
Textual representation: After receiving $ V # the knitting monitoring device answers with $ V VER1 VER2 VER3 #, i.e., sends the software version string.
CTL representation ([mc]square notation): AG(AF mark=MARK_SENDVERSION)
Comment: If the application reads $ V # from the serial port, the application always reaches the state MARK_SENDVERSION.

6.5 Results
In what follows, the results of the case study are presented. Note that for clarity's sake only a summary of the most significant results is given in this section.

6.5.1 Numbers
Table 6.5 shows the results of the first [mc]square model checking run. The item States created refers to the overall number of states that are created by the model checker. The item States stored comprises the states that are stored in main memory. It is computed as the total number of states created (i.e., the item States created) minus the number of state revisits, i.e., single states that are already present in main memory. The column Time refers to the time needed by [mc]square to finish model checking. The numbers were generated on a Dual-Core AMD Opteron™ Processor 8220 with 2.80 GHz (8 cores), equipped with 256 GB of RAM, running 64-bit Windows Server Enterprise Edition and a Java™ server virtual machine version 1.6.0 (with settings -server -Xmx200G -Xss120M). Source code revision 4338 of [mc]square was used.

6.5.2 Stack Analysis

As mentioned in Section 5.2.7, [mc]square performs a static stack analysis in order to detect stack corruptions. For the case study, no stack corruptions were detected; thus, the stack is safe. That means that all bytes pushed onto the stack are popped from the stack again in the right order. The upper stack bound of the application is 0x81 and the lower stack bound evaluated to 0x73; hence, the maximal stack size is 8. The results obtained from static analysis were cross-checked in the model checking run, as the maximum stack size during model checking evaluated to 8, too.

6.5.3 The Circular Buffer Implementation

The circular buffer implementation is covered by the properties #2, #3, #12, and #13. Whereas #2 and #3 refer to the correct initialization of the circular buffer, properties #12 and #13 make propositions about the range of the read and write pointers. Referring to the resulting figures in Table 6.5, these properties could be verified on the analyzed source code. Thus, with respect to the postulated CTL properties, the circular buffer implementation is error-free.

6 Real Life Case Study

Property   Stored      Created      Revisits   Time (hh:mm:ss)
1          2,107,785   22,627,672   721,167    00:05:11
1a         230         634          0          00:00:01
1b         146         145          0          00:00:01
2          230         634          0          00:00:01
3          230         634          0          00:00:01
4          123,242     126,486      3,244      00:00:02
5          123,304     126,551      3,247      00:00:02
6          104,234     107,051      2,817      00:00:02
7          142,248     156,924      3,676      00:00:02
8          161,195     165,300      4,106      00:00:03
9          2,325,333   20,099,919   525,454    00:04:25
10         2,333,585   20,065,707   524,632    00:04:28
11         2,333,585   20,065,707   524,632    00:04:20
12         2,497,397   19,999,441   524,632    00:04:33
13         2,359,645   20,018,732   524,632    00:04:35
14         1,519       344          14         00:00:01

Revisited properties #4 to #8
4a         2,325,333   20,099,919   525,454    00:04:30
5a         20,459      150,896      3,247      00:00:03
6a         20,456      150,890      3,247      00:00:02
7a         23,893      173,084      3,676      00:00:02
8a         27,328      195,275      4,106      00:00:03

Comm. protocol verification with UDE (faulty receiver implementation)
Comm1      4,810       4,811        1          00:00:01
Comm2      4,810       4,811        1          00:00:01
Comm3      4,810       4,811        1          00:00:01
Comm4      4,810       4,811        1          00:00:01
Comm5      4,810       4,811        1          00:00:01

Comm. protocol verification with UDE (fixed receiver implementation)
Comm1      101,277     101,616      339        00:00:02
Comm2      48,513      48,672       159        00:00:01
Comm3      101,205     101,544      339        00:00:01
Comm4      100,989     101,328      339        00:00:02
Comm5      101,133     101,472      339        00:00:02

Table 6.5: Case study results. State space size is given as stored states, created states, and revisits. The per-row marks of the columns "M |= ?" and "Abstraction techniques" (NDPSW) are not reproduced; the verdicts are stated in the text of Section 6.5.


6.5.4 The Receiver State Machine

The receiver state machine, responsible for decoding input commands from the serial interface, is considered by properties #10 and #11. These properties were successfully verified by [mc]square on the analyzed source code (see Table 6.5); thus, the variable Command_state is proven to remain within its bounds. Furthermore, the receiver state machine implementation follows the claimed transition sequence.

6.5.5 Properties #4a to #8a

As shown in Table 6.5, property #4a is valid. The property claims that it is always possible to reach the sleep state of the application. Surprisingly, properties #5a to #8a were disproved by [mc]square. The counterexample shows that it is not possible to reach the states MARK_SENDVERSION (property #5a), MARK_SENDINPUTS (property #6a), MARK_SENDPULSCNT (property #7a), and MARK_SENDRPM (property #8a) if the host application previously sent the command $ R # (go to sleep state) to the microcontroller application. The sleep state of the application is implemented as a while(true) endless loop that can only be left through an external reset of the microcontroller by the watchdog module. As we do not consider the watchdog module in the verification process, the endless loop cannot be left; thus, properties #5a to #8a are correctly disproved by [mc]square.

6.5.6 The Communication Protocol

The proper implementation of the communication protocol is covered by properties #Comm1 to #Comm5. As mentioned before, we used the UDE automaton U2 in order to rule out unfair paths and to overcome the lack of fairness in plain CTL model checking. As stated in Table 6.5, all those properties were falsified by [mc]square. The tool is capable of presenting counterexamples either step by step using the Intel MCS-51 simulator or by drawing a graphical counterexample path. Studying the counterexample reveals that the communication protocol implementation is erroneous. In the rare case that the host application sends an even number of start bytes ($), the microcontroller application skips the following command and thus fails to send a reply message to the host. Consequently, erroneous sequences are (note that ? ∈ {D,R,Z,E,V}, cf. Figure 6.6):

{$,$,?,#}
{$,$,$,$,?,#}
{$,$,$,$,$,$,?,#}
{$,$,$,$,$,$,$,$,?,#}
...

More generally, sequences of the form {[2n]_{n>0} $],?,#}, i.e., 2n start bytes followed by a command byte and the stop byte, lead to the erroneous behavior described. (The condition n > 0 is needed since the configuration n = 0 is invalid; a sequence {D,#} does not yield any response from the microcontroller application.) Surprisingly, the communication protocol implementation is correct whenever the host application sends an odd number of start bytes, i.e., {[2n+1]_{n>0} $],?,#}.

Listing 6.2 shows the erroneous implementation of the receiver state machine. The counterexample presented by [mc]square reveals that a sequence of {[2n]_{n>0} $],?,#} resets the variable Command_state to 0 after the initial state is left for the first time. In fact, the readCommand() implementation is now only sensitive to start bytes ($), but the host application already starts transmitting the command part (one of {D,R,Z,E,V}) of the communication sequence. As a result, the following protocol bytes are skipped until a new valid command sequence ({[2n+1]_{n>0} $],?,#}) is received. The root cause of the failure lies in source code line 26 of Listing 6.2. The application waits for either a start byte (STX) or a stop byte (ETX); thus, an even number of start bytes resets the variable Command_state to 0 over and over again, as shown in source code line 28.
 1  char readCommand(void)
 2  {
 3    static byte Command_state = 0;
 4    static char Command;
 5    char c;
 6
 7    if (charavail())
 8    {
 9      c = rcvchar();
10    }
11    else
12    {
13      return 0;
14    }
15
16    switch (Command_state)
17    {
18      case 0:                          /* Initial State */
19        if (c == STX)
20        {
21          Command_state = 1;
22        }
23        break;
24
25      case 1:
26        if ((c == STX) || (c == ETX))
27        {
28          Command_state = 0;           /* STX repeat or ETX */
29        }
30        else
31        {
32          Command = c;                 /* Command found */
33          Command_state = 2;
34        }
35        break;
36      case 2:
37        Command_state = 0;             /* Command returned */
38        if (c == ETX)                  /* if ETX received */
39        {
40          return Command;
41        }
42        break;
43      default:
44        Command_state = 0;
45        mark = MARK_DEFAULT;
46        break;
47    }
48    return 0;
49  }

Listing 6.2: The erroneous receiver state machine implementation.

Revising the Receiver State Machine Implementation

In order to correct the receiver state machine implementation, source code lines 25-35 of Listing 6.2 are adapted to lines 25-39 of Listing 6.3. Whenever the host now sends sequences of the form {[2n]_{n>0} $],?,#}, the variable Command_state is not erroneously reset to 0 but set to 1, which forces the state machine to wait for the following command byte.


 1  char readCommand(void)
 2  {
 3    static byte Command_state = 0;
 4    static char Command;
 5    char c;
 6
 7    if (charavail())
 8    {
 9      c = rcvchar();
10    }
11    else
12    {
13      return 0;
14    }
15
16    switch (Command_state)
17    {
18      case 0:                          /* Initial State */
19        if (c == STX)
20        {
21          Command_state = 1;
22        }
23        break;
24
25      case 1:
26        if (c == ETX)
27        {
28          Command_state = 0;           /* ETX received */
29        }
30        else if (c == STX)
31        {
32          Command_state = 1;           /* STX repeat */
33        }
34        else
35        {
36          Command = c;                 /* Command found */
37          Command_state = 2;
38        }
39        break;
40      case 2:
41        Command_state = 0;             /* Command returned */
42        if (c == ETX)                  /* if ETX received */
43        {
44          return Command;
45        }
46        break;
47      default:
48        Command_state = 0;
49        mark = MARK_DEFAULT;
50        break;
51    }
52    return 0;
53  }

Listing 6.3: The revised receiver state machine implementation.
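The difference between the two listings can be replayed on a host PC. The following is a minimal sketch, not part of the case study code: the byte values of STX and ETX are assumed to be '$' and '#' (as suggested by the sequences in the text), the serial input functions charavail() and rcvchar() are replaced by a string that is fed byte by byte, the mark variable is omitted, and the names rx_t, rx_step, rx_feed, and rx_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte values: the text uses '$' as start byte and '#' as stop byte. */
#define STX '$'
#define ETX '#'

typedef struct {
    unsigned char state;   /* corresponds to Command_state */
    char command;          /* corresponds to Command       */
} rx_t;

/* One step of the receiver state machine for input byte c.
   fixed == 0 mimics case 1 of Listing 6.2, fixed != 0 that of Listing 6.3. */
static char rx_step(rx_t *rx, char c, int fixed)
{
    switch (rx->state) {
    case 0:                                 /* initial state */
        if (c == STX) {
            rx->state = 1;
        }
        break;
    case 1:
        if (c == ETX) {
            rx->state = 0;
        } else if (c == STX) {
            rx->state = fixed ? 1 : 0;      /* Listing 6.2 resets to 0 here */
        } else {
            rx->command = c;                /* command found */
            rx->state = 2;
        }
        break;
    case 2:
        rx->state = 0;
        if (c == ETX) {
            return rx->command;             /* command returned */
        }
        break;
    default:
        rx->state = 0;
        break;
    }
    return 0;
}

/* Feed a whole byte sequence; returns the decoded command or 0 if none. */
static char rx_feed(const char *seq, int fixed)
{
    rx_t rx = {0, 0};
    char result = 0;
    size_t i;
    for (i = 0; seq[i] != '\0'; i++) {
        char r = rx_step(&rx, seq[i], fixed);
        if (r != 0) {
            result = r;
        }
    }
    return result;
}

/* Replays the sequences discussed in the text. */
static void rx_demo(void)
{
    assert(rx_feed("$D#", 0) == 'D');     /* odd number of start bytes: reply */
    assert(rx_feed("$$D#", 0) == 0);      /* even number: command is skipped  */
    assert(rx_feed("$$$D#", 0) == 'D');
    assert(rx_feed("$$D#", 1) == 'D');    /* revised variant decodes both     */
    assert(rx_feed("$$$$D#", 1) == 'D');
}
```

Calling rx_demo() from a test program exercises only a handful of hand-picked sequences. The model checker, in contrast, explores all input sequences admitted by the environment, which is how the even-numbered start byte case was found in the first place.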

6.5.7 Compiler Criticism

Not surprisingly, property #9 was disproved by [mc]square. It follows that the application never enters the default path of the receiver state machine, shown in lines 47-50 of Listing 6.3. Even though it may seem a little far-fetched, an intelligent compiler would recognize the default path as dead code and remove it from the *.hex file.

6.5.8 Comparison of Abstraction Techniques

In the following, a comparison of the implemented abstraction techniques is presented. It is used to assess the individual contribution of the different concepts to state space reduction. In order to achieve comparability among the abstraction techniques, one abstraction technique is enabled at a time. Furthermore, the formula AG (true) is used, which is equivalent to plain state space building of the application. Table 6.6 shows the results of the second [mc]square model checking run, using the same hardware configuration as described in Section 6.5.1. Note that the first model checking round was canceled after 24 hours of runtime.


Stored          Created          Revisits     Time (hh:mm:ss)
>391,789,933    >6,981,334,953   -            >48:00:00
162,471,807     172,839,103      10,367,297   01:33:10
15,378,086      15,902,718       524,632      00:05:41
15,271,404      15,796,063       524,659      00:05:01
2,316,882       20,072,077       524,632      00:04:30
2,311,895       20,015,571       524,659      00:04:27

Table 6.6: Case study results for plain state space building. Total state space size is given as stored states, created states, and revisits. The per-row marks of the abstraction technique columns (state space and static analysis: NDPSW, DNDlA, DND, DVR, RBA, IFA, SA) are not reproduced.

Based on the results in Table 6.6, the following consequences are drawn:
(i) For the present case study, it is not feasible to build the state space without any abstraction techniques applied. This is not surprising since the application heavily interacts with the environment.
(ii) The model checking run with the option Delayed Nondeterminism enabled was canceled after running two days on the server. Thus, Delayed Nondeterminism does not provide enough abstraction to build the state space within reasonable time and resource constraints. The same applies to the option Delayed Nondeterminism with Look Ahead.
(iii) The state space could only be built when enabling the options Delayed Nondeterminism with Look Ahead and Nondeterministic Program Status Word. These options drastically reduced the state space and consequently the run time.
(iv) Enabling static analysis additionally helps to mitigate the state-explosion problem. Especially the option Path Reduction is a great contributor to state space reduction.
(v) For the conducted case study, the option Dead Variable Reduction leads only to a minor reduction of system states. This can be explained by the size of the source code. Static code analysis gets coarser with increasing source code complexity. Nevertheless, this result can be seen as an indicator that there is still vast room for improvement in the existing data-flow analyses.
(vi) Due to the implemented abstraction techniques for the Intel MCS-51 target, the state space could be reduced to a number that can easily be handled by conventional desktop computers. There is no need for a dedicated server in order to build the state space.

Note that the presented results are only valid for the investigated case study. Actual savings of the individual abstraction techniques heavily depend on the source code structure, complexity, and the number of accesses to nondeterministic memory locations.
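The stored-state counts of Table 6.6 can be put in relation to each other. The following is a small sketch that computes reduction factors relative to the first completed run (second table row). The canceled run is left out because only a lower bound is known for it. The runs are identified by table row only, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stored states of the five completed runs in Table 6.6, in table order
   (rows 2 to 6; row 1 was canceled and only a lower bound is known). */
static const unsigned long stored[] = {
    162471807UL, 15378086UL, 15271404UL, 2316882UL, 2311895UL
};
#define N_RUNS ((unsigned)(sizeof stored / sizeof stored[0]))

/* Reduction factor of completed run i relative to the first completed run. */
static double reduction_factor(unsigned i)
{
    assert(i < N_RUNS);
    return (double)stored[0] / (double)stored[i];
}

/* Prints one line per completed run. */
static void print_reduction_factors(void)
{
    unsigned i;
    for (i = 0; i < N_RUNS; i++) {
        printf("table row %u: %lu stored states, factor %.1f\n",
               i + 2u, stored[i], reduction_factor(i));
    }
}
```

Under this reading, the last row stores roughly 70 times fewer states than the first completed run, while the step between the last two rows is small. As stated in the text, such factors hold for this case study only.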


7 Remaining Challenges and Future Work

The reverse side also has a reverse side. (Japanese Proverb)

This section summarizes remaining challenges of the [mc]square approach and highlights future research possibilities. First, the problem of finding understandable counterexamples is highlighted. Next, the issue of verifying the simulator implementation is discussed and the idea of automatically generating simulators out of high level descriptions is presented. Then, the need for counterexample validation is clarified. Finally, coping with the state-explosion problem is discussed.

7.1 Local Model Checking and Resulting Counterexamples

Due to the local model checking algorithm implemented in [mc]square (cf. Section 3.4.6), the tool presents only a single counterexample. Depending on the actual implementation of the search algorithm that builds and traverses the state space, the presented counterexample is very likely not an optimal counterexample. Technically, an optimal counterexample is of minimum length, i.e., the one with the smallest number of states. However, from the user's point of view, a practical counterexample is one that can easily be understood by the tool user. Finding such an understandable counterexample is a rather challenging task. Offering a variety of counterexamples requires either a global model checking algorithm or the possibility to continue the search for further counterexamples within the local model checking algorithm. At best, the tool generates a number of counterexamples and the user gets the possibility to study any of them. Thus, the following conclusions are derived:
(i) Further research is needed in order to algorithmically find an understandable counterexample, or to define characteristics to assess the understandability of a counterexample.
(ii) In order to do so, the internal model checking algorithms of [mc]square have to be revised.

7.2 Getting the Intel MCS-51 Simulator Implementation Right

A major characteristic of the [mc]square approach is the handmade CPU simulator that serves for state space building. Generating a simulator of a modern, high performance microcontroller is a challenging, lengthy, and error-prone task. Consequently, verification of the simulator is tricky, too. Whereas the instruction set part of the implementation can easily be verified against other commercially available target simulators (cf. Section 3.6.3), reasonable verification of the customized parts of the target simulator remains an open issue. It is fair to state that implementation errors residing in the simulator itself are very likely to be uncovered during model checking, since the model checker forces the simulator to execute single instructions with a huge number of input configurations. As a result, bugs that stem from the simulator implementation are revealed by wrong counterexamples presented by [mc]square. This is an especially effective method to get the simulator bug-free in the early stages of an implementation. However, the following conclusions are noted:
(i) Strategies for handling the verification of the customized CPU simulator have to be proposed.
(ii) Methods for an automatic verification of the CPU simulator have to be derived.

7.3 The Automatically Generated Target Simulator

Lowering the implementation effort for new microcontroller families is a major target of future research. The following approaches are taken into consideration:
(i) Derive new simulator models out of a high level description of the microcontroller, for example out of a behavioral hardware model.
(ii) Interface to an existing hardware model (e.g., RTL code and IP cores).
(iii) Develop a generic simulator generator.

7.4 Counterexample Validation

As [mc]square uses a model of the actual target microcontroller, one has to make sure that the presented counterexamples are indeed real ones that occur in the field. The various abstraction techniques help to alleviate the state-explosion problem; however, they also introduce over-approximation, i.e., they generate behavior that would not occur on the real target microcontroller within its working environment. In practice, counterexamples are manually validated by the test engineer by executing the code on the target platform and trying to reconstruct the counterexample in the real-life application. Thus, an automatic approach to proving a counterexample to be a real one might be the last missing part of a fully automatic formal verification process for embedded systems software. To the best of the author's knowledge, no feasible approach has been adopted so far to automatically cross-check a given error trace on the real target hardware. Mercer and Jones [8] use the GNU debugger for state space generation in their model checking approach. Nevertheless, their approach suffers from state-explosion and rather slow execution times.


Thus, an innovative approach for counterexample validation is needed. In the long run, it might be feasible to tie model checking to the root where all software errors emerge from: the hardware unit whose software is itself subject to verification. It might be promising to extend existing microcontroller IP cores so that they support state space building and the automatic validation of counterexamples. However, the following conclusions are noted:
(i) Further research is needed to automatically validate a given counterexample on the real target hardware.
(ii) Possibilities of generating a customized IP core for state space generation and counterexample validation, based on available microcontroller implementations, have to be evaluated.
(iii) Strategies to contain massive over-approximations due to abstraction techniques are needed.

7.5 Coping with the State-Explosion Problem

Although the available abstraction techniques (cf. Sections 4.2 and 5.2) lead to tremendous state space reductions, the state-explosion problem is still one of the major challenges to overcome in order to scale up the [mc]square approach to huge code bases.
(i) Promote assembly code static analysis in order to obtain greater savings in the state space.
(ii) Focus on invariant model checking and try to disprove certain properties as early as possible by the preceding static analysis.
(iii) Consider further architectural peculiarities of the target microcontrollers and develop tailored abstraction techniques.


8 Conclusion
The main contribution of the present master thesis is the enhancement of the existing C51Simulator component of the [mc]square model checker. The Intel MCS-51 simulator is (i) extended by state-space abstraction techniques and (ii) integrated into the static analysis framework of [mc]square. Finally, (iii) a real-life embedded systems application is formally verified with [mc]square by taking advantage of the implemented abstraction techniques.

Regarding (i), a novel abstraction technique termed Delayed Nondeterminism with Look Ahead is proposed that, when applied to formal verification of I/O-intensive embedded systems assembly code, is able to achieve a notable state space reduction. In particular, this approach helps to avoid the generation of successor states whenever a microcontroller executes logic operations. The presented approach centers around the coherence among the boolean operators with particular regard to 3-valued logic.

Regarding (ii), existing static analyses are adapted to the Intel MCS-51 architecture. Furthermore, a novel data-flow analysis termed Register Bank Analysis is introduced. This new analysis is used to support static assembly code analysis within [mc]square. In particular, the approach leads to more precise Reaching Definition Analysis and Live Variable Analysis results, which allows the detection of additional dead variables. Thus, the number of overall system states is reduced during model checking. Typical data-flow analyses for high-level languages cannot be applied to assembly code one to one. Analyses such as Reaching Definition Analysis have to be adapted to be applicable to assembly code. The approach shows that it is necessary to take architectural peculiarities into account during the analysis to achieve precise results.

Regarding (iii), a real-life industrial application is model checked by [mc]square. A specification in Computation Tree Logic (CTL) is derived from a textual specification.
It is possible to reveal an implementation error concerning the receiver state machine, which is responsible for decoding incoming data bytes from the serial interface. The error found is very likely to go unnoticed with traditional testing methods, since the erroneous behavior only shows up in the rare case that the host application sends sequences of the form {[2n]_{n>0} $],?,#} to the target microcontroller, i.e., sequences with an even number of start bytes. It turns out that some of the system properties cannot be sufficiently specified in CTL, due to the lack of fairness in CTL. Unfair paths in the microcontroller program are determined and ruled out by taking advantage of a concept termed User Defined Environment (UDE). With UDE it is possible to introduce the required fairness constraints into the [mc]square model checking process. A solution to fix the existing implementation error is given and is proved to be correct in a further model checking run.

[mc]square proved to be a promising approach for model checking and static analysis of Intel MCS-51 assembly source code. It aims at a push-button formal verification approach for embedded systems code by relying on custom, highly optimized simulator components. When compared to traditional model checking approaches, the effort for the verification of a system or software is shifted from the user and a system model towards the model checking tool and the implementation itself. Nevertheless, as recognized by Gerth in [3], the real challenge, besides all the technical issues that have to be solved in formal verification, lies in convincing the design teams that devoting some of their verification resources to formal methods leads to a higher design quality. Thus, the major future challenge is to move formal verification upstream in the embedded systems design flow. Factors such as ever-shortening design cycles and stringent time-to-market requirements strongly support the claim for formal verification even at very early design stages. It is about time to transform projects successful in research and academia into practical tools ready to be used within day-to-day (embedded) software engineering practice.

To conclude, a vague and rather incomplete personal outlook on future trends in formal verification is given:
(i) The holy grail of full program verification has been abandoned; it will probably remain abandoned for the next years.
(ii) Less ambitious tools like [mc]square might emerge and become more widely used to formally verify sensitive parts of the application software.
(iii) Future tools will exploit ideas from various analysis disciplines, such as abstract interpretation, static analysis, and model checking.
(iv) Future tools will aim at alleviating the chicken-and-egg problem of writing specifications.


Bibliography
[1] M. Woodward and P. Mosterman, Challenges for embedded software development, in Proceedings of the 50th International Midwest Symposium on Circuits and Systems (MWSCAS), Montreal, Canada, August 2007, pp. 630-633.
[2] G. J. Holzmann, Software safety in rocket science, ERCIM News Special: Safety-Critical Software, vol. 75, pp. 14-15, October 2008.
[3] R. Gerth, Model checking if your life depends on it: A view from Intel's trenches, in Proceedings of the 8th International SPIN Workshop, Toronto, Canada, 2001.
[4] L. Holenderski, A model checking project at Philips research, in Proceedings of the 8th International SPIN Workshop, Toronto, Canada, 2001.
[5] D. Cofer, E. Engstrom, R. Goldman, D. Musliner, and S. Vestal, Applications of model checking at Honeywell Laboratories, in Proceedings of the 8th International SPIN Workshop, Toronto, Canada, 2001.
[6] B. Schlich and S. Kowalewski, [mc]square: A model checker for microcontroller code, in Proceedings of the 2nd International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2006), Paphos, Cyprus, 2006, pp. 466-473.
[7] A. Fehnker, R. Huuck, F. Rauch, and S. Seefried, Some assembly required - program analysis of embedded systems code, in Proceedings of the 8th IEEE International Working Conference on Source Code Analysis and Manipulation, Beijing, China, September 2008, pp. 15-24.
[8] E. Mercer and M. Jones, Model checking machine code with the GNU debugger, in Proceedings of the 12th SPIN Workshop on Model Checking Software, ser. Lecture Notes in Computer Science, vol. 3639, August 2005.
[9] B. Schlich, Model checking of software for microcontrollers, Dissertation, RWTH Aachen University, Aachen, Germany, June 2008. [Online]. Available: http://sunsite.informatik.rwth-aachen.de/Publications/AIB/2008/2008-14.pdf
[10] UAS Technikum Wien, FHplus project design methods for embedded control systems (DECS), visited: May 2009. [Online]. Available: http://embsys.technikum-wien.at/projects/decs/index.html
[11] T. Reinbacher, M. Kramer, M. Horauer, and B. Schlich, Challenges in embedded model checking - a simulator for the [mc]square model checker, in Proceedings of the 3rd Int'l Symposium on Industrial Embedded Systems (SIES 2008), Montpellier, France, 2008, pp. 245-248.


[12] ---, Motivating model checking for embedded systems software, in Proceedings of the 4th IEEE/ASME Int'l Conf. Mechatronic and Embedded Systems and Applications (MESA 2008), Beijing, China, October 2008, pp. 546-551.
[13] T. Reinbacher, M. Horauer, and B. Schlich, Using 3-valued memory representation for state space reduction in embedded assembly code model checking, in Proceedings of the 12th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2009), Liberec, Czech Republic, April 15-17 2009, pp. 114-119.
[14] T. Reinbacher, J. Brauer, M. Horauer, and B. Schlich, Refining assembly code static analysis for the Intel MCS-51 microcontroller, in Proceedings of the 4th Int'l Symposium on Industrial Embedded Systems (SIES 2009), Lausanne, Switzerland, July 8-10 2009, accepted for publication.
[15] E. S. Raymond, The Cathedral and the Bazaar. Musings on Linux and Open Source by an Accidental Revolutionary. O'Reilly Media, 1999. [Online]. Available: http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/
[16] J. L. Lions, ARIANE 5 flight 501 failure report, July 1996, visited: May 2009. [Online]. Available: http://www.ima.umn.edu/~arnold/disasters/ariane5rep.html
[17] M. I. Board, Mars climate orbiter - phase I report, November 1999, visited: May 2009. [Online]. Available: ftp://ftp.hq.nasa.gov/pub/pao/reports/1999/MCO_report.pdf
[18] NYISO, Interim report on the August 14, 2003 blackout, January 2004, visited: May 2009. [Online]. Available: http://www.hks.harvard.edu/hepg/Papers/NYISO.blackout.report.8.Jan.04.pdf
[19] M. Kanellos, Software glitch stalls some Toyota hybrids, October 2005, visited: May 2009. [Online]. Available: http://news.cnet.com/Software-glitches-stalls-some-Toyota-hybrids/2100-11389_3-5895574.html
[20] D. Gainer, Microsoft Excel calculation issue update, September 2007, visited: May 2009. [Online]. Available: http://blogs.msdn.com/excel/archive/2007/09/25/calculation-issue-update.aspx
[21] APA, A1-Netzausfall: Mobilkom gibt Entwarnung, October 2008, in German. Visited: May 2009. [Online]. Available: http://futurezone.orf.at/stories/317646/
[22] E. A. Emerson, The beginning of model checking: A personal perspective, 25 Years of Model Checking: History, Achievements, Perspectives, pp. 27-45, 2008.
[23] T. Reinbacher, Introduction to embedded software verification, 2008, Student's Paper, UAS Technikum Wien, Master Embedded Systems, Course: System Architecture and Engineering SS-08.
[24] A. Turing, On computable numbers, with an application to the Entscheidungsproblem, in Proceedings of the London Mathematical Society, ser. 2, vol. 42, 1936, pp. 230-265.


[25] T. Hoare and J. Misra, Verified software: theories, tools, experiments - vision of a grand challenge project, in Verified Software: Theories, Tools, Experiments (VSTTE 2005), Toronto, Canada, 2005.
[26] D. A. Wheeler, Linux Kernel 2.6: It's worth more! November 2007, visited: May 2009. [Online]. Available: http://www.dwheeler.com/essays/linux-kernel-cost.html
[27] US Department of Commerce, The economic impacts of inadequate infrastructure for software testing, May 2002. [Online]. Available: http://www.nist.gov/director/prog-ofc/report02-3.pdf
[28] E. M. Clarke, O. Grumberg, and D. A. Peled, Model Checking. The MIT Press, 1999, ISBN 0262032708.
[29] C. Baier and J.-P. Katoen, Principles of Model Checking. The MIT Press, 2008, ISBN 026202649X.
[30] E. M. Clarke and E. A. Emerson, Design and synthesis of synchronization skeletons using branching time temporal logic, in Workshop on Logic of Programs, ser. Lecture Notes in Computer Science, vol. 131, 1981, pp. 52-71.
[31] J.-P. Queille and J. Sifakis, Specification and verification of concurrent systems in CESAR, in Proceedings of the 5th Colloquium on International Symposium on Programming, London, UK, 1982, pp. 337-351.
[32] E. Clarke, The birth of model checking, in 25 Years of Model Checking, ser. Lecture Notes in Computer Science, vol. 5000, 2008, pp. 1-26.
[33] E. A. Emerson and E. M. Clarke, Characterizing correctness properties of parallel programs using fixpoints, in Proceedings of the 7th Colloquium on Automata, Languages and Programming, 1980, pp. 169-181.
[34] A. Pnueli, The temporal semantics of concurrent programs, in Proceedings of the International Symposium on Semantics of Concurrent Computation. Springer-Verlag, 1979, pp. 1-20.
[35] A. Pnueli and Z. Manna, The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer-Verlag GmbH, 1991, ISBN 0387976647.
[36] E. M. Clarke and I. A. Draghicescu, Expressibility results for linear-time and branching-time logics, in Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency, School/Workshop, vol. 354, London, UK, 1989, pp. 428-437.
[37] G. J. Holzmann, The model checker SPIN, IEEE Transactions on Software Engineering, vol. 23, pp. 279-295, 1997.
[38] K. Heljanko, Model checking the branching time temporal logic CTL, Helsinki University of Technology, Digital Systems Laboratory, Espoo, Finland, Tech. Rep. A45, May 1997.


[39] T. Schuele and K. Schneider, Global vs. local model checking: a comparison of verification techniques for infinite state systems, in Proceedings of the 2nd IEEE International Conference on Software Engineering and Formal Methods (SEFM 2004), Beijing, China, September 26-30 2004, pp. 67-76.
[40] D. Peled, Software Reliability Methods. Springer, 2001, ISBN 0387951067.
[41] G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum, WYSINWYX: What you see is not what you execute, in Verified Software: Theories, Tools, Experiments (VSTTE 2005), Toronto, Canada, 2005.
[42] B. Schlich, M. Rohrbach, M. Weber, and S. Kowalewski, Model checking software for microcontrollers, Technischer Bericht AIB-2006-11, RWTH Aachen, Tech. Rep., 2006.
[43] F. Scheuer, Extending the model checker [mc]square to handle the Infineon XC167 microcontroller, Master's thesis, RWTH Aachen University, Department of Computer Science 11, May 2007, (in German).
[44] J. Wernerus, Model-checking of instruction list programs for programmable logic controllers using [mc]square, Master's thesis, RWTH Aachen University, Department of Computer Science 11, 2008, (in German).
[45] T. Reinbacher, MCS-51 simulator integration into the [mc]square model checker, Department of Embedded Systems, University of Applied Sciences Technikum Wien, Tech. Rep., 2007.
[46] Intel Corporation, MCS 51 Microcontroller Family User's Manual, 1994, order No.: 272383-002.
[47] NXP Semiconductors, 80C51 family programmer's guide and instruction set, 1997. [Online]. Available: http://www.standardics.nxp.com
[48] S. Dutta, SDCC compiler user guide, online, visited: May 2009. [Online]. Available: http://sdcc.sourceforge.net/doc/sdccman.pdf
[49] T. Noll and B. Schlich, Delayed nondeterminism in model checking embedded systems assembly code, in Proceedings of the 3rd Int'l Haifa Verification Conf. (HVC 2007), ser. Lecture Notes in Computer Science, vol. 4899, 2008, pp. 185-201.
[50] T. Ball and S. Rajamani, The SLAM project: Debugging system software via static analysis, in Proceedings of the Symposium on Principles of Programming Languages (POPL 2002), Portland, USA, January 16-18 2002, pp. 1-3.
[51] D. Beyer, T. Henzinger, R. Jhala, and R. Majumdar, The software model checker BLAST: Applications to software engineering, International Journal on Software Tools for Technology Transfer, vol. 9, pp. 505-525, 2007.
[52] S. Chaki, E. Clarke, A. Groce, J. Ouaknine, O. Strichman, and K. Yorav, Efficient verification of sequential and concurrent C programs, Formal Methods in System Design (FMSD), vol. 25, pp. 129-166, 2004.


[53] H. Chen, D. Dean, and D. Wagner, Model checking one million lines of C code, in Proceedings of the 11th Annual Network and Distributed System Security Symposium (NDSS), 2004, pp. 171-185.
[54] M. Gallardo, P. Merino, and D. Sanan, Towards model checking C code with OPEN/CAESAR, in Proceedings of the 4th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems (MSVVEIS'06), Paphos, Cyprus, 2006, pp. 198-201.
[55] P. de la Cámara, M. Gallardo, P. Merino, and D. Sanán, Model checking software with well-defined APIs: the socket case, in Proceedings of the 10th International Workshop on Formal Methods for Industrial Critical Systems (FMICS), Lisbon, Portugal, 2005, pp. 17-26.
[56] B. Schlich and S. Kowalewski, Model checking C source code for embedded systems, in Proceedings of the IEEE/NASA Workshop Leveraging Applications of Formal Methods, Verification, and Validation (ISoLA 2005), 2005.
[57] T. Mehler, Challenges and applications of assembly-level software model checking, Dissertation, University of Dortmund, 2006. [Online]. Available: https://eldorado.tu-dortmund.de/bitstream/2003/22435/1/main.pdf
[58] S. C. Kleene, Introduction to Metamathematics, 11th ed. North Holland, 1996.
[59] G. Bruns and P. Godefroid, Model checking partial state spaces with 3-valued temporal logics, in Proceedings of the 11th Int'l Conf. Computer Aided Verification (CAV'99), ser. Lecture Notes in Computer Science, vol. 1633, 1999, pp. 274-287.
[60] E. Yahav, Verifying safety properties of concurrent Java programs using 3-valued logic, in Proceedings of the 27th ACM Principles of Programming Languages Conference (POPL 2000), vol. 36, no. 3, Boston, USA, 2001, pp. 27-40.
[61] R. E. Bryant, A methodology for hardware verification based on logic simulation, Journal of the ACM, vol. 38, no. 2, pp. 299-328, 1991.
[62] T. Feng, L.-C. Wang, K.-T. Cheng, M. Pandey, and M. S. Abadir, Enhanced symbolic simulation for efficient verification of embedded array systems, in Proceedings of the 2003 Conference on Asia South Pacific Design Automation (ASPDAC), Kitakyushu, Japan, 2003.
[63] C. Seger and R. Bryant, Formal verification by symbolic evaluation of partially-ordered trajectories, in Formal Methods in System Design, 1993, pp. 147-190.
[64] C.-J. H. Seger, R. B. Jones, J. W. O'Leary, T. Melham, M. D. Aagaard, C. Barrett, and D. Syme, An industrially effective environment for formal hardware verification, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, pp. 1381-1405, 2005.
[65] P. Godefroid, N. Klarlund, and K. Sen, DART: directed automated random testing, in Proceedings of the 2005 ACM SIGPLAN Conf. Programming language design and implementation (PLDI '05), vol. 40, no. 6, 2005, pp. 213-223.


[66] K. Sen, D. Marinov, and G. Agha, "CUTE: a concolic unit testing engine for C," in Proceedings of the 10th European Software Engineering Conference/13th ACM SIGSOFT Int. Symp. on Foundations of Software Engineering (ESEC/FSE-13 2005), 2005, pp. 263–272.
[67] T. Reps, M. Sagiv, and R. Wilhelm, "Static program analysis via 3-valued logic," in Proceedings of the 16th Int'l Conf. Computer Aided Verification (CAV 2004), ser. Lecture Notes in Computer Science, vol. 3114, Boston, USA, July 13-17 2004, pp. 15–30.
[68] M. Sagiv, T. Reps, and R. Wilhelm, "Parametric shape analysis via 3-valued logic," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 24, no. 3, pp. 217–298, 2002.
[69] A. Fehnker, R. Huuck, B. Schlich, and M. Tapp, "Static analysis for microcontrollers," in Proceedings of the Current Trends in Theory and Practice of Computer Science (SOFSEM '09), ser. Lecture Notes in Computer Science, Špindlerův Mlýn, Czech Republic, 2009, to appear.
[70] J. Regehr and A. Reid, "HOIST: A system for automatically deriving static analyzers for embedded systems," ACM SIGOPS Operating Systems Review, vol. 38, no. 5, pp. 133–143, 2004.
[71] J. Regehr and U. Duongsaa, "Deriving abstract transfer functions for analyzing embedded software," in Proceedings of the ACM SIGPLAN/SIGBED Conference on Language, Compiler, and Tool Support for Embedded Systems (LCTES 2006), Ottawa, Canada, 2006, pp. 34–43.
[72] J. Bergeron, M. Debbabi, M. M. Erhioui, and B. Ktari, "Static analysis of binary code to isolate malicious behaviors," in Proceedings of the 8th Workshop on Enabling Technologies on Infrastructure for Collaborative Enterprises (WETICE 1999), Stanford, USA, 1999, pp. 184–189.
[73] D. Brylow, N. Damgaard, and J. Palsberg, "Static checking of interrupt-driven software," in Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001), Toronto, Canada, 2001, pp. 47–56.
[74] F. Martin, M. Alt, R. Wilhelm, and C. Ferdinand, "Analysis of loops," in Proceedings of the 7th International Conference on Compiler Construction (CC 1998), ser. Lecture Notes in Computer Science, vol. 1383, Lisbon, Portugal, 1998, pp. 80–94.
[75] J. Regehr, A. Reid, and K. Webb, "Eliminating stack overflow by abstract interpretation," in Proceedings of the 3rd International Conference on Embedded Software (EMSOFT 2003), Philadelphia, USA, 2003, pp. 306–322.
[76] C. Linn, S. Debray, G. Andrews, and B. Schwarz, "Stack analysis of x86 executables," 2004, visited: May 2009. [Online]. Available: http://www.cs.arizona.edu/people/debray/papers/stack-analysis.ps
[77] C. Cifuentes and A. Fraboulet, "Intraprocedural static slicing of binary executables," in Proceedings of the International Conference on Software Maintenance (ICSM 1997), Bari, Italy, 1997, pp. 188–195.


[78] A. Lal and T. Reps, "Reducing concurrent analysis under a context bound to sequential analysis," in Proceedings of the 20th International Conference on Computer Aided Verification (CAV 2008), Princeton, USA, 2008.
[79] S. Qadeer and J. Rehof, "Context-bounded model checking of concurrent software," in Proceedings of the 11th International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS 2005), ser. LNCS, vol. 3440, Edinburgh, UK, 2005, pp. 93–107.
[80] A. Lal, T. Touili, N. Kidd, and T. Reps, "Interprocedural analysis of concurrent programs under a context bound," in Proceedings of the 14th International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS 2008), ser. LNCS, vol. 4963, Budapest, Hungary, 2008, pp. 282–298.
[81] F. Nielson, H. Nielson, and C. Hankin, Principles of Program Analysis. Springer, 2004, ISBN 3540654100.
[82] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 1979, ISBN 0201441241.
[83] G. E. Moore, "Cramming more components onto integrated circuits," Electronics Magazine, vol. 38, no. 8, April 1965. [Online]. Available: ftp://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf
[84] J. P. Arpasi, "Introduction to ternary logic," November 2003. [Online]. Available: http://www.aymara.org/ternary/ternary.pdf
[85] R. Martin, Agile Software Development, Principles, Patterns, and Practices. Prentice Hall, 2002, ISBN 0135974445.
[86] V. Kamin, "Extending the symbolic representation of states in [mc]square," Master's thesis, RWTH Aachen University, Department of Computer Science 11, 2008, (in German).
[87] A. Aho, M. Lam, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools, 2nd ed. Addison Wesley, 2006.
[88] J. Blieberger and B. Burgstaller, "Symbolic reaching definitions analysis of Ada programs," in Lecture Notes in Computer Science, 1998.
[89] S. Mahlke, "Introduction to compilers: Dataflow analysis, liveness analysis, reaching definitions," 2003, Lecture Notes EECS 483 (Lecture 18), University of Michigan, November 2003. [Online]. Available: http://www.eecs.umich.edu/~mahlke/483f03/lectures/483L18.pdf
[90] B. Schlich, J. Löll, and S. Kowalewski, "Application of static analyses for state space reduction to microcontroller assembly code," in Proceedings of the 12th Int'l Workshop Formal Methods for Industrial Critical Systems (FMICS 2007), ser. Lecture Notes in Computer Science, vol. 4916, Berlin, Germany, 2007, pp. 21–37.
[91] K. Yorav and O. Grumberg, "Static analysis for state-space reductions preserving temporal logics," Formal Methods in System Design, vol. 25, pp. 67–96, 2004.


[92] P. Cousot and R. Cousot, "Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints," in Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1977.
[93] B. Anckaert, M. Madou, and K. D. Bosschere, "A model for self-modifying code," in Lecture Notes in Computer Science, vol. 2007, 2007, pp. 232–248.
[94] R. Heckmann and C. Ferdinand, "Worst-case execution time prediction by static program analysis," White Paper, 2008, AbsInt Angewandte Informatik GmbH. [Online]. Available: http://www.absint.com/aiT_WCET.pdf
[95] C. Healy, M. Sjödin, V. Rustagi, D. Whalley, and R. V. Engelen, "Supporting timing analysis by automatic bounding of loop iterations," Real-Time Systems, vol. 18, pp. 129–156, 2000.
[96] M. Weiser, "Program slicing," in Proceedings of the 5th International Conference on Software engineering (ICSE '81), San Diego, USA, 1981, pp. 439–449.
[97] P. Horowitz and W. Hill, The art of electronics. Cambridge Univ. Press, 1980, ISBN 0521370957.
[98] J. J. Labrosse, MicroC/OS-II: The Real Time Kernel. CMP Books, 2002, ISBN 1578201039.
[99] M. J. Pont, Patterns for time-triggered embedded systems: building reliable applications with the 8051 family of microcontrollers. New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 2001.
[100] L. Lamport, "Sometime is sometimes not never: on the temporal logic of programs," in Proceedings of the 7th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '80), Las Vegas, USA, 1980, pp. 174–185.
[101] E. A. Emerson and J. Y. Halpern, "Sometimes and not never revisited: on branching versus linear time (preliminary report)," in Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '83), Austin, USA, 1983, pp. 127–140.
[102] D. Gückel, "Extending the model checker [mc]square by user-defined environments," Master's thesis, RWTH Aachen University, Department of Computer Science 11, December 2007, (in German).
[103] B. Schlich, D. Gückel, and S. Kowalewski, "Modeling the environment of microcontrollers to tackle the state-explosion problem in model checking," in Proceedings of the 7th Symp. Formal Methods for Automation and Safety in Railway and Automotive Systems (FORMS/FORMAT 2008), Budapest, Hungary, 2008, pp. 27–34.
[104] D. Brand and P. Zafiropulo, "On communicating finite-state machines," J. ACM, vol. 30, no. 2, pp. 323–342, 1983.


List of Figures
3.1 Formal verification methods classification.
3.2 CTL examples and intuitions.
3.3 The model checking workflow.
3.4 The coffee vending machine example.
3.5 The model checking workflow of the [mc]square approach (cf. Figure 3.3).
3.6 The [mc]square framework.
3.7 C51Simulator verification process.
3.8 Software architecture of the C51Simulator.

4.1 Over- and under-approximation in abstraction [81].
4.2 Nondeterministic state space representation.
4.3 The Delayed Nondeterminism approach of handling the MOV [0xA, 0xB] instruction.
4.4 The state-explosion problem.
4.5 Successor state generation and resulting system states with options: instantiate immediately, Delayed Nondeterminism, and Delayed Nondeterminism with Look Ahead for the assembly code presented in Listing 4.3.

5.1 The resulting CFG for Listing 5.1.
5.2 Data-flow analysis.
5.3 LVA example.
5.4 The [mc]square static analysis framework for the Intel MCS-51 target.
5.5 The type hierarchy of the C51LVALatticeElement.
5.6 The type hierarchy of the C51LVABuilder.
5.7 The type hierarchy of the C51RDALatticeElement.
5.8 The type hierarchy of the C51RDABuilder.
5.9 Bit-wise modeling of the register bank selection pointer.
5.10 The join-operator and a simple CFG.
5.11 The corresponding CFG as generated with [mc]square for the assembly code in Listing 5.7.
5.12 The principle of PR.

6.1 The target application.
6.2 The knitting machine monitoring device.
6.3 The foreground/background design pattern.
6.4 The software components.
6.5 A software circular buffer model.
6.6 Communication sequence chart.
6.7 The unfair path.
6.8 The model checking workflow of [mc]square with UDE (cf. Figure 3.3).
6.9 A first UDE automata proposal (U1).
6.10 The final UDE automata (U2).


List of Tables
3.1 Memory representation in [mc]square.
3.2 ND memory representations and resulting value combinations.

4.1 Data memory size and resulting system states.
4.2 Comparison of abstraction techniques for the C51Simulator.
4.3 Memory contents before and after the MOV instruction.
4.4 Truth table for 3-valued logic.
4.5 How bitmasks are used in embedded software.
4.6 Details on the Delayed Nondeterminism with Look Ahead approach.
4.7 The ADDC [A, R0] example.

5.1 Results after solving data-flow equations for source code Listing 5.3.
5.2 Results after solving LVA data-flow equations for source code Listing 5.4.
5.3 Action List Building: a few examples.
5.4 Register bank configurations of the Intel MCS-51.
5.5 Evaluating killLV() for MOV [R0, #const].
5.6 Comparison of resulting live variables.

6.1 Ringbuffer elements and their size.
6.2 The master-slave communication protocol.
6.3 Case study variables and their meaning.
6.4 Definitions for UDE modeling.
6.5 Case study results.
6.6 Case study results for plain state space building.


List of Algorithms
1 A fixed point iterating algorithm to solve data-flow equations for the RDA problem [87].
2 A fixed point iterating algorithm to solve data-flow equations for the LVA problem [87].
3 CFG building algorithm for the Intel MCS-51 target.


Listings
4.1 Assembly code excerpt.
4.2 Embedded C code example program for the Intel MCS-51 target.
4.3 Translated assembly code for source code lines 4-5 of Listing 4.2.
4.4 The Delayed Nondeterminism visitor pattern for the ANL [direct, #immediate] instruction.
4.5 The Delayed Nondeterminism with Look Ahead visitor pattern for the ANL [direct, #immediate] instruction.
4.6 The Delayed Nondeterminism visitor pattern for the ADDC [A, R0] instruction.
4.7 The Nondeterministic Program Status Word visitor pattern for the ADDC [A, R0] instruction.

5.1 Source code used for CFG building.
5.2 RDA example code.
5.3 RDA example code.
5.4 LVA example code (cf. [81]).
5.5 Code sharing within the program memory.
5.6 The Action List Builder visitor pattern for the ADDC [A, direct] instruction.
5.7 Example assembly code.
5.8 Intel MCS-51 assembly snippet.
5.9 C source code containing switch statement.
5.10 Switch Statement Assembler code snippet.
5.11 Program memory content.
5.12 Entry addresses for called functions.

6.1 Ringbuffer C code macro.
6.2 The erroneous receiver state machine implementation.
6.3 The revised receiver state machine implementation.


List of Abbreviations
ACM      Association for Computing Machinery
ASCII    American Standard Code for Information Interchange
ASIC     Application Specific Integrated Circuit
BDD      Binary Decision Diagrams
CFA      Control Flow Analysis
CFG      Control Flow Graph
CISC     Complex Instruction Set Computer
COTS     Commercial Off The Shelf
CPU      Central Processing Unit
CTL      Computational Tree Logic
CTL*     Computational Tree Logic*
DND      Delayed Nondeterminism
DNDlA    Delayed Nondeterminism with Look Ahead
DVR      Dead Variable Reduction
ECM      Electronic Control Module
FIFO     First In First Out
FPGA     Field Programmable Gate Array
FSM      Finite State Machine
GNU      GNU is not Unix
GUI      Graphical User Interface
IC       Integrated Circuit
IE       Interrupt Enable
IFA      Interrupt Flag Analysis
IP       Intellectual Property
IRAM     Internal Random Access Memory
ISR      Interrupt Service Routine
LED      Light Emitting Diode
LTL      Linear Temporal Logic
LVA      Live Variable Analysis
ND       Nondeterministic
NDPSW    Nondeterministic Program Status Word
PC       Program Counter
PDAG     Propositional Directed Acyclic Graph
PLC      Programmable Logic Controller
POR      Partial Order Reduction
PROMELA  Process or Protocol Meta Language
PR       Path Reduction
PSW      Program Status Word
RAM      Random Access Memory
RBA      Register Bank Analysis
RDA      Reaching Definition Analysis
RISC     Reduced Instruction Set Computer
ROM      Read Only Memory
RPM      Revolutions Per Minute
RTL      Register Transfer Level
SA       Stack Analysis
SDCC     Small Device C Compiler
SFR      Special Function Register
SIES     Symposium on Industrial Embedded Systems
STE      Symbolic Trajectory Evaluation
UART     Universal Asynchronous Receiver Transmitter
UDE      User Defined Environment
VHDL     Very High Speed Integrated Circuit Hardware Description Language
