
SOFTWARE TESTING, VERIFICATION AND RELIABILITY

Softw. Test. Verif. Reliab. (2012)


Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/stvr.1482

Model checking Trampoline OS: a case study on safety analysis for automotive software‡

Yunja Choi*,†

School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea

SUMMARY
Model checking is an effective technique used to identify subtle problems in software safety using a
comprehensive search algorithm. However, this comprehensiveness requires a large number of resources
and is often too expensive to be applied in practice. This work strives to find a practical solution to model-
checking automotive operating systems for the purpose of safety analysis, with minimum requirements and a
systematic engineering approach for applying the technique in practice. The paper presents methods
for converting the Trampoline kernel code into formal models for the model checker SPIN, a series
of experiments using an incremental verification approach, and the use of embedded C constructs for
performance improvement. The conversion methods include functional modularization and treatment for
hardware-dependent code, such as memory access for context switching. The incremental verification
approach aims at increasing the level of confidence in the verification even when comprehensiveness cannot
be provided because of the limitations of the hardware resource. We also report on potential safety issues
found in the Trampoline operating system during the experiments and present experimental evidence of
the performance improvement using the embedded C constructs in SPIN. Copyright © 2012 John Wiley &
Sons, Ltd.

Received 24 January 2011; Revised 30 July 2012; Accepted 30 July 2012

KEY WORDS: model checking; Trampoline operating system; safety analysis; OSEK/VDX; SPIN

1. INTRODUCTION

The operating system is the core part of automotive control software; its malfunction can cause crit-
ical errors in the automotive system, which in turn may result in loss of lives and assets. Much effort
has been spent on developing a standard domain-specific development framework in automotive
software [2, 3] to support a systematic and cost-effective safety analysis/assurance method.
So far, safety analysis for such systems is typically applied at the system level [4, 5] or at the
small-scale source code level [6–8], separately with different kinds of focuses. Although interna-
tional standards for the safe development of electronic/electrical devices, such as IEC 61508 and
ISO 26262, recommend formal verification methods as a safety verification technique, practical
experiences with processes or methods for applying formal methods in this domain are still rare,
with little related literature on this matter [9–12]. In fact, most existing work is focused on a certain
aspect of an operating system, such as the scheduling algorithm and timing analysis, and requires
extensive human expertise for effective verification, which is application dependent.
This work studies how automated formal verification techniques, such as model checking, can
be systematically and efficiently used for the safety analysis of an automotive operating system

*Correspondence to: Yunja Choi, School of Computer Science and Engineering, Kyungpook National University,
Daegu, Korea.
† E-mail: [email protected]
‡ This is an extended version of [1] and [40].

Copyright © 2012 John Wiley & Sons, Ltd.



under two basic premises: (i) The analyst does not necessarily have knowledge about the details
of implementation nor extensive knowledge regarding a specific verification tool, and (ii) general
safety properties for automotive software are the target of the verification. With these premises,
theorem proving or aggressive abstraction techniques for model checking, which require extensive
domain knowledge during the verification process, would not be applicable. The aim of this work
is to assess the applicability of model checking in a practical setting by using only a systematic
engineering approach. The study is conducted on the Trampoline [13] operating system, which is an
open source operating system written in C and which is based on the OSEK/VDX [14] international
standard for automotive real-time operating systems.
Because the safety of an operating system cannot be addressed without considering how it is used
at the application level as well as at the system level, we take the system-level safety requirements
and the functional requirements/constraints specified in the OSEK/VDX standard into account when
we build the verification model from the kernel code: (i) The safety properties are identified from
the automotive system level and then elaborated in terms of the Trampoline kernel code, and (ii) the
functional requirements and constraints in the OSEK/VDX standard are imposed on the task model,
which is a unit of an application program.
The kernel code itself is faithfully converted into a formal model in PROMELA, the modelling
language of SPIN [15], using existing C-to-PROMELA conversion methods [16]. However, methods
for functional modularization, for task modelling to simulate the arbitrary behaviour of a task and
for the context-switching mechanism are newly introduced. The approach is property based because
the model conversion process selectively extracts only those functions that have dependency rela-
tionships with respect to the safety properties. For this, we have developed a property-based code
extractor on top of the static analysis tool Understand [17].
Five representative safety properties are verified or refuted using the model checker SPIN in a
three-step approach. The initial light-weight verification focuses on error finding by using a generic task
model with arbitrary behaviour. After identifying potential safety issues and analysing false negatives,
the task model is further constrained to exclude illegal behaviours against the standard. The
second step incrementally increases the complexity of the task model and analyses the scalability
of model checking. Further abstraction is performed in the third step by using the embedded C
constructs in PROMELA to reduce verification cost and improve model checking performance.
This approach enables us to identify a potential safety gap in the kernel code as well as several
subtle issues that are difficult to identify using typical dynamic testing or theorem proving. The
identified safety problem is confirmed by run-time testing using the counterexample sequence generated
from the SPIN model checker as a test scenario. As expected, the model checking cannot scale
with an indefinite number of system calls. Nevertheless, we anticipate that the typical complexity
of a task with respect to its number of system calls is rather small in automotive electronic control
units (ECUs), and thus, the comprehensive verification result of a fixed number (15 in this case) of
arbitrary API calls per task would be sufficient for providing confidence in the system.
The major contribution of this work can be summarized as follows:
1. A systematic model construction method together with a verification process is provided for
model checking automotive operating system kernels on single-processor ECUs.
2. Safety analysis using model checking is demonstrated on the Trampoline operating system.
This is the first extensive case study on safety analysis in general using model checking for an
automotive operating system.
3. An incremental verification process and the new use of embedded C constructs in PROMELA
are suggested to achieve better performance in practice.
The remainder of this paper is organized as follows: Section 2 provides some background knowl-
edge for this work, including the safety properties in question, major requirements specified in
the OSEK/VDX standards and the model checking process for the Trampoline operating system.
Section 3 introduces the conversion and modelling method for constructing a formal specification
from the Trampoline kernel code. The result of the first-step verification is presented in Section 4,
followed by the result of incremental verification in Section 5. Section 6 suggests an approach
and experimental result for further improving model checking performance. We conclude with a


summary and future work in Section 9, following a survey of related work (Section 7) and discussion
of potential issues (Section 8).

2. SAFETY REQUIREMENTS OF OSEK/VDX

Because the Trampoline OS follows the OSEK/VDX standard, the safety requirements need to be
analysed under the requirements and constraints imposed by the standard.

2.1. OSEK/VDX requirements


OSEK/VDX [14] is an international standard specialized for automotive control systems. It is
designed for stringent hardware resource constraints, removing unnecessary complexities or unde-
sired behaviour because safety-critical systems such as automotive vehicles cannot afford such
complexities. For example, it does not allow dynamic memory allocation, circular waiting for
resources and multi-processing. The following are some of the core requirements and constraints
from the standard:
1. Dynamic memory allocation is not allowed; for example, all tasks are required to allocate
memory at system start-up time.
2. The maximum number of tasks per task type and the maximum number of activated tasks per
task type are to be statically assigned and checked for each task activation.
3. The priority of a task must not change at run time except when applying the priority ceiling
protocol (PCP).
4. Only one task can be in execution at any given time, that is, multi-processing is not allowed.
5. Task scheduling is based on a prioritized multi-level FIFO algorithm.
6. Resource de-allocation is based on an LIFO algorithm.
These restrictions seem quite severe for a generic operating system but are reasonable for controlling
an ECU of an automobile; typically, an automobile contains up to a hundred such ECUs,
and its memory and energy requirements are thus quite stringent.
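To make constraint 5 concrete, a prioritized multi-level FIFO ready list can be sketched in C as follows. This is a minimal illustration only, not Trampoline's implementation; PRIO_LEVELS, FIFO_DEPTH and the function names are invented for the sketch.

```c
#include <assert.h>

#define PRIO_LEVELS 4   /* number of priority levels (assumed) */
#define FIFO_DEPTH  8   /* per-level queue capacity (assumed)  */

/* One circular FIFO queue per priority level; higher index = higher priority. */
static int fifo[PRIO_LEVELS][FIFO_DEPTH];
static int head[PRIO_LEVELS], count[PRIO_LEVELS];

/* Enqueue a task id at its static priority level. */
static void put_ready(int prio, int task_id) {
    int tail = (head[prio] + count[prio]) % FIFO_DEPTH;
    fifo[prio][tail] = task_id;
    count[prio]++;
}

/* Dequeue from the highest non-empty level; FIFO within a level. */
static int get_ready(void) {
    for (int p = PRIO_LEVELS - 1; p >= 0; p--) {
        if (count[p] > 0) {
            int id = fifo[p][head[p]];
            head[p] = (head[p] + 1) % FIFO_DEPTH;
            count[p]--;
            return id;
        }
    }
    return -1; /* no ready task */
}
```

Within each level the queue is strictly first-in first-out, while get_ready always drains the highest non-empty level first, which is the essence of a prioritized multi-level FIFO policy.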
OSEK/VDX defines task models for user-defined tasks, which are the basic building blocks of an
application program. A task interacts with the operating system through system calls. OSEK/VDX
explicitly defines a total of 26 such APIs. Although it does not prevent users from modelling tasks
with an unsafe sequence of system calls, it provides an error-checking mechanism and a list of error
codes to prevent illegal usage of system calls. For example, if a task activates another task more
often than the maximum activation limit of the task, the system call ActivateTask is supposed to
return an error code 4. In this way, illegal (unsafe) task design can be detected at run time. This
means that some of the safety issues may need to be rephrased from Bad things never happen to If
a bad thing happens, the corresponding error code shall be delivered.
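This error-checking mechanism can be sketched as follows; in the OSEK/VDX standard, error code 4 carries the name E_OS_LIMIT. The function below is a stub that mimics only the activation-limit check, not the real Trampoline kernel, and the counter and limit names are invented for the sketch.

```c
#include <assert.h>

typedef unsigned char StatusType;
#define E_OK       0
#define E_OS_LIMIT 4   /* OSEK error code for an exceeded activation count */

#define MAX_ACTIVATION 2   /* per-task limit, fixed at configuration time */

static int activate_count;  /* pending activations of the task */

/* Stub of the kernel-side check behind ActivateTask: once the statically
 * configured activation limit is reached, the request is rejected with
 * E_OS_LIMIT so that illegal task design is detected at run time. */
StatusType ActivateTask_stub(void) {
    if (activate_count >= MAX_ACTIVATION)
        return E_OS_LIMIT;
    activate_count++;
    return E_OK;
}
```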

2.2. Safety requirements


Safety requirements for automotive operating systems are closely related to the safety of applica-
tions working on top of the operating system. Such safety requirements are analysed in a top–down
manner from the safety requirements of the overall automotive system, such as An automobile
control system shall change the direction of the wheels as intended by the driver. Hardware faults,
computational errors, errors in determining vehicle status, failures in protecting critical sections and
errors in task sequencing are some examples that may cause the vehicle control system to fail to
change the direction of the wheels as intended by the driver. Each such failure scenario is further
analysed using Software Fault Tree Analysis (SFTA) [18] to identify software-level safety proper-
ties such as Tasks and interrupt service routines shall not terminate while occupying resources and
Tasks shall not loop indefinitely while occupying resources.
Figure 1 shows an example of SFTA for automotive operating systems. Starting from a possible
hazard, for example, a delayed change of direction, the SFTA analyses faults that may lead to the
hazard. Each fault is again analysed in a top–down manner to identify its possible cause. The left
side of Figure 1 shows the first three levels of SFTA from the hazard A. Delayed change of direction,


[Figure: software fault tree starting from hazard A. Delayed change of direction, branching into faults A.1–A.4, with the sub-tree under A.1.3 Delay in resource allocation refined down to leaf causes such as A.1.3.1.1 The task owning the resources is in waiting state; diagram omitted.]

Figure 1. A part of SFTA for automotive operating systems.

and the right side of the figure shows the lower part of the SFTA from A.1.3. Safety properties are
identified from the leaf component of the fault tree. For example, the property Tasks shall not wait
for events while occupying resources is identified from component A.1.3.1.1. We have identified 56
safety properties in this way [19]; five representative safety properties are given as follows:
- SR1. Tasks and ISRs shall not terminate while occupying resources.
- SR2. Tasks shall not wait for events while occupying resources.
- SR3. Tasks shall not wait for events indefinitely.
- SR4. Any activated tasks shall be executed in the end.
- SR5. A task with higher static priority always starts its execution earlier than a task with lower
static priority.
SR1 and SR2 are meant to eliminate the possibility of process deadlocks caused by circular
waiting for resources or by indefinite waiting for a resource occupied by other tasks. SR1 is especially
necessary because an operating system based on OSEK/VDX does not automatically reclaim
allocated resources, even after task termination. SR3 and SR4 are intended to ensure the absence of
starvation caused by mismatched event sequences, mistakenly designed task priorities or process
deadlocks. SR5 is one of the safety properties identified from SFTA node A.1.3.2.2; it ensures that
there is no dynamic lowering of priority that may unexpectedly change the intended execution order.
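In SPIN, such requirements are checked as LTL formulas over propositions defined on the model's global state. As a rough sketch only — the proposition definitions and variable names below are hypothetical and not taken from the actual verification model — SR1 might be phrased as:

```promela
/* Hypothetical propositions over the verification model's state:
 *   terminated -- the task has just executed TerminateTask
 *   holds_res  -- the task still owns at least one resource   */
#define terminated (task_state == DYING)
#define holds_res  (owned_resources > 0)

/* SR1: tasks and ISRs shall not terminate while occupying resources */
ltl sr1 { [] (terminated -> !holds_res) }
```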
Unlike functional requirements, the verification of safety requirements mostly requires showing
the absence of behaviour rather than the existence of behaviour; therefore, typical scenario-based
testing approaches are not sufficient for this purpose. That is why safety standards and most
existing approaches recommend formal verification as an alternative and necessary solution.

2.3. Model checking Trampoline


Model checking [20] is an automated formal verification technique based on an exhaustive search
of the system state space. It requires the target system to be formally modeled and the properties in
question to be formalized in logical terms. This work uses the SPIN model checker [15] because it is
one of the most frequently used model checkers, with a large body of experience and support from which
we take useful hints for this case study [10, 16, 21].
Figure 2 illustrates the overall verification process using model checking; the initial step consists
of two manual activities, construction of the model from the Trampoline kernel code and identi-
fication of the safety properties from the software fault tree analysis. Because the verification of
the Trampoline operating system requires formal modelling of the kernel code itself as well as
modelling of the user task, the PROMELA model includes a system model constructed from the
Trampoline kernel and a generic task model for user tasks constrained by the OSEK/VDX standard.
The PROMELA model is validated using the SPIN simulator via random simulation and is then
verified with respect to each safety property by using the SPIN model checker. In this step, light-
weight model checking is applied using limited resources, for example, 4 Gbytes of system memory.
The process ends if all the properties are verified in this step.


[Figure: flow diagram of the verification process — a PROMELA model built from the Trampoline kernel under OSEK/VDX constraints and safety properties derived from the software fault tree feed into simulation and light-weight model checking; refuted properties go to counterexample analysis and run-time testing, undecided properties to incremental verification; diagram omitted.]

Figure 2. Model checking process.

Each refuted safety property is analysed via the counterexample replay facility provided by the
simulator. If it turns out that the counterexample is a false alarm, because of mistakes in the model,
for example, then the correction is made to the model directly and re-verification is performed.
Otherwise, we generate a test scenario from the counterexample and perform run-time testing of the
Trampoline code on Linux to confirm that the counterexample does, in fact, reflect an actual prop-
erty violation. For those properties that are neither verified nor refuted, because of the limitation of
the resource, the process goes to the second step. In this step, the verification is performed iteratively
using larger resources by constraining verification conditions and relaxing them incrementally. The
incremental verification process includes the application of embedded C constructs in PROMELA to
improve scalability as the third step of the verification process.
The first step of the verification was performed on a PC with Pentium III CPU and 4 Gbytes
of memory for the quick and easy identification of problems and false alarms. After addressing
issues identified from the initial verification, more extensive verification was performed on a SUN
workstation with 30 Gbytes of memory. SPIN version 6.0.1 and its graphical run-time environment
iSpin 1.0.0 were used for the verification.

3. CONVERSION AND MODELLING

A formal model of the Trampoline OS was constructed using one-to-one translation from C
constructs to PROMELA constructs by using existing approaches [22, 23]. This section explains
the translation and modelling approaches unique to our work, while omitting details of conventional
translation rules.

3.1. Overall approach


The Trampoline operating system kernel consists of 174 functions, which comprise a total of 4530
lines of code. However, many of the functions act as wrappers, abstracting details on which imple-
mentation function is called inside, or as tracing code for debugging. This also includes platform-
specific functions for hardware access and functions for emulating environments. Because not all
of them are related to the core service of the operating system, we first identify the core functions
contributing to the safety properties in question with the following guidelines:

1. Eliminate wrapper functions and convert only the actual code.


2. Eliminate tracing functions for debugging.
3. Abstract hardware-dependent code.
4. Eliminate emulating functions.


We first identify core functions that have a dependency relationship with major operating system
APIs. The core functions are classified by their service types and are grouped into modules accord-
ing to the classification. Each module and its member functions are converted into corresponding
PROMELA constructs.

3.2. Extracting core functions and modularization


We have identified eight of the 26 APIs as being closely relevant to the safety properties
in question: ActivateTask, TerminateTask, ChainTask, Schedule, GetResource, ReleaseResource,
SetEvent and WaitEvent. The identification of APIs related to the safety properties is based on the
OSEK/VDX specification explaining the functionality of the APIs. For the most general cases, we
assume that all interrupts are enabled and the handling of alarms is static.
A call graph is generated for each identified API by using the Understand code analyser [17],
through which its dependent core functions are identified manually and extracted automatically. A
code extractor is implemented on top of Understand for this purpose.
Figure 3(a) is an example of a functional call graph identified from a core API, ActivateTask,
where the function ActivateTask is defined as returning the result of tpl_activate_task_service. The
identified core functions from the call graph are highlighted with a dark box. For example, the
function tpl_call_error_hook is a wrapper function that simply calls the ErrorHook function after
checking that the function is not called recursively. The function tpl_switch_context is a hardware-
dependent function accessing physical memory locations. As explained in a later section, we abstract
the hardware-dependent context switch mechanism at the software level by introducing explicit
switching points in the task model. Table I summarizes the number of functions eliminated from the
kernel modelling. As a result, a total of 40 functions are modeled out of 174 functions.

[Figure: (a) the function call graph rooted at ActivateTask, with core functions such as tpl_activate_task_service, tpl_schedule_from_running, tpl_init_proc and tpl_switch_context highlighted; (b) the proposed modular structure grouping functions into the task_service, proc_service, scheduler, resource, event and task modules; diagram omitted.]

Figure 3. Modularizing Trampoline functions.

Table I. Categorizing eliminated functions.

  Category                                   Number
  Wrapper                                        26
  Hardware-dependent                             16
  For tracing and debugging                      26
  For interfacing with emulator                   6
  For error hooks                                11
  Core functions                                 89
  Functions independent of the eight APIs        49
  Extracted                                      40


Table II. A mapping for conversion.

  Kernel constructs                  Promela constructs
  A module                           Proctype
  A member function of a module      A composite state under a label
  A common function                  An inline function

The core functions are collected and classified into six categories: task service, resource man-
agement, event management, scheduler, process service and common service functions. Figure 3(b)
illustrates the modular structure on the basis of the categorization. For example, tpl_activate_task
identified from the ActivateTask API becomes a member function of task_service. tpl_init_proc,
tpl_put_new_proc, tpl_get_proc and tpl_put_preempted_proc are members of proc_service. On the
other hand, tpl_get_internal_resource belongs to the module for common functions. In the figure,
the task is not a part of the operating system kernel but acts as a user of the operating system. We
have two types of tasks, basic and extended, as defined in OSEK/VDX. Information on individual
tasks, such as task id, priority and resources, is maintained in separate data structures as will be
explained in Section 3.4. The module for common functions, which is used by all the modules, is
not included in the figure.

3.3. Conversion of modules


There are three possible ways of modelling a C function in PROMELA: it can be modeled (i) as a
synchronized communicating process, proctype, with message passing, (ii) as an inline function or
(iii) as embedded C code that is external to the model. The third case does not contribute to the
model’s state space during the model checking process but is executed separately to provide only the
function values to the model. Although the third method may be the most effective one in reducing
verification resources, it must be used with care because it may merge several transitions into one,
possibly losing important behaviour accidentally. Therefore, we take a mixture of the first and
second methods, whereas the third approach is selectively applied later in the third verification step
to improve performance.
Each module is converted into a synchronized communicating process, proctype, where its mem-
ber functions are translated into a collection of labelled transition systems with a common initial
state. The common service functions are converted into inline functions for reasons of simplicity.
Table II summarizes the mapping between modules and PROMELA constructs.
For example, the following is a fragment of the Promela model converted from the module
scheduler.
1: proctype scheduler(){
2: // declaration of local variables
...
3: get_message: // for processing function calls
4: if
5: :: mch??start,origin -> goto start_schedule;
6: :: mch??Schedule,origin -> goto schedule_from_running;
7: :: mch??dying,origin -> goto schedule_from_dying;
8: :: mch??waiting,origin -> goto schedule_from_waiting;
9: fi;
10:
11: start_schedule:
12: // perform tpl_start_schedule function
...
13: tpl_init_proc(first_proc);
14: tpl_kern.running.state = 2;
...
15: mch!ret,origin;
16: goto get_message;


17:
18: schedule_from_running:
19: // perform tpl_schedule_from_running
...
20: mch!ret,origin;
21: goto get_message;
22:
23: schedule_from_dying:
24: ...
25: }
The module scheduler is modeled as a communicating process, which is defined as proctype in
PROMELA and is initially in the get_message state. As a function call is received through a message
channel mch, it jumps to the corresponding label, performs the corresponding functions, returns
a message to the caller and jumps back to the initial state. For example, if the start message is
received (line 5), it jumps to the start_schedule label (line 11), where it starts performing the func-
tion tpl_start_schedule. After finishing the corresponding function, it returns a return message to
the caller (line 15) and jumps back to its initial state (line 16). Each proctype can be instantiated as
an independent process; these processes are synchronized with each other through message passing.
Functions often used by other functions are categorized as common functions and modeled using
the inline construct in PROMELA. The following is an example for the function tpl_init_proc, which
is used in the function tpl_start_schedule (line 13).
inline tpl_init_proc(proc_id){
// assign static values to the dynamic process table
tpl_dyn_proc_table[proc_id].ppriority =
tpl_stat_proc_table[proc_id].base_priority;
...
// initialize process context by calling another inline function
tpl_init_context(proc_id);
}
In this way, the user-level interactions remain traceable through message-passing sequences,
whereas verification costs are reduced by minimizing the number of communicating processes.
The PROMELA model translated from the core kernel code includes a minimum of six commu-
nicating processes and 18 inline functions comprising a total of 1500 lines of code. The number of
communicating processes may increase depending on the number of task types.

3.4. Conversion of data structures and variables


The Trampoline kernel maintains (i) information on the currently running and the previously exe-
cuted task objects, including both static and dynamic descriptions of tasks, and (ii) a service call
description that records service id, resource id, task id, event masks and so on. As illustrated
in Figure 4, the kernel receives a service call from the task in the process whose static/dynamic
task information is recorded in the kernel process information block. The kernel process informa-
tion block is used to refer to the static/dynamic description of the process and is used to perform
rescheduling, put the current task into the ready queue or obtain and start a task from the ready
queue. It may also be used to change the priority of the currently running task if the service call is
to allocate a resource whose ceiling priority is greater than the task’s current priority.
Those data structures are faithfully converted into PROMELA global data types after replacing
pointers with array indices and pointer assignments with the deep copy of the corresponding arrays.
Primitive variable types are converted into the corresponding PROMELA basic types such as bit,
byte, int and unsigned. For example, in Table III, the data type tpl_priority_level in Trampoline is
converted into PROMELA data types by (i) changing the pointer type into a fixed-sized array and (ii)
replacing the u8 type (unsigned character) with the byte type. Because the pointer type is converted
into a fixed-sized array, an assignment of TPL_PRIORITY_LEVEL type variables requires copying
all the values of the array elements. We note that the data type for fifo is changed from tpl_proc_id,
which is a signed character, to TASK_INFO, which is a struct type composed of a process id and


[Figure: major kernel data structures — the static process table (process id, base priority, max. activate count, type, internal resources), the dynamic process table (resources, activate count, priority, state), internal resource descriptions, the resource table, the service call description (service id, resource id/task id, event_mask, alarm base), the kernel process information block (s_old, s_running, old, running, running_id, need_switch) and the FIFO ready list, with their links to the kernel/scheduler and the task in process; diagram omitted.]

Figure 4. Major data structures in the Trampoline kernel.

Table III. An example: data conversion.

  Trampoline data type          Promela data type
  typedef struct {              typedef TPL_PRIORITY_LEVEL {
    tpl_proc_id *fifo;            TASK_INFO fifo[N];
    u8 size;                      byte size;
  } tpl_priority_level;         };
                                typedef TASK_INFO {
                                  int id;
                                  unsigned switchPoint:3;
                                };

a switching point. This is an exceptional case; we added the field for storing the context-switching
point to support simulation of context switching. This will be explained further in Sections 3.6
and 3.7.
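The consequence of this conversion for assignment semantics can be illustrated in C: once the pointer is replaced by an embedded fixed-size array, assigning one TPL_PRIORITY_LEVEL value to another copies every array element instead of aliasing a shared buffer, which is the deep-copy behaviour the PROMELA model reproduces by copying each element explicitly. The value of N and the demo function below are assumptions for illustration, following the types in Table III.

```c
#include <assert.h>

#define N 4   /* fixed FIFO capacity, assumed for illustration */

typedef struct { int id; unsigned switchPoint; } TASK_INFO;

/* After conversion: fifo is an embedded fixed-size array, so struct
 * assignment copies all of its elements (a deep copy). With the original
 * pointer field, assignment would only have aliased the same buffer. */
typedef struct {
    TASK_INFO fifo[N];
    unsigned char size;
} TPL_PRIORITY_LEVEL;

int deep_copy_demo(void) {
    TPL_PRIORITY_LEVEL a = { .size = 1 };
    a.fifo[0].id = 7;

    TPL_PRIORITY_LEVEL b = a;   /* element-wise copy of fifo[] */
    a.fifo[0].id = 99;          /* mutating a does not affect b */
    return b.fifo[0].id;
}
```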
We did not apply aggressive abstractions on the data structure and global variables for two rea-
sons: First, we did not have a complete understanding of the implementation details. Aggressive
abstraction is effective in reducing verification costs but can be dangerous if performed by some-
one who does not understand the rationale behind the implementation details. Second, aggressive
abstraction makes the counterexample analysis difficult because of the large difference between the
verification model and the actual code.

3.5. A task model for comprehensive scenario generation


As an operating system normally stays idle till a user request is received, behavioural problems in
an operating system can only be identified through user tasks. Therefore, comprehensive modelling
of a user task is also important for the verification of an operating system. We achieve the com-
prehensive modelling of a generic user task by simulating arbitrary deterministic task behaviour
with non-deterministic behaviour. In our approach, the generic task generates all possible interac-
tion scenarios with the Trampoline kernel code in the verification process, which is constrained by
the OSEK/OS specifications but not constrained by a specific user task.

Y. CHOI

OSEK/VDX specifies that a task may have four states: suspended, ready, running and waiting.
A notable fact is that OSEK/VDX requires all tasks to be designed statically and to be loaded in
the memory as the operating system starts up. Therefore, all tasks are in suspended states initially.
We have elaborated the original task model by refining the running state with details of possible
interactions with the Trampoline kernel code.
Figure 5 shows the refined task model for the extended task type; each sub-state in the running
state is labelled with a possible system call that can be performed in the state. Basically, all the
sub-states in the running state are strongly connected with transitions except for some restrictions
imposed by the basic requirements of the OSEK/VDX standard. Examples of such requirements are
as follows:

1. A task must not terminate or chain another task while holding resources, that is, a task cannot
transit to TerminateTask or ChainTask from the getResource state.
2. A task must not be in a waiting state while holding resources, that is, a task cannot transit to
WaitEvent from the getResource state.
3. A terminated or chained task goes into the suspended state.

Figure 5 reflects those constraints in the task transitions; for example, after the getResource API
call, a task transits to sp4, from which a non-deterministic choice to transit to ReleaseResource,
ActivateTask or SetEvent can be made, but it cannot transit to TerminateTask, ChainTask or
WaitEvent from sp4. Self-transitions are allowed, although not explicitly depicted in the figure,
for each state except for the WaitEvent, ChainTask and TerminateTask states, which are drawn in
dashed lines.
We note that the generic task model excludes only obvious illegal behaviours from a user task
but allows potential design mistakes; for example, the model does not restrict the number of calls to
ActivateTask and allows a task to terminate while holding a resource, such as sp0 → getResource →
SetEvent → sp1 → TerminateTask. The purpose is to construct a generic task that subsumes poten-
tial corner cases in task design while reducing unnecessary complexities, rather than defining a
perfect user task. In this way, we can identify potential issues of the kernel code that may be caused
by illegal usage patterns or bad task design.
We also note that the non-deterministic choice is not only available for the system calls but also for
the values of the parameters for each system call; this may include infeasible selection of parameter
values depending on the context, which matches with our purpose to find corner cases.
The task model for the basic task type is the same as the one for the extended task type, except
that the WaitEvent state and its related transitions are removed.

[Figure omitted: state diagram of the extended task model. The states suspended, ready, waiting and running are connected by activate, start, switch in/out, wait and resume transitions; the running state contains the sub-states sp0–sp4 and the system-call states Schedule, ActivateTask, TerminateTask, ChainTask, SetEvent, WaitEvent, getResource and ReleaseResource, with a history mark H.]
Figure 5. A task model with explicit switching points (extended task).


3.6. Software simulation of context switch


Because Trampoline supports several hardware platforms, it is expensive to model each platform
for the safety analysis of the software part. We abstract the hardware-dependent code to reduce
verification costs and to find platform-independent software problems.
For example, the following is the Trampoline code for a context switch in the POSIX
environment:
void tpl_switch_context(const tpl_context *old_context,
                        const tpl_context *new_context) {
    if (NULL == old_context) {
        _longjmp((*new_context)->current, 1);
    }
    else if (0 == _setjmp((*old_context)->current)) {
        _longjmp((*new_context)->current, 1);
    }
    return;
}
The code stores the current context and retrieves the new context from the physical memory when-
ever a task is preempted. We simplified this mechanism by introducing explicit switching points
in the task model and by annotating each task in the kernel process information block with its
switching point.
Figure 5 illustrates the five explicit switching points in the task model, sp0, sp1, sp2, sp3 and
sp4 in the running state. Because each system call is protected by an internal locking mechanism,
we safely assume that context switching occurs only in-between system calls. When context switch-
ing occurs, the switching point is stored in the kernel process information block for the switched
task (Table III). The switched task is rescheduled according to its priority and resumes from the
switching point it has left; the history mark H in the running state means that it remembers the last
active sub-state.
The following is a sample Promela code for the context-switching process: First, it checks
whether the newly activated task has higher priority than the currently running task. If so, it puts the
task into the ready queue and copies its information, including the switching point, into the kernel
process information block. In the code, tpl_kern is the name of the data structure for the kernel
process information block.
if
:: tpl_h_prio > tpl_kern.running.ppriority ->
    mch!put, schedule_from_running;
    param!tpl_kern.running_id, 0;
    COPY(tpl_kern.old, tpl_kern.running);
    COPY(tpl_kern.s_old, tpl_kern.s_running);
:: ...
fi;
The next code shows the way context switching is modeled in Promela: If the kernel determines
that switching is required, it first sends the switch command to the currently running task through
the channel cch, then sends the start command to the highest-priority task in the ready queue and
finally sends the switching point where it can resume execution.
if
:: tpl_kern.need_switch != NO_NEED_SWITCH ->
    id = tpl_kern.s_old.context;
    cch!switch, id;
    id = tpl_kern.s_running.context;
    cch!start, id;
    cch!startAt, tpl_kern.s_running.switchPoint;
:: else -> skip;
fi;


Typically, it is not necessary to explicitly model the context-switching behaviour because pro-
cesses in Promela are already designed to arbitrarily interleave with each other, simulating
arbitrary context switching at the statement level. However, arbitrary interleaving includes invalid
behaviours, such as context switching in the middle of executing system calls. We remove such
invalid behaviours by explicitly modelling context switching. This also enables us to simulate
multiple tasks by using one generic task model as explained in the next section.

3.7. Simulation model for multiple task activation


A task is modeled as a proctype in Promela, which can be activated multiple times. Multiple acti-
vations of a proctype typically result in the creation of multiple independent copies of a process
with the same behaviour. However, as OSEK does not allow parallel execution of multiple copies of
a task, we allow only one active proctype per task type and simulate multiple tasks/activations by
tracing the execution point of each activated copy.
Consider the three active processes of the same task type in Figure 6(a) when multiple activations
of proctype are allowed. They have exactly the same behaviour but are executed independent of
each other. However, as OSEK does not allow multi-processing, only one of them can execute at
any given time, and thus, these multiple copies can be serialized by keeping track of the execution
points for each of them. For example, the running sequence T2:A → T2:B → T1:A → T1:B →
T1:C → T3:A → T3:B → T3:D → T2:B → ... can be simulated by one active proctype
by recording the context-switching point in the ready queue together with the task id as shown in
Figure 6(b).

1. T2 is activated and starts execution.
2. T1 and T3 are activated, and (task_id, switching_point) information for each of them is stored
   in the ready queue. By default, the switching point is the initial state.
3. T2 goes to the ready/waiting state at B, and T1 starts execution from switching point A. Now,
   the information in the ready queue is {(T3, A), (T2, B)}.
4. T1 goes to the ready/waiting state at C, and T3 starts execution from switching point A. Now,
   the information in the ready queue is {(T2, B), (T1, C)}.
5. T3 goes to the ready/waiting state at D, and T2 starts execution from switching point B. Now,
   the information in the ready queue is {(T1, C), (T3, D)}.
6. ...

Figure 6(b) is an illustration of such a simulation. We modified the ready queue in the
Promela model to maintain the information on the switching point for each activation of a
task so that one active proctype simulates multiple processes of the same task type. Consider-
ing that the number of active processes in Promela is a critical factor in verification cost, this
modification, which is intended to explicitly handle the switching points, is a minor sacrifice to
improve scalability.

[Figure omitted: (a) three independently executing copies T1–T3 of the same task type, each with states A–E; (b) the serialized simulation, in which the ready queue holds (task id, switching point) pairs: 1. {(T1, A), (T3, A)}, 2. {(T3, A), (T2, B)}, 3. {(T2, B), (T1, C)}, 4. {(T1, C), (T3, D)}, ...]
Figure 6. Simulation of multiple activations of a task.


4. INITIAL VERIFICATION RESULT

Table IV is a list of the safety properties introduced in Section 2 and their corresponding formal
specification together with the verification results. The formal specification is written in temporal
logic LTL, which is a propositional logic with temporal operators. Given two propositional formulas
x and y, the meaning of each temporal logic operator used in the safety property is as follows:
• []x : x is always true for all execution paths
• <>x : x is true at some time in future states
• x U y : x is true until y is true
For example, the formal specification for SR1 can be interpreted as For all execution paths, if
TerminateTask or ChainTask is called and the calling task has a resource, then an error will be
set at some time in future states. We note that the LTL formulas are somewhat simplified from the
original version; for example, a variable i is used in the LTL formulas of SR3 and SR4 to specify
them generally, but an explicit value is used for actual verification, such as []((wait_id == 0) ->
<>(tpl_kern.running_id == 0)), for each task identifier i. Also, taskhasResource of SR2 and wait_id
of SR3 are macros representing corresponding values from the Trampoline data structure. SR5 is
specified as SR51 in the initial formal specification but is strengthened to SR52 after a counterex-
ample analysis is performed; SR51 specifies that it is always the case that if taski has lower static
priority than taskj, and if taski and taskj are in the ready state at the same time, then taski does not
get into the running state before taskj unless an error occurs.
We performed initial verification using the kernel model and the generic task model. The result
shows that SR1, SR2 and SR52 were neither verified nor refuted because they ran out of memory.
SR3, SR4 and SR51 were refuted. The following is a detailed analysis of the result.

4.1. A potential safety issue due to under-guarded APIs


The safety property SR3 was refuted in 0.16 s, after searching 29116 states and 34979 transitions,
consuming 9.3 Mbytes of memory. The counterexample task scenario identified by SPIN against the
property is as follows:
1. Task t1 , which is an autostart task with priority 1, starts at system start-up time. It allocates
resource 1 and activates task t2 , releases resource 1 and terminates afterwards.
2. Task t2 has priority 5. Therefore, it preempts t1 as soon as it is activated by t1 . t2 activates task
t3 , waits for event 2 and terminates afterwards.
3. Task t3 has priority 2. Once it is started, it activates task t4 , sets event 0 for t4 and
terminates afterwards.

Table IV. Formal specification for safety properties.

SR      Formal specification                                                       Result
SR1     [](((TerminateTask || ChainTask)
            && taskhasResource) -> <>(error))                                      Incomplete
SR2     []((WaitEvent && taskhasResource) -> <>(error))                            Incomplete
SR3     []((wait_id == i) -> <>(tpl_kern.running_id == i))                         Fail
SR4     [](dyn_proc_table[i].state == ready
            -> <>(tpl_kern.running_id == i))                                       Fail
SR51    []((stat_proc_table[i].priority < stat_proc_table[j].priority &&
            dyn_proc_table[i].state == ready && dyn_proc_table[j].state == ready)
            -> (dyn_proc_table[i].state != running U
                (dyn_proc_table[j].state == running || error)))                    Fail
SR52    []((stat_proc_table[i].priority < stat_proc_table[j].priority &&
            stat_proc_table[j].priority > max_ceiling_priority &&
            dyn_proc_table[i].state == ready && dyn_proc_table[j].state == ready)
            -> (dyn_proc_table[i].state != running U
                (dyn_proc_table[j].state == running || error)))                    Incomplete


[Figure omitted: timeline of tasks t1–t4. t1 runs, activates t2 and terminates; t2 activates t3 and waits for evt2; t3 activates t4, sets evt0 for t4 and terminates; t4 waits for evt0 and remains in the waiting state.]
Figure 7. A counterexample scenario for SR3.

4. Task t4 has priority 4. Once it is started, it waits for event 0, sets event 2 for t2 and
terminates afterwards.
The scenario looks normal, and it is expected that all the tasks terminate normally. As Figure 7
illustrates, however, the SPIN verifier finds an abnormal behaviour for this task scenario: t4 waits
for event 0 indefinitely even though task t3 sets event 0 for t4 , and thus, t4 cannot run to set event 2
for task t2 , which again makes t2 wait for the event indefinitely. As a result, two of the four tasks
cannot terminate normally. It turns out that the source of the problem is in the encoding and check-
ing mechanism for events in the Trampoline kernel code, as shown in the following code fragment
from the Trampoline kernel.
1:tpl_status tpl_set_event(tpl_task_id task_id, tpl_event_mask in_event){
2: ....
3: if((events->evt_wait & in_event)!=0){
4: ....
5: // wake up and put the waiting process in the ready queue
6: ...
7: }
8: ...
9:}

All events are represented in an 8-bit mask; if a task calls the WaitEvent for event i, then the ith bit
of the event mask is set. When a task sets the event i for the waiting task, it calls tpl_set_event with
the task identifier and the event number. As stated in line 3, it performs a bitwise-and operation of
the event mask and the event number to check that a task is indeed waiting for the event. However,
this encoding and checking mechanism only works correctly when the event number is greater than
0; WaitEvent(0) does not have any effect on the event mask since the bitwise-and operation of the
event mask and the event number is always equal to 0. In this case, the lines between 4 and 6 are
not executed and thus, cannot wake up the task waiting for event 0.
In fact, according to the OSEK/VDX specifications, the events are supposed to be declared
with names, such as evt1 and evt2, and it is assumed that those names are used when calling the
WaitEvent and SetEvent system services instead of using event numbers, such as WaitEvent(evt1)
and SetEvent(t2, evt1). Trampoline internally converts the event names into numbers starting from
1 so that the identified erroneous behaviour is not possible. Nevertheless, it is allowed to use
event numbers directly in the user task in Trampoline, that is, we can still code WaitEvent(0) and
SetEvent(t2, 0) without getting warnings or error messages. We anticipate that this is a typical case
of a safety gap; considering that a safety problem is mostly caused by unexpected corner cases,
it is always recommended to safeguard potential issues. For example, this type of counterexample
should be avoided if any one of the following conditions is satisfied:

1. Application programmers follow the assumed method of coding and design.
2. A compiler that checks for the correct use of the SetEvent and the WaitEvent APIs from the
   user task is provided.
3. Pre-condition and post-condition for each kernel function are defined and checked.
4. Run-time error handling for invalid input is implemented for each API.


Because we cannot afford safety failures in automotive systems, at least one of the systematic
guarding and checking mechanisms, other than human responsibility, is necessary.

4.2. Potential safety issues due to unsafe task scenario


The safety property SR4 is also refuted, and the counterexample analysis revealed two types of
potential design errors in user tasks: (i) over-activation of a task beyond the maximum activation
limit and (ii) the existence of an infinitely running task with higher priority than the task waiting in
the ready queue. A representative example for each case is given in the following.
• Type 1: Task t1 activates task t0 multiple times over the maximum activation limit when t0
  has lower priority than t1. This causes multiple activation errors, and the system stops without
  running t0.
• Type 2: Let t0, t1, t2 and t3 be tasks and r a resource owned by t2 such that Prio(t0) <
  Prio(t1) < Prio(t2) < Prio(t3), where Prio(t) is the priority of task t. Suppose t0 first runs
  and obtains the resource r and then activates t1 and t3, in that order. Because the OSEK/VDX
  standard uses the Priority Ceiling Protocol, the priority of t0 becomes equal to that of t2, which
  owns r. Thus, t1 stays in the ready queue, whereas t3 preempts t0 and gets in the running state.
  If t3 activates itself by calling ChainTask(t3), then t3 is executed infinitely many times, whereas
  t1 is starving.
The Type 1 case can be detected using the error-handling mechanism, and thus, the counterexam-
ple can be considered as a false negative, as SR4 can be rephrased and verified as

SR42: [](dyn_proc_table[i].state == ready -> <>(tpl_kern.running_id == i || error)).

The Type 2 case is more problematic because the starvation may not be obvious at the task
design level, but no error-handling mechanism is provided to detect such a situation because self-
reactivation is not prohibited by the OSEK/VDX standard. We may be able to avoid the Type 2 case
by prohibiting self-activation using ChainTask or by performing static/dynamic infinite loop detec-
tion. Although infinite reactivation of a task may be necessary for some applications, we
anticipate that such a case is exceptional and can be handled in an application-specific way,
that is, by using a deterministic task sequence designed for the specific case.

4.3. Unsatisfied property due to the use of the Priority Ceiling Protocol
The safety property SR5, A task with higher static priority always starts its execution earlier than
a task with lower static priority, formally specified as SR51, is not satisfied by the Trampoline OS
because of the use of OSEK's Priority Ceiling Protocol (PCP) [14]. The OSEK PCP is designed to
avoid the problem of priority inversion and deadlocks by statically assigning a ceiling priority for
each resource and temporarily raising the priority of a task while the task allocates a resource whose
ceiling priority is higher than the priority of the task. The following is a counterexample scenario
identified by SPIN:
1. Task t1 , which has static priority 1, runs first and activates task t2 , which has static priority 5.
2. t2 preempts t1 and waits for event evt1.
3. t1 resumes and allocates resource r whose ceiling priority is 6. Then, the priority of t1 is
temporarily promoted to 6.
4. t1 activates task t3 , which has static priority 7, and is preempted. Now, t1 is in the ready state.
5. t3 sets event evt1 for t2 . Now, t2 is in the ready state.
6. t3 terminates. Then, t1 goes to the running state first because its priority is temporarily higher
than that of t2 .
SR52 is a stricter specification of SR5, adding the precondition that the priority of taskj is higher
than the maximum ceiling priority of all resources. The model checker did not find any counterex-
ample for the property within the given resources. This will be discussed in more detail in the
next section.


Table V. SPIN verification options and performance.

Option       Depth     States    Transitions  Memory (Mbytes)  Time (s)  H/F
-DCOLLAPSE   7160490   4.16e+08  5.78e+08     30926.651        3.34e+04  N/A
-DHC4        9494460   6.64e+08  9.2e+08      30959.950        1.93e+04  N/A
-DBITSTATE   9999999   1.3e+10   1.96e+10     9725.277         6.43e+05  5.27

4.4. Incomplete verification


Three out of the six properties, SR1, SR2 and SR52, could be neither verified nor refuted because
of resource limitations. For example, the verification for SR1 quickly ran out of memory on a PC
with 4 Gbytes of memory. Even on the SUN workstation with 30 Gbytes of memory, SPIN reported
out-of-memory; Table V shows the comparative performance data obtained using different verification
options, -DCOLLAPSE, -DHC4 and -DBITSTATE, for the verification of SR1. The first two
cases ran out of memory. The bitstate-hashing option trades completeness for less memory usage,
but the time required for the verification was too high, over 7 days, whereas the resulting search
coverage, indicated by the hash factor 5.27, was not so high§.
Although SPIN did not report any safety errors during the search, we could not conclude that
there is no safety error because the search was incomplete, mainly because it required more memory
than was available.

5. INCREMENTAL VERIFICATION AND PERFORMANCE

Although the first-step verification was successful in finding potential safety issues in the kernel code
as well as in the task design, we still need a better answer for those properties where verification
was incomplete.
The inefficiency of the first-step verification results from two factors: (i) the statespace of the
Trampoline kernel itself is too large, and (ii) the task model used in the first-step verification is too
generous in that it allows an arbitrary number of system calls in a task as well as infinite task
sequences. Because we aim at avoiding aggressive abstractions on the model, we focus on the
second factor to find room for improving performance and efficiency. The following observations
were made.
1. A task normally uses a limited number of system calls in practice.
2. Many counterexamples are caused by atypical infinite task sequencing.
Therefore, the second-step verification puts more constraints on the task model to exclude such
cases and tries to obtain a meaningful measure for comprehensive verification. The following
constraints are imposed on the initial generic task model:
C1. The number of system calls is limited per task, producing more conservative scenarios but
still mimicking an arbitrary behaviour of a task.
C2. Over-activation of a task is prohibited.
C3. Self-activation using ChainTask, that is, calling ChainTask(t) from task t, is prohibited.
C1 is imposed on the task model by inserting a counter cnt_APIcalls for API calls, incrementing
the counter for each API call and imposing a guarding condition cnt_APIcalls < CallLimit for each
transition. Note that even though the number of system calls is limited, non-determinism is still
retained and so is the arbitrary behaviour of the task. C2 is imposed on the task model by insert-
ing a guarding condition active_count < max_activation_count for the transitions going into the
ActivateTask or ChainTask states so that those system calls are called only when the current acti-
vation count is below the threshold. C2 is not a necessary constraint because the over-activation is
to be caught by the error-handling mechanism in the Trampoline OS, but it helps to reduce unin-
teresting internal transitions and thus, reduce verification costs. The problem with atypical infinite
§
A hash factor between 10 and 100 has an expected coverage from 84% to 98%, and SPIN recommends trusting the
result of a bitstate search only if the hash factor is greater than 100 [24].


Table VI. Performance of incremental verification.

APIs  Depth    States    Transitions  Memory (Mbytes)  Time (s)
3     351359   8.00e+06  1.26e+07     840.100          253
5     1041301  4.20e+07  6.57e+07     2371.397         1.28e+03
7     2180863  1.12e+08  1.75e+08     6017.955         3.42e+03
9     3396568  1.92e+08  2.97e+08     9059.460         6.23e+03
11    4934305  3.02e+08  4.67e+08     17213.662       9.90e+03
13    6794956  4.40e+08  6.79e+08     22467.666       1.39e+04
15    9361855  6.06e+08  9.36e+08     28791.592       1.83e+04

task sequencing is addressed by limiting the number of API calls and by imposing C3. We note that
infinite task sequencing is still allowed by tasks activating each other, but an infinite loop within a task
and infinitely self-activating tasks are not allowed under these constraints.
With this more constrained task model, all three properties that were incomplete in the initial
verification, SR1, SR2 and SR52 , are verified within the given resources. SR42 , which was refuted
because of the infinite task sequencing, is also verified.
Table VI shows the performance of model checking the safety property SR2 as the number of
system calls increases from 3 to 15. The columns from left to right represent the number of API
calls, the depth of the verification search, the number of states explored, the number of transitions,
the amount of memory used in Megabytes and the time required to finish verification in seconds.
The -DHC4 option is used for the experiment. We note that around 29 Gbytes of memory are
consumed for comprehensive verification with a maximum of 15 system calls per task. That is, 15
system calls per task is the limit of comprehensive verification in the second verification step.

6. PERFORMANCE IMPROVEMENT USING EMBEDDED C

The third verification step applies the embedded C constructs in Promela to improve model
checking performance. The embedded C constructs were introduced to facilitate model-driven
verification of software systems, making it possible to directly embed implementation code into
Promela models [25]. This section explains how embedded C constructs are applied to exist-
ing Trampoline models and compares model checking performance before and after applying
embedded C constructs.
As we already have Promela models for the Trampoline kernel, the use of embedded C con-
structs is not for embedding C code. Instead, we performed partial conversion of existing Trampoline
models into embedded versions as follows:

1. Convert the atomic sequence of statements into c_code blocks.
2. Embed all global variables referenced/used from the converted c_code blocks into
   c_code blocks.
3. Embed all user-defined data types used in the c_code blocks into c_decl declarations.
4. Track each global variable declared in c_code blocks by using the c_track construct.

Figure 8 shows a fragment of the Trampoline model converted from a pure Promela model into a
model with embedded C constructs. The atomic sequence in the inline functions is converted into a
c_code block, the global variable tpl_fifo_rw accessed from the c_code block is declared in a c_code
block and then the user-defined data type TPL_FIFO_STATE is declared in a c_decl block. Finally,
the global variable is traced by using the c_track construct. Each c_code block is invoked during
the model checking process, which is executed separately and returns its computation results. In this
case, the model checker only needs to know the location and the size of the variable computed in
the c_code block, regardless of how it is accessed or computed.
One thing worth noting is that each c_code construct is considered as a single transition and does
not produce intermediate states in the model checking process, no matter how many statements are
embedded inside, whereas each statement in the atomic sequence produces intermediate states in the


Pure Promela model:

typedef struct TPL_FIFO_STATE{
    unsigned read;
    unsigned size;
} TPL_FIFO_STATE;

TPL_FIFO_STATE tpl_fifo_rw[n];

inline initialize_tpl_fifo_rw(){
    atomic{
        tpl_fifo_rw[0].read = 0;
        tpl_fifo_rw[0].size = 0;
        tpl_fifo_rw[1].read = 0;
        tpl_fifo_rw[1].size = 0;
    }
}

Model with embedded C constructs:

c_decl{
    typedef struct TPL_FIFO_STATE{
        unsigned read;
        unsigned size;
    } TPL_FIFO_STATE;
}
c_code{
    TPL_FIFO_STATE tpl_fifo_rw[n];
};
c_track tpl_fifo_rw sizeof(TPL_FIFO_STATE)*n

inline initialize_tpl_fifo_rw(){
    c_code{
        tpl_fifo_rw[0].read = 0;
        tpl_fifo_rw[0].size = 0;
        tpl_fifo_rw[1].read = 0;
        tpl_fifo_rw[1].size = 0;
    }
}

Figure 8. Conversion example from the Trampoline OS.

pure Promela model. Considering that the cost of model checking grows linearly with the num-
ber of states and that the number of states tends to grow exponentially during the model checking
process, this can result in a huge performance difference.
Table VII shows the performance data after applying embedded C constructs to the Trampoline
kernel model. We see that both the absolute value of the verification costs and the rate of the cost
increment as the number of API calls increases are greatly reduced. Figure 9 shows the comparative
memory and time requirements for verifying the original Trampoline model and the model with
embedded C; we note that the costs for the model with embedded C increase linearly as the number
of API calls increases, whereas the original model shows an exponential cost increase.

Table VII. Verification performance with embedded C.

APIs  Depth   States    Transitions  Memory (Mbytes)  Time (s)
3     39556   2.29e+06  2.50e+06     1161.94          202
5     61789   5.68e+06  6.29e+06     1715.01          5.08e+02
7     78724   9.44e+06  1.05e+07     2118.33          8.46e+02
9     88317   1.32e+07  1.48e+07     2372.82          1.19e+03
11    96095   1.70e+07  1.91e+07     2886.24          1.54e+03
13    108300  2.00e+07  2.24e+07     3021.20          1.81e+03
15    121007  2.40e+07  2.69e+07     3284.29          2.17e+03

[Figure omitted: two line charts, (a) memory consumption in Mbytes and (b) verification time in seconds, plotted against the number of APIs (3 to 15) for the original model and the model with embedded C; the original model's curves rise steeply while the embedded-C curves grow roughly linearly.]
Figure 9. Comparison of verification performance.


We delayed the application of the embedded C constructs until the third step because merging transitions makes it difficult to analyse counterexamples and simulation results; the original model is preferred for initial and incremental verification as long as the available resources allow it. Embedded C is kept as the last resort for better scalability.
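To illustrate why the merged transitions help, the sketch below shows the same sequence of kernel-state updates written in pure PROMELA and with an embedded C construct; the variable and process names are illustrative placeholders, not identifiers from the actual Trampoline model:

```promela
byte kern_state = 0;
byte need_switch = 0;

/* Pure PROMELA: each assignment is a separate transition, so the
 * verifier stores every intermediate state and may interleave other
 * processes between the assignments. */
active proctype pure_version()
{
    kern_state = 1;
    need_switch = 1;
    kern_state = 2
}

/* Embedded C: the same updates are merged into one deterministic
 * transition; no intermediate states are created or stored. */
active proctype embedded_version()
{
    c_code {
        now.kern_state = 1;
        now.need_switch = 1;
        now.kern_state = 2;
    }
}
```

Inside `c_code`, PROMELA globals are accessed through the generated state vector (the `now.` prefix); purely internal data can additionally be kept in untracked C variables, shrinking the state vector further.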

7. RELATED WORK

There have been a number of works on the formal verification of operating systems regarding
various problems.
Reference [26] models and verifies the PATHO OS for vehicles by using timed automata. Recent similar works are presented in [27, 28], which model an OSEK/VDX application and the core part of its kernel in timed automata and perform a rigorous timing analysis by using the model checker UPPAAL. Their modelling and verification focus on non-preemptive scheduling and an analysis of worst-case response time. Their approach is subject to state-space explosion as the number of tasks increases because each task is explicitly modelled as an independent timed automaton.
The authors in [12] suggested a meta-scheduler framework for implementing real-time schedul-
ing algorithms. They presented the design of the meta-scheduler and outlined its implementation.
The meta-scheduler is verified using UPPAAL with respect to correctness, deadlock freedom and livelock freedom. All these works are mainly concerned with the abstract modelling of operating systems but deal with neither the implementation nor the issue of scalability.
Reference [29] is one of the typical approaches that model the memory map as new global vari-
ables in the embedded C source code and use a model checker to verify assertions. This approach
does not take priority-based task scheduling into account. [10] presents a verification result for time partitioning in the DEOS scheduling kernel by using the SPIN model checker. It is the closest to our case study in its use of the SPIN model checker and model translation from the kernel code. However, its verification is focused on a single property of the scheduling algorithm, so that aggressive abstraction techniques specialized for that property can be applied to effectively avoid the state-space explosion problem.
The Verisoft project [30] verifies run-time environment layers including OSEKtime and FlexRay,
and application processes using the Isabelle theorem prover [9, 31]. The L4.verified project aims
at providing consistency proofs for different layers of abstractions for the seL4 micro-kernel [11].
It takes the model-driven approach to develop a high-performance, low-complexity micro-kernel
named seL4 from scratch, where the high-level micro-kernel model is specified in Haskell and
is refined down to actual C code. The theorem prover Isabelle/HOL is used to verify the func-
tional correctness of the micro-kernel and the consistency of the inter-layer functionality. However,
the use of a theorem prover requires extensive knowledge about both the technique itself and the
verification domain.
There have been a couple of approaches for model checking embedded software in general. [8] developed a special-purpose, domain-specific model checker named [mc]square that verifies C code after it has been compiled into assembler. [7] verifies embedded C code by using an abstract memory model and then statically verifies software and hardware together, using only the return values of independently running hardware modules; this is similar, in principle, to the method supported by embedded C in PROMELA [25].

8. DISCUSSION

This work demonstrated an application of the model checking technique to the safety analysis of automotive software, using the Trampoline operating system as a case example. We provided an approach for converting the Trampoline kernel, written in C, into a formal model in PROMELA, a generic task-modelling method for exercising all possible interactions between tasks and the operating system, and a modelling approach for simulating a context-switching mechanism on a single processor. We believe that the suggested approaches are general enough to be applied to other embedded software running on a single-processor machine. Nevertheless, one can still argue
against the effectiveness of using formal methods in safety analysis for embedded software. This
section discusses several related issues on the basis of our experience from this study.

8.1. Why use SPIN?


Many existing approaches to formal verification choose theorem proving for its thoroughness and soundness [9, 11, 31]. However, the accessibility and usability of theorem proving are known to be lower than those of other automated verification techniques such as model checking and dynamic testing.
This work aims at providing a rigorous but efficient analysis technique that can be used by engineers without deep knowledge of the theory of formal methods. SPIN is equipped with a visualized simulation and verification tool, which facilitates early assessment of the correctness of the model and supports intuitive analysis through a visualized counterexample tracing mechanism. SPIN has continuously evolved with enhanced techniques, such as swarm verification and model extraction [25, 32, 33], and is backed by extensive experience with software model checking [10, 21, 34–37]. These are our reasons for choosing SPIN as our primary verification tool.

8.2. Problems with existing C code model checking tools


There are a couple of well-known verification tools for C code. Most notably, CBMC [38], MODEX [16] and BLAST [39] have been widely accepted and applied in C code verification. One of the benefits of these tools is that they can be applied directly to the source code without requiring conversion of the code into formal languages. CBMC and BLAST apply model checking techniques directly to the C code; MODEX automatically translates fragments of C code into PROMELA so that the modelling process can be eliminated. Nevertheless, this benefit is diminished when it comes to the verification of embedded software, where the specifics of hardware-related control and access need to be taken into account: to apply these language-specific model checking tools, the C source code itself must be modified to reflect the hardware environment [8]. MODEX works quite well for translating small-sized C source code into a PROMELA model, but the translation becomes error prone when the code grows larger and includes complex data structures. After an initial trial with these C code verifiers, we concluded that domain-specific model extraction, even though semi-automatic, can be much more efficient.

8.3. Testing versus formal safety analysis


The safety problem identified in the Trampoline OS in this work might, in fact, have been identified using dynamic testing techniques if test cases had been thoroughly defined. For example, one can derive exhaustive test cases on the basis of all possible combinations of normal/abnormal values for each argument of each API. However, defining and executing such test cases for all possible combinations is quite costly. Trampoline defines more than 26 basic interfaces for system calls, each of which has, on average, two arguments of 8-bit numeric types; this requires 26 × 2 × 2^8 = 13,312 test cases for exhaustive testing. Even when only boundary values are chosen for the arguments, at least 26 × 2 × 3 = 156 test cases are required, and the number of possible execution sequences for these 156 test cases rises to 156 factorial. Thus, model checking is not more expensive than testing when it comes to safety analysis.

8.4. Benefits of formal models


A notable benefit of having a formal model is that we can pre-check application designs before starting to code them. As with most other small-sized embedded software, each application is compiled together with the Trampoline kernel code to generate an executable. This means that even a minor change in the application program requires recompilation and re-testing of the whole system. With formal models of the kernel code, we can exercise arbitrary task behaviour during the verification process and freely restrict the task behaviour if a specific task sequence is to be simulated. Our models are designed in such a way that the generic task model can simply be replaced by a specific task
design without affecting the kernel model so that model checking safety properties for a specific
application is quite straightforward.
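As a sketch of this replacement scheme (the channel and process names here are illustrative assumptions, not the identifiers of the actual model), a generic task that nondeterministically issues any API call can be swapped for a task restricted to one concrete call sequence:

```promela
mtype = { ActivateTask, TerminateTask, GetResource, ReleaseResource };

/* Rendezvous channel through which tasks issue API calls to the
 * kernel model (illustrative interface). */
chan api = [0] of { mtype, byte };

/* Generic task: nondeterministically issues any API call, so the
 * verifier explores all possible task-kernel interactions. */
proctype GenericTask(byte tid)
{
    do
    :: api ! ActivateTask(tid)
    :: api ! GetResource(tid)
    :: api ! ReleaseResource(tid)
    :: api ! TerminateTask(tid) -> break
    od
}

/* Specific task: same interface, but restricted to the fixed call
 * sequence of a concrete application design. */
proctype SpecificTask(byte tid)
{
    api ! ActivateTask(tid);
    api ! GetResource(tid);
    api ! ReleaseResource(tid);
    api ! TerminateTask(tid)
}
```

Because both processes share the same interface to the kernel model, an application design can be pre-checked simply by instantiating a task like SpecificTask in place of GenericTask, leaving the kernel model untouched.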
One can argue that the model extracted from the source code may miss important errors that reside in the source code, such as null-pointer dereferences, out-of-bounds array accesses and memory allocation failures. We suggest handling behavioural safety separately and independently from code safety; code safety, which includes the safe use of array indices and pointer arithmetic, can be treated using static code analysis tools before behavioural safety issues are addressed.

9. CONCLUSION AND FUTURE WORK

We have presented our experience on model checking the Trampoline operating system for the
purpose of safety analysis. To the best of our knowledge, this is the first extensive analysis of the
Trampoline OS that has found an actual problem in the system.
Our approach can be differentiated from existing approaches in that the conversion and modelling of Trampoline into PROMELA faithfully preserve the original code, except for some restructuring and simplification of hardware-dependent code. This reduces the discrepancy between the original code and the model, thus minimizing accidental errors that might arise from the modelling process and enabling straightforward counterexample replay in the actual code. However, this faithful
translation naturally results in high complexity during verification. We anticipate that we will have to
trade off either the accuracy of the kernel model or the generality of its environment for verification
comprehensiveness. Experiments show that constraining the task model, which is the environment of
the operating system, provides comprehensiveness up to a certain point, which is believed to be sufficient for automotive ECU controllers.
Although this work shows promising results that model checking may be engineered in such a
way that it can be routinely applied for the safety analysis of automotive operating systems, it also
includes pitfalls and room for improvement. First, the conversion from the Trampoline kernel to PROMELA was carried out manually with the aid of a code analysis tool and is thus subject to potential human mistakes. To eliminate such mistakes, the manually constructed model was thoroughly tested using the SPIN simulator and model checker, which in fact took more time than the model construction itself.¶ The best way to avoid such manual construction and validation costs is to automate the conversion process. We plan to develop a domain-specific model extraction tool on the basis of our experience.
Second, model checking the Trampoline kernel involves several parameters other than the number of API calls per task, such as the number of activated tasks, the maximum number of activations per task and the maximum number of resources/events. The model checking experiments in this work used fixed values for these parameters: four tasks, with a maximum of two activations and two resources/events per task. More refined experiments with varying parameters are necessary for a complete analysis. Because we expect that varying these parameters will increase the model checking complexity, future work will involve an investigation into a systematic method for reducing the complexity of the kernel model itself without requiring knowledge of implementation details.

ACKNOWLEDGEMENTS
This work largely benefited from prior work carried out by Seunghyun Yoon, Yeonjoon Kim and Minkyu Park, who performed SFTA for automotive operating systems in a related project and built the code extractor on top of the Understand tool. This work was supported by the Engineering Research Center of Excellence Program of the Korean Ministry of Education, Science and Technology (MEST)/National Research Foundation of Korea (NRF), Grant 2011-0000978 and the National Research Foundation of Korea Grant funded by the Korean Government (2010-0017156).


¶ It took about 2 person-months to construct the initial model and about 3 person-months to validate the model in this case study.


REFERENCES
1. Choi Y. Safety analysis of the Trampoline OS using model checking: an experience report. In Proceedings of 22nd
IEEE International Symposium on Software Reliability Engineering, Hiroshima, Japan, 2011; 200–209.
2. Broy M. Challenges in automotive software engineering. In Proceedings of the 28th International Conference on
Software Engineering, New York, USA, 2006; 33–42.
3. Mössinger J. Software in automotive systems. IEEE Software 2010; 27(2):92–94.
4. Lutz RR, Shaw HY. Applying adaptive safety analysis techniques. In 10th International Symposium on Software
Reliability Engineering, Boca Raton, USA, 1999; 42–49.
5. Oh Y, Yoo J, Cha S, Son HS. Software safety analysis of function block diagrams using fault trees. Reliability
Engineering and System Safety 2005; 88(3):215–228.
6. Dingel J, Liang H. Automated comprehensive safety analysis of concurrent programs using Verisoft and TXL. In
ACM/SIGSOFT International Conference on Foundations of Software Engineering, Newport Beach, USA, 2004;
13–22.
7. Cordeiro L, Fischer B, Chen H, Marques-Silva J. Semiformal verification of embedded software in medical
devices considering stringent hardware constraints. In International Conference on Embedded Software and Systems,
HangZhou, China, 2009.
8. Schlich B, Kowalewski S. Model checking C source code for embedded systems. International Journal of Software
Tools and Technology Transfer 2009; 11:187–202.
9. Endres E, Müller C, Shadrin A, Tverdyshev S. Towards the formal verification of a distributed real-time automotive
system. In Proceedings of NASA Formal Method Symposium, Washington DC, USA, 2010; 212–216.
10. Penix J, Visser W, Park S, Pasareanu C, Engstrom E, Larson A, Weininger N. Verifying time partitioning in the
DEOS scheduling kernel. Formal Methods in Systems Design Journal 2005; 26(2):103–135.
11. Klein G, Elphinstone K, Heiser G, Andronick J, Cock D, Derrin P, Elkaduwe D, Engelhardt K, Kolanski R,
Norrish M, Sewell T, Tuch H, Winwood S. seL4: formal verification of an OS kernel. Communications of the ACM
2010; 53(6):107–115.
12. Li P, Ravindran B, Suhaib S, Feizabadi S. A formally verified application-level framework for real-time scheduling
on posix real-time operating systems. IEEE Transactions on Software Engineering 2004; 30(9):613–629.
13. Trampoline – opensource RTOS project. http://trampoline.rts-software.org.
14. OSEK/VDX operating system specification 2.2.3. http://portal.osek-vdx.org/files/pdf/specs/os223.pdf.
15. Holzmann GJ. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Publishing Company:
Boston, MA, 2003.
16. Holzmann GJ, Ruys TC. Effective bug hunting with Spin and Modex. In Model checking software: The SPIN
Workshop, San Francisco, USA, 2005; 24.
17. Understand: source code analysis and metrics. http://www.scitools.com/.
18. Leveson NG. Safeware: System Safety and Computers. Addison Wesley: Reading, MA, 1995.
19. Choi Y, Yoon S, Kim Y, Lee S. Safety certification for automotive realtime operating systems. Technical Report, Electronics and Telecommunications Research Institute, South Korea, 2009.
20. Clarke EM, Grumberg O, Peled D. Model Checking. MIT Press: Cambridge, MA, 1999.
21. Zaks A, Joshi R. Verifying multi-threaded C programs with SPIN. In 15th International Workshop on Model
Checking Software, Los Angeles, USA, 2008; 325–342.
22. Holzmann GJ. Logic verification of ANSI-C code with Spin. In Proc. of the 7th international SPIN Workshop on
Model Checking of Software, Stanford, USA, 2000; 131–147.
23. Holzmann GJ, Smith MH. An automated verification method for distributed systems software based on model
extraction. IEEE Transactions on Software Engineering 2002; 28(4):364–377.
24. Holzmann GJ. An analysis of bitstate hashing. Formal Methods in System Design 1998; 13(3):289–307.
25. Holzmann GJ, Joshi R, Groce A. Model driven code checking. Automated Software Engineering 2008; 15(3–4):
283–297.
26. Balarin F, Petty K, Sangiovanni-Vincenteli AL, Varaiya P. Formal verification of the PATHO real-time operating
system. In Proceedings of the 33rd Conference on Decision and Control, Lake Buena Vista, USA, 1994; 2459–2465.
27. Waszniowski L, Hanzálek Z. Formal verification of multitasking applications based on timed automata model. Real-Time Systems 2008; 38:39–65.
28. Waszniowski L, Krákora J, Hanzálek Z. Case study on distributed and fault tolerant system modeling based on timed automata. Journal of Systems and Software 2009; 82(10):1678–1694.
29. Bucur D, Kwiatkowska MZ. Poster abstract: software verification for TinyOS. In 9th ACM/IEEE International
Conference on Information Processing in Sensor Networks, Stockholm, Sweden, 2010; 400–401.
30. The Verisoft project homepage. http://www.verisoft.de.
31. In der Rieden T, Knapp S. An approach to the pervasive formal specification and verification of an automotive system.
In Proceedings of the International Workshop on Formal Methods in Industrial Critical Systems, Lisbon, Portugal,
2005; 115–124.
32. Groce A, Joshi R. Extending model checking with dynamic analysis. In Verification, Model Checking, and Abstract
Interpretation, San Francisco, USA, 2008; 142–156.
33. Holzmann GJ, Florian M. Model checking with bounded context switching. Formal Aspects of Computing
2011; 23(3):365–389.


34. Byun Y, Sanders BA, Keum C-S. Design of communication protocols using a message transfer pattern. International Journal of Communication Systems 2005; 15(5):465–485.
35. Dong Y, Du X, Holzmann GJ, Smolka SA. Fighting livelock in the GNU i-protocol: a case study in explicit-state
model checking. International Journal on Software Tools for Technology Transfer 2003; 4:505–528.
36. Choi Y. From NuSMV to SPIN: experiences with model checking flight guidance systems. Formal Methods in
System Design 2007; 30(3):199–216.
37. Kim M, Kim Y, Kim H. A comparative study of software model checkers as unit testing tools: an industrial case
study. IEEE Transactions on Software Engineering 2011; 37(2):146–160.
38. Clarke E, Kroening D, Lerda F. A tool for checking ANSI-C programs. In 10th International Conference on Tools
and Algorithms for the Construction and Analysis of Systems, Barcelona, Spain, 2004; 168–176.
39. Beyer D, Henzinger TA, Jhala R, Majumdar R. The software model checker Blast: applications to software
engineering. International Journal on Software Tools for Technology Transfer 2007; 9(5):505–525.
40. Choi Y. Model Checking an OSEK/VDX-based Operating System for Automobile Safety Analysis, submitted to
IEICE Transactions on Information and Systems (Letter).
