Model Checking Trampoline OS: A Case Study
Yunja Choi*,†
School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
SUMMARY
Model checking is an effective technique for identifying subtle problems in software safety by means of a
comprehensive search algorithm. However, this comprehensiveness requires substantial resources
and is often too expensive to apply in practice. This work strives to find a practical solution for model
checking automotive operating systems for the purpose of safety analysis, with minimal requirements and a
systematic engineering approach for applying the technique in practice. The paper presents methods
for converting the Trampoline kernel code into formal models for the model checker SPIN, a series
of experiments using an incremental verification approach, and the use of embedded C constructs for
performance improvement. The conversion methods include functional modularization and treatment for
hardware-dependent code, such as memory access for context switching. The incremental verification
approach aims at increasing the level of confidence in the verification even when comprehensiveness cannot
be provided because of hardware resource limitations. We also report on potential safety issues
found in the Trampoline operating system during the experiments and present experimental evidence of
the performance improvement using the embedded C constructs in SPIN. Copyright © 2012 John Wiley &
Sons, Ltd.
KEY WORDS: model checking; Trampoline operating system; safety analysis; OSEK/VDX; SPIN
1. INTRODUCTION
The operating system is the core part of automotive control software; its malfunction can cause crit-
ical errors in the automotive system, which in turn may result in loss of lives and assets. Much effort
has been spent on developing a standard domain-specific development framework in automotive
software [2, 3] to support a systematic and cost-effective safety analysis/assurance method.
So far, safety analysis for such systems has typically been applied at the system level [4, 5] or at the
small-scale source-code level [6–8], separately and with different focuses. Although interna-
tional standards for the safe development of electronic/electrical devices, such as IEC 61508 and
ISO 26262, recommend formal verification methods as a safety verification technique, practical
experiences with processes or methods for applying formal methods in this domain are still rare,
with little related literature on this matter [9–12]. In fact, most existing work is focused on a certain
aspect of an operating system, such as the scheduling algorithm and timing analysis, and requires
extensive human expertise for effective verification, which is application dependent.
This work studies how automated formal verification techniques, such as model checking, can
be systematically and efficiently used for the safety analysis of an automotive operating system
*Correspondence to: Yunja Choi, School of Computer Science and Engineering, Kyungpook National University,
Daegu, Korea.
† E-mail: [email protected]
‡ This is an extended version of [1] and [40].
under two basic premises: (i) the analyst does not necessarily have detailed knowledge of the
implementation or extensive knowledge of a specific verification tool, and (ii) general
safety properties for automotive software are the target of the verification. With these premises,
theorem proving or aggressive abstraction techniques for model checking, which require extensive
domain knowledge during the verification process, would not be applicable. The aim of this work
is to assess the applicability of model checking in a practical setting by using only a systematic
engineering approach. The study is conducted on the Trampoline [13] operating system, an
open-source operating system written in C that is based on the OSEK/VDX [14] international
standard for automotive real-time operating systems.
Because the safety of an operating system cannot be addressed without considering how it is used
at the application level as well as at the system level, we take the system-level safety requirements
and the functional requirements/constraints specified in the OSEK/VDX standard into account when
we build the verification model from the kernel code: (i) The safety properties are identified from
the automotive system level and then elaborated in terms of the Trampoline kernel code, and (ii) the
functional requirements and constraints in the OSEK/VDX standard are imposed on the task model,
which is a unit of an application program.
The kernel code itself is faithfully converted into a formal model in Promela, the modelling
language of SPIN [15], using existing C-to-Promela conversion methods [16]. However, methods
for functional modularization, for task modelling to simulate the arbitrary behaviour of a task and
for the context-switching mechanism are newly introduced. The approach is property based because
the model conversion process selectively extracts only those functions that have dependency rela-
tionships with respect to the safety properties. For this, we have developed a property-based code
extractor on top of the static analysis tool Understand [17].
Five representative safety properties are verified or refuted using the model checker SPIN in a
three-step approach. The initial light-weight verification focuses on error finding by using a generic task
model with arbitrary behaviour. After identifying potential safety issues and analysing false negatives,
the task model is further constrained to exclude illegal behaviours with respect to the standard. The
second step incrementally increases the complexity of the task model and analyses the scalability
of model checking. Further abstraction is performed in the third step by using the embedded C
constructs in Promela to reduce verification cost and improve model checking performance.
This approach enables us to identify a potential safety gap in the kernel code as well as several
subtle issues that are difficult to identify using typical dynamic testing or theorem proving. The
identified safety problem is confirmed by run-time testing using the counterexample sequence gen-
erated from the SPIN model checker as a test scenario. As expected, the model checking cannot scale
with an indefinite number of system calls. Nevertheless, we anticipate that the typical complexity
of a task with respect to its number of system calls is rather small in automotive electronic control
units (ECUs), and thus, the comprehensive verification result of a fixed number (15 in this case) of
arbitrary API calls per task would be sufficient for providing confidence in the system.
The major contributions of this work can be summarized as follows:
1. A systematic model construction method together with a verification process is provided for
model checking automotive operating system kernels on single-processor ECUs.
2. Safety analysis using model checking is demonstrated on the Trampoline operating system.
This is the first extensive case study on safety analysis in general using model checking for an
automotive operating system.
3. An incremental verification process and the new use of embedded C constructs in Promela
are suggested to achieve better performance in practice.
The remainder of this paper is organized as follows: Section 2 provides some background knowl-
edge for this work, including the safety properties in question, major requirements specified in
the OSEK/VDX standards and the model checking process for the Trampoline operating system.
Section 3 introduces the conversion and modelling method for constructing a formal specification
from the Trampoline kernel code. The result of the first-step verification is presented in Section 4,
followed by the result of incremental verification in Section 5. Section 6 suggests an approach
and experimental result for further improving model checking performance. We conclude with a
summary and future work in Section 9, following a survey of related work (Section 7) and discussion
of potential issues (Section 8).
Because the Trampoline OS follows the OSEK/VDX standard, the safety requirements need to be
analysed under the requirements and constraints imposed by the standard.
[Figure: a fragment of the software fault tree analysis (SFTA); node A.1.3, 'delayed change of direction', is refined into causes such as a delay in resource allocation.]
The right side of the figure shows the lower part of the SFTA rooted at A.1.3. Safety properties are
identified from the leaf component of the fault tree. For example, the property Tasks shall not wait
for events while occupying resources is identified from component A.1.3.1.1. We have identified 56
safety properties in this way [19]; five representative safety properties are given as follows:
- SR1. Tasks and ISRs shall not terminate while occupying resources.
- SR2. Tasks shall not wait for events while occupying resources.
- SR3. Tasks shall not wait for events indefinitely.
- SR4. Any activated tasks shall be executed in the end.
- SR5. A task with higher static priority always starts its execution earlier than a task with lower
static priority.
SR1 and SR2 are meant to eliminate the possibility of process deadlocks caused by circular
waiting on resources or by indefinite waiting for a resource occupied by another task. SR1 is
especially required because an operating system based on OSEK/VDX does not automatically reclaim
allocated resources even after task termination. SR3 and SR4 are there to ensure the absence of
starvation caused by mismatched event sequences, mistakenly designed task priorities or process
deadlocks. SR5 is one of the safety properties identified from the SFTA node A.1.3.2.2 to ensure that
there is no dynamic lowering of priority that may unexpectedly change the intended execution order.
Unlike functional requirements, the verification of safety requirements mostly requires showing
the absence of behaviour rather than the existence of behaviour, and therefore, typical scenario-
based testing approaches are not sufficient for this purpose. That is why most existing approaches,
including safety standards, recommend formal verification as an alternative and necessary solution.
[Figure: the overall verification process — a model incorporating the OSEK/VDX constraints is checked by light-weight model checking; verified properties end the process, refuted properties go through counterexample analysis, simulation and a run-time test scenario, and undecided properties proceed to incremental verification.]
Each refuted safety property is analysed via the counterexample replay facility provided by the
simulator. If it turns out that the counterexample is a false alarm, because of mistakes in the model,
for example, then the correction is made to the model directly and re-verification is performed.
Otherwise, we generate a test scenario from the counterexample and perform run-time testing of the
Trampoline code on Linux to confirm that the counterexample does, in fact, reflect an actual prop-
erty violation. For those properties that are neither verified nor refuted because of resource
limitations, the process goes to the second step. In this step, the verification is performed iteratively
using larger resources by constraining verification conditions and relaxing them incrementally. The
incremental verification process includes the application of embedded C constructs in Promela to
improve scalability as the third step of the verification process.
The first step of the verification was performed on a PC with Pentium III CPU and 4 Gbytes
of memory for the quick and easy identification of problems and false alarms. After addressing
issues identified from the initial verification, more extensive verification was performed on a SUN
workstation with 30 Gbytes of memory. SPIN version 6.0.1 and its graphical run-time environment
iSpin 1.0.0 were used for the verification.
A formal model of the Trampoline OS was constructed using one-to-one translation from C
constructs to Promela constructs by using existing approaches [22, 23]. This section explains
the translation and modelling approaches unique to our work, while omitting details of conventional
translation rules.
We first identify core functions that have a dependency relationship with major operating system
APIs. The core functions are classified by their service types and are grouped into modules accord-
ing to the classification. Each module and its member functions are converted into corresponding
Promela constructs.
[Figure 3: (a) dependency relationships among core kernel functions such as tpl_activate_task, tpl_schedule_from_running, tpl_init_proc, tpl_get_proc, tpl_put_new_proc, tpl_put_preempted_proc, tpl_switch_context, tpl_get_internal_resource and the hook routines, and (b) their grouping into the modules task_service, proc_service, resource, event, scheduler and the task model.]
The core functions are collected and classified into six categories: task service, resource man-
agement, event management, scheduler, process service and common service functions. Figure 3(b)
illustrates the modular structure on the basis of the categorization. For example, tpl_activate_task
identified from the ActivateTask API becomes a member function of task_service. tpl_init_proc,
tpl_put_new_proc, tpl_get_proc and tpl_put_preempted_proc are members of proc_service. On the
other hand, tpl_get_internal_resource belongs to the module for common functions. In the figure,
the task is not a part of the operating system kernel but acts as a user of the operating system. We
have two types of tasks, basic and extended, as defined in OSEK/VDX. Information on individual
tasks, such as task id, priority and resources, is maintained in separate data structures as will be
explained in Section 3.4. The module for common functions, which is used by all the modules, is
not included in the figure.
17:
18: schedule_from_running:
19: // perform tpl_schedule_from_running
...
20: mch!ret,origin;
21: goto get_message;
22:
23: schedule_from_dying:
24: ...
25: }
The module scheduler is modelled as a communicating process, which is defined as a proctype in
Promela and is initially in the get_message state. As a function call is received through a message
channel mch, it jumps to the corresponding label, performs the corresponding functions, returns
a message to the caller and jumps back to the initial state. For example, if the start message is
received (line 5), it jumps to the start_schedule label (line 11), where it starts performing the func-
tion tpl_start_schedule. After finishing the corresponding function, it returns a return message to
the caller (line 15) and jumps back to its initial state (line 16). Each proctype can be instantiated as
an independent process; these processes are synchronized with each other through message passing.
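Because only the tail of the listing is visible above (the lines referenced in the text belong to its omitted upper part), the following is a minimal self-contained sketch of the pattern in Promela; the message names, the channel signature and the function bodies are simplified assumptions rather than the exact Trampoline model:

mtype = { start, schedule_from_running, ret };
chan mch = [0] of { mtype, byte };     /* rendezvous channel to the scheduler */

proctype scheduler() {
    mtype op;
    byte origin;
get_message:
    mch?op, origin;                    /* wait for a service request          */
    if
    :: op == start -> goto start_schedule
    :: op == schedule_from_running -> goto sched_running
    fi;
start_schedule:
    /* perform tpl_start_schedule, which uses tpl_init_proc(...)              */
    mch!ret, origin;                   /* return a message to the caller      */
    goto get_message;
sched_running:
    /* perform tpl_schedule_from_running                                      */
    mch!ret, origin;
    goto get_message
}

init {
    byte me = 0;
    run scheduler();
    mch!start, me;                     /* request the start_schedule service  */
    mch?ret, eval(me)                  /* wait for the return message         */
}

Running such a sketch in the SPIN simulator shows the request/return handshake between a caller and the scheduler process.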
Functions often used by other functions are categorized as common functions and modelled using
the inline construct in Promela. The following is an example for the function tpl_init_proc, which
is used in the function tpl_start_schedule (line 13).
inline tpl_init_proc(proc_id) {
  // assign static values to the dynamic process table
  tpl_dyn_proc_table[proc_id].ppriority =
      tpl_stat_proc_table[proc_id].base_priority;
  ...
  // initialize process context by calling another inline function
  tpl_init_context(proc_id);
}
In this way, the user-level interactions remain traceable through message-passing sequences,
whereas verification costs are reduced by minimizing the number of communicating processes.
The Promela model translated from the core kernel code includes a minimum of six commu-
nicating processes and 18 inline functions comprising a total of 1500 lines of code. The number of
communicating processes may increase depending on the number of task types.
[Figure: data structures maintained in the model — resource records (ceiling priority, owner id, owner's previous resource id/task id), per-task records (id, priority, base priority, event_mask, alarm) and the FIFO ready list.]
One exception is the field for storing the context-switching point, which we added to support
simulation of context switching; this will be explained further in Sections 3.6 and 3.7.
We did not apply aggressive abstractions on the data structure and global variables for two rea-
sons: First, we did not have a complete understanding of the implementation details. Aggressive
abstraction is effective in reducing verification costs but can be dangerous if performed by some-
one who does not understand the rationale behind the implementation details. Second, aggressive
abstraction makes the counterexample analysis difficult because of the large difference between the
verification model and the actual code.
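For illustration, the per-task bookkeeping can be sketched in Promela roughly as follows; the field set is a simplified assumption based only on the fields mentioned in this paper (base_priority, ppriority, context and the added switchPoint), not the complete Trampoline record:

typedef tpl_stat_proc {
    byte base_priority;     /* static priority from the configuration   */
    byte task_type          /* basic or extended                        */
}

typedef tpl_dyn_proc {
    byte ppriority;         /* current, possibly raised, priority       */
    byte state;             /* suspended, ready, running or waiting     */
    byte context;           /* identifies the simulated activation      */
    byte switchPoint        /* added field: where to resume execution   */
}

tpl_stat_proc tpl_stat_proc_table[4];
tpl_dyn_proc  tpl_dyn_proc_table[4];

init {
    /* mirrors the tpl_init_proc inline shown earlier */
    tpl_dyn_proc_table[0].ppriority = tpl_stat_proc_table[0].base_priority;
    tpl_dyn_proc_table[0].switchPoint = 0
}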
OSEK/VDX specifies that a task can be in one of four states: suspended, ready, running and waiting.
A notable fact is that OSEK/VDX requires all tasks to be designed statically and to be loaded into
memory as the operating system starts up. Therefore, all tasks are initially in the suspended state.
We have elaborated the original task model by refining the running state with details of possible
interactions with the Trampoline kernel code.
Figure 5 shows the refined task model for the extended task type; each sub-state in the running
state is labelled with a possible system call that can be performed in the state. Basically, all the
sub-states in the running state are strongly connected with transitions except for some restrictions
imposed by the basic requirements of the OSEK/VDX standard. Examples of such requirements are
as follows:
1. A task must not terminate or chain another task while holding resources, that is, a task cannot
transit to TerminateTask or ChainTask from the getResource state.
2. A task must not be in a waiting state while holding resources, that is, a task cannot transit to
WaitEvent from the getResource state.
3. A terminated or chained task goes into the suspended state.
Figure 5 reflects those constraints in the task transitions; for example, after the getResource API
call, a task transits to sp4, from which a non-deterministic choice to transit to ReleaseResource,
ActivateTask or SetEvent can be made, but it cannot transit to TerminateTask, ChainTask or
WaitEvent from sp4. Self-transitions are allowed, although not explicitly depicted in the figure,
for each state except for the WaitEvent, ChainTask and TerminateTask states, which are drawn in
dashed lines.
We note that the generic task model excludes only obvious illegal behaviours from a user task
but allows potential design mistakes; for example, the model does not restrict the number of calls to
ActivateTask and allows a task to terminate while holding a resource, such as sp0 → getResource →
SetEvent → sp1 → TerminateTask. The purpose is to construct a generic task that subsumes poten-
tial corner cases in task design while reducing unnecessary complexities, rather than defining a
perfect user task. In this way, we can identify potential issues of the kernel code that may be caused
by illegal usage patterns or bad task design.
We also note that the non-deterministic choice applies not only to the system calls but also to
the values of the parameters of each system call; this may include selections of parameter values
that are infeasible in a given context, which matches our purpose of finding corner cases.
The task model for the basic task type is the same as the one for the extended task type, except
that the WaitEvent state and its related transitions are removed.
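A minimal Promela sketch of this structure is shown below; the sub-state names follow Figure 5, but the transition set is a simplified assumption used only to illustrate how the restrictions are encoded as guarded non-deterministic choices:

mtype = { sp0, sp1, sp4 };

active proctype generic_task() {
    mtype sp = sp0;               /* sub-state entered on start or switch-in        */
    do
    :: true      -> sp = sp1      /* ActivateTask, SetEvent, Schedule, ...          */
    :: true      -> sp = sp4      /* GetResource: a resource is now held            */
    :: sp == sp4 -> sp = sp0      /* ReleaseResource                                */
    :: sp != sp4 -> skip          /* WaitEvent: not allowed directly from sp4       */
    :: sp != sp4 -> break         /* TerminateTask/ChainTask: not allowed from sp4  */
    od
    /* break: the task terminates or chains and returns to the suspended state      */
}

Note that after GetResource the sketch may still move to another sub-state and terminate from there, reproducing the deliberately allowed corner case discussed above.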
[Figure 5: the generic task model for the extended task type — the suspended and ready states, and the running state refined into sub-states (sp0, sp1, sp3, ...) connected by transitions labelled with system calls such as Schedule, ActivateTask, ChainTask, SetEvent, WaitEvent and TerminateTask, together with the activate, start, switch-in and switch-out transitions.]
COPY(tpl_kern.old, tpl_kern.running);
COPY(tpl_kern.s_old, tpl_kern.s_running);
:: ...
fi;
The next code shows the way context switching is modelled in Promela: if the kernel determines
that switching is required, it first sends the switch command to the currently running task through
the channel cch, then sends the start command to the highest-priority task in the ready queue and
finally sends the switching point where it can resume execution.
if
:: tpl_kern.need_switch != NO_NEED_SWITCH ->
   id = tpl_kern.s_old.context;
   cch!switch, id;
   id = tpl_kern.s_running.context;
   cch!start, id;
   cch!startAt, tpl_kern.s_running.switchPoint;
:: else -> skip;
fi;
Typically, it is not necessary to explicitly model the context-switching behaviour because pro-
cesses in Promela are already designed to arbitrarily interleave with each other, simulating
arbitrary context switching at the statement level. However, arbitrary interleaving includes invalid
behaviours, such as context switching in the middle of executing system calls. We remove such
invalid behaviours by explicitly modelling context switching. This also enables us to simulate
multiple tasks by using one generic task model as explained in the next section.
Figure 6(b) is an illustration of such a simulation. We modified the ready queue in the
Promela model to maintain the information on the switching point for each activation of a
task so that one active proctype simulates multiple processes of the same task type. Consider-
ing that the number of active processes in Promela is a critical factor in verification cost, this
modification, which is intended to explicitly handle the switching points, is a minor sacrifice to
improve scalability.
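On the receiving side, the behaviour can be sketched roughly as follows; the channel layout follows the fragment above, but the command names (switch_out standing for the switch command), the labels and the dispatch on the switching point are simplified assumptions, not the actual task model:

mtype = { switch_out, start, startAt };
chan cch = [0] of { mtype, byte };

active proctype generic_task() {
    mtype op;
    byte arg;
idle:
    cch?op, arg;
    if
    :: op == switch_out -> goto idle       /* preempted: wait to be resumed         */
    :: op == start ->
        cch?startAt, arg;                  /* switching point of this activation    */
        if
        :: arg == 0 -> goto sp0            /* fresh activation                      */
        :: arg == 1 -> goto sp1            /* resume after the first call           */
        :: else     -> goto sp0
        fi
    fi;
sp0:
    /* first system call of the task body */
    goto sp1;
sp1:
    /* remainder of the task body */
    goto idle
}

init {
    cch!start, 0;
    cch!startAt, 0                         /* resume the new activation at sp0      */
}

Storing the switching point in the ready queue and dispatching on it in this way is what allows a single proctype to stand in for several activations of the same task type.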
[Figure 6: multiple activations of the same task type simulated by a single proctype; the ready queue records a switching point for each activation.]
Table IV is a list of the safety properties introduced in Section 2 and their corresponding formal
specifications, together with the verification results. The formal specifications are written in the
temporal logic LTL, which is a propositional logic extended with temporal operators. Given two
propositional formulas x and y, the meaning of each temporal operator used in the safety properties
is as follows:
[]x : x is always true, on all execution paths
<>x : x is true at some time in a future state
x U y : x is true until y is true
For example, the formal specification for SR1 can be interpreted as: for all execution paths, if
TerminateTask or ChainTask is called and the calling task has a resource, then an error will be
set at some time in a future state. We note that the LTL formulas are somewhat simplified from the
original version; for example, a variable i is used in the LTL formulas of SR3 and SR4 to specify
them generically, but an explicit value is used for the actual verification, such as
[]((wait_id == 0) -> <>(tpl_kern.running_id == 0)), for each task identifier i. Also, taskhasResource of SR2 and wait_id
of SR3 are macros representing corresponding values from the Trampoline data structure. SR5 is
specified as SR5_1 in the initial formal specification but is strengthened to SR5_2 after a counterex-
ample analysis is performed; SR5_1 specifies that it is always the case that if task_i has a lower static
priority than task_j, and if task_i and task_j are in the ready state at the same time, then task_i does not
get into the running state before task_j unless an error occurs.
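For illustration, such properties can be written as SPIN ltl blocks in roughly the following form; the variable names are stand-ins for the macros and model state mentioned above, not the identifiers of the actual verification model:

/* Minimal illustration of the LTL property style used with SPIN.
   The variables below are illustrative stand-ins only.            */
byte wait_id = 255;          /* id of a task calling WaitEvent      */
byte running_id = 255;       /* id of the task currently running    */
bool holdsResource = false;  /* the caller occupies a resource      */
bool terminating = false;    /* TerminateTask/ChainTask is called   */
bool errorSet = false;       /* the error hook has been raised      */

/* SR1: termination while holding a resource must raise an error    */
ltl SR1 { [] ((terminating && holdsResource) -> <> errorSet) }

/* SR3/SR4 instance for task 0: a task waiting for an event is
   eventually scheduled again                                        */
ltl SR3_t0 { [] ((wait_id == 0) -> <> (running_id == 0)) }

init { skip }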
We performed the initial verification using the kernel model and the generic task model. The result
shows that SR1, SR2 and SR5_2 were neither verified nor refuted because the verification ran out of
memory. SR3, SR4 and SR5_1 were refuted. The following is a detailed analysis of the results.
[Figure 7: execution timeline of the counterexample — t1 activates t2 and terminates; t2 activates t3 and waits for evt2; t4 waits for evt0; t2 and t4 remain in the waiting state indefinitely.]
4. Task t4 has priority 4. Once it is started, it waits for event 0, sets event 2 for t2 and
terminates afterwards.
The scenario looks normal, and it is expected that all the tasks terminate normally. As Figure 7
illustrates, however, the S PIN verifier finds an abnormal behaviour for this task scenario: t4 waits
for event 0 indefinitely even though task t3 sets event 0 for t4 , and thus, t4 cannot run to set event 2
for task t2 , which again makes t2 wait for the event indefinitely. As a result, two of the four tasks
cannot terminate normally. It turns out that the source of the problem is in the encoding and check-
ing mechanism for events in the Trampoline kernel code, as shown in the following code fragment
from the Trampoline kernel.
1: tpl_status tpl_set_event(tpl_task_id task_id, tpl_event_mask in_event){
2:   ....
3:   if((events->evt_wait & in_event) != 0){
4:     ....
5:     // wake up and put the waiting process in the ready queue
6:     ...
7:   }
8:   ...
9: }
All events are represented in an 8-bit mask; if a task calls WaitEvent for event i, then the ith bit
of the event mask is set. When a task sets event i for the waiting task, it calls tpl_set_event with
the task identifier and the event number. As stated in line 3, it performs a bitwise-and operation of
the event mask and the event number to check that a task is indeed waiting for the event. However,
this encoding and checking mechanism only works correctly when the event number is greater than
0; WaitEvent(0) does not have any effect on the event mask, since the bitwise-and of the
event mask and the event number is then always equal to 0. In this case, the lines between 4 and 6 are
not executed, and thus the task waiting for event 0 cannot be woken up.
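The following minimal Promela sketch illustrates the check described above (the variable names are illustrative, not the Trampoline identifiers): an event encoded as a non-zero bit passes the bitwise-and test, whereas the raw event number 0 never does.

byte evt_wait = 0;      /* events the task is waiting for              */
byte in_event = 0;      /* event mask passed to SetEvent/WaitEvent     */
bool woken = false;

init {
    evt_wait = 0;              /* WaitEvent(0): no bit is set           */
    in_event = 0;              /* SetEvent(t, 0): mask is also 0        */
    if
    :: (evt_wait & in_event) != 0 -> woken = true   /* never taken      */
    :: else -> skip
    fi;
    assert(!woken);            /* the waiting task is never woken up    */

    evt_wait = 1 << 2;         /* waiting for the event encoded as bit 2 */
    in_event = 1 << 2;         /* setting the same event                */
    if
    :: (evt_wait & in_event) != 0 -> woken = true   /* check succeeds   */
    :: else -> skip
    fi;
    assert(woken)
}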
In fact, according to the OSEK/VDX specification, events are supposed to be declared
with names, such as evt1 and evt2, and it is assumed that those names are used when calling the
WaitEvent and SetEvent system services instead of event numbers, as in WaitEvent(evt1)
and SetEvent(t2, evt1). Trampoline internally converts the event names into numbers starting from
1, so that the identified erroneous behaviour is not possible. Nevertheless, Trampoline allows
event numbers to be used directly in the user task; that is, we can still code WaitEvent(0) and
SetEvent(t2, 0) without getting warnings or error messages. We anticipate that this is a typical case
of a safety gap; considering that safety problems are mostly caused by unexpected corner cases,
it is always recommended to safeguard against potential issues. For example, this type of counterexample
should be avoided if any one of the following conditions is satisfied:
Because we cannot afford safety failures in automotive systems, at least one of the systematic
guarding and checking mechanisms, other than human responsibility, is necessary.
The Type 2 case is more problematic because the starvation may not be obvious at the task
design level, but no error-handling mechanism is provided to detect such a situation because self-
reactivation is not prohibited by the OSEK/VDX standard. We may be able to avoid the Type 2 case
by prohibiting self-activation using ChainTask or by performing static/dynamic infinite loop detec-
tion. Although infinite reactivation of a task may be necessary for some applications, we
anticipate that such a case is exceptional and can be handled in an application-specific way,
that is, by using a deterministic task sequence designed for the specific case.
4.3. Unsatisfied property due to the use of the Priority Ceiling Protocol
The safety property SR5, a task with higher static priority always starts its execution earlier than
a task with lower static priority, formally specified as SR5_1, is not satisfied by the Trampoline OS
because of the use of OSEK's Priority Ceiling Protocol (PCP) [14]. The OSEK PCP is designed to
avoid the problems of priority inversion and deadlock by statically assigning a ceiling priority to
each resource and temporarily raising the priority of a task while the task holds a resource whose
ceiling priority is higher than the priority of the task. The following is a counterexample scenario
identified by SPIN:
1. Task t1, which has static priority 1, runs first and activates task t2, which has static priority 5.
2. t2 preempts t1 and waits for event evt1.
3. t1 resumes and allocates resource r, whose ceiling priority is 6. Then, the priority of t1 is
temporarily promoted to 6.
4. t1 activates task t3, which has static priority 7, and is preempted. Now, t1 is in the ready state.
5. t3 sets event evt1 for t2. Now, t2 is in the ready state.
6. t3 terminates. Then, t1 goes to the running state first because its priority is temporarily higher
than that of t2.
SR5_2 is a stricter specification of SR5, adding the precondition that the priority of task_j is higher
than the maximum ceiling priority of all resources. The model checker did not find any counterex-
ample for this property within the given resources. This will be discussed in more detail in the
next section.
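In the LTL notation used above, the strengthening can be sketched as follows, where ready_i, running_i, prio_i and maxCeiling are illustrative macros over the model state rather than the identifiers used in the actual specification:

SR5_1: []((ready_i && ready_j && prio_i < prio_j) -> (!running_i U (running_j || error)))
SR5_2: []((ready_i && ready_j && prio_i < prio_j && prio_j > maxCeiling) -> (!running_i U (running_j || error)))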
Although the first-step verification was successful in finding potential safety issues in the kernel code
as well as in the task design, we still need a better answer for those properties where verification
was incomplete.
The inefficiency of the first-step verification results from two factors: (i) the Trampoline kernel
itself induces a large state space, and (ii) the task model used in the first-step verification is too gen-
erous in that it allows an arbitrary number of system calls in a task as well as infinite task sequences.
Because we aim at avoiding aggressive abstractions on the model, we focus on the second factor to
find room for improving performance and efficiency. The following observations were made:
1. A task normally uses a limited number of system calls in practice.
2. Many counterexamples are caused by atypical infinite task sequencing.
Therefore, the second-step verification puts more constraints on the task model to exclude such
cases and tries to obtain a meaningful measure of comprehensive verification. The following are the
constraints imposed on the initial generic task model:
C1. The number of system calls is limited per task, producing more conservative scenarios but
still mimicking an arbitrary behaviour of a task.
C2. Over-activation of a task is prohibited.
C3. Self-activation using ChainTask, that is, calling ChainTask(t) from task t, is prohibited.
C1 is imposed on the task model by inserting a counter cnt_APIcalls for API calls, incrementing
the counter for each API call and imposing a guarding condition cnt_APIcalls < CallLimit for each
transition. Note that even though the number of system calls is limited, non-determinism is still
retained, and so is the arbitrary behaviour of the task. C2 is imposed on the task model by inserting
a guarding condition active_count < max_activation_count on the transitions going into the
ActivateTask or ChainTask states so that those system calls are made only when the current acti-
vation count is below the threshold. C2 is not a necessary constraint, because over-activation is
caught by the error-handling mechanism in the Trampoline OS, but it helps to reduce uninteresting
internal transitions and thus reduce verification costs.
§ A hash factor between 10 and 100 has an expected coverage of 84% to 98%, and SPIN recommends trusting the
result of a bitstate search only if the hash factor is greater than 100 [24].
The problem with atypical infinite
task sequencing is addressed by limiting the number of API calls and by imposing C3. We note that
infinite task sequencing is still allowed through tasks activating each other, but an infinite loop within a task
and infinitely self-activating tasks are not allowed under these constraints.
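As an illustration, the guards for C1 and C2 can be sketched in Promela roughly as follows; the counter and threshold names follow those used above, but the transition structure is a simplified assumption, not the actual task model:

#define CallLimit             15
#define max_activation_count   2

byte cnt_APIcalls = 0;      /* C1: number of API calls made so far     */
byte active_count = 0;      /* C2: current activations of this task    */

active proctype constrained_task() {
    do
    :: (cnt_APIcalls < CallLimit) ->        /* C1 guard on every API call */
        cnt_APIcalls++;
        if
        :: (active_count < max_activation_count) ->
            active_count++                   /* ActivateTask or ChainTask */
        :: true -> skip                      /* any other system call     */
        fi
    :: else -> break                         /* call budget exhausted     */
    od
}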
With this more constrained task model, all three properties that were incomplete in the initial
verification, SR1, SR2 and SR5_2, are verified within the given resources. SR4_2, which was refuted
because of the infinite task sequencing, is also verified.
Table VI shows the performance of model checking the safety property SR2 as the number of
system calls increases from 3 to 15. The columns from left to right represent the number of API
calls, the depth of the verification search, the number of states explored, the number of transitions,
the amount of memory used in Megabytes and the time required to finish verification in seconds.
The -DHC4 (hash-compact) option was used for the experiment. We note that around 29 Gbytes of memory are
consumed for comprehensive verification with a maximum of 15 system calls per task. That is, 15
system calls per task is the limit of comprehensive verification in the second verification step.
The third verification step applies the embedded C constructs in Promela to improve model
checking performance. The embedded C constructs were introduced to facilitate model-driven
verification of software systems, making it possible to directly embed implementation code into
Promela models [25]. This section explains how embedded C constructs are applied to the exist-
ing Trampoline models and compares model checking performance before and after applying
them.
As we already have Promela models for the Trampoline kernel, the use of embedded C con-
structs is not for embedding C code. Instead, we performed partial conversion of existing Trampoline
models into embedded versions as follows:
Figure 8 shows a fragment of the Trampoline model converted from a pure Promela model into a
model with embedded C constructs. The atomic sequence in the inline functions is converted into a
c_code block, the global variable tpl_fifo_rw accessed from the c_code block is declared in a c_code
block, and the user-defined data type TPL_FIFO_STATE is declared in a c_decl block. Finally,
the global variable is tracked by using the c_track construct. Each c_code block is invoked during
the model checking process; it is executed separately and returns its computation results. In this
case, the model checker only needs to know the location and the size of the variable computed in
the c_code block, regardless of how it is accessed or computed.
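A minimal sketch of this conversion pattern is given below; the type and variable names follow those mentioned above, but the field layout and the embedded statements are simplified assumptions rather than the actual Figure 8 fragment:

/* user-defined C type, visible to the generated verifier */
c_decl {
    typedef struct TPL_FIFO_STATE {
        int read;
        int size;
    } TPL_FIFO_STATE;
}

/* global C variable declared in an embedded C block ...  */
c_code {
    TPL_FIFO_STATE tpl_fifo_rw[4];
}

/* ... and tracked so that SPIN includes it in the state vector */
c_track "&tpl_fifo_rw[0]" "sizeof(tpl_fifo_rw)";

init {
    /* the former atomic sequence becomes a single c_code transition */
    c_code {
        tpl_fifo_rw[0].read = 0;
        tpl_fifo_rw[0].size = 0;
    }
}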
One thing worth noting is that each c_code construct is considered as a single transition and does
not produce intermediate states in the model checking process, no matter how many statements are
embedded inside, whereas each statement in the atomic sequence produces intermediate states in the
pure Promela model. Considering that the cost for model checking grows linearly with the num-
ber of states and that the number of states tends to grow exponentially during the model checking
process, this can result in a huge performance difference.
Table VII shows the performance data after applying embedded C constructs to the Trampoline
kernel model. We see that both the absolute value of the verification costs and the rate of the cost
increment as the number of API calls increases are greatly reduced. Figure 9 shows the comparative
memory and time requirements for verifying the original Trampoline model and the model with
embedded C; we note that the costs for the model with embedded C increase linearly as the number
of API calls increases, whereas the original model shows an exponential cost increase.
[Figure 9: memory (Mbytes) and time (seconds) required for verification as the number of API calls per task grows from 3 to 15 — the costs for the model with embedded C increase roughly linearly, whereas those for the original model increase exponentially.]
We delayed the application of the embedded C constructs to the third step because merging tran-
sitions makes it difficult to analyse counterexamples and simulation results; the original model
is preferred for initial and incremental verification as long as the available resources allow it.
Embedded C is the last resort for better scalability.
7. RELATED WORK
There have been a number of works on the formal verification of operating systems, addressing
various problems.
Reference [26] models and verifies the PATHO OS for vehicles by using timed automata. Recent
similar works are presented in [27, 28]. They model an OSEK/VDX application and the core part
of its kernel in timed automata and perform a rigorous timing analysis by using the model checker
UPPAAL. Their modelling and verification focus on non-preemptive scheduling and an analysis of
worst-case response time. The approach is subject to state-space explosion as the number of tasks increases
because each task is explicitly modelled as an independent timed automaton.
The authors in [12] suggested a meta-scheduler framework for implementing real-time schedul-
ing algorithms. They presented the design of the meta-scheduler and outlined its implementation.
The meta-scheduler is verified using UPPAAL with respect to correctness, deadlock freedom and
livelock freedom. All these cases are mainly concerned with the abstract modelling of operating
systems but do not deal with the implementation or with the issue of scalability.
Reference [29] is one of the typical approaches that model the memory map as new global vari-
ables in the embedded C source code and use a model checker to verify assertions. This approach
does not take priority-based task scheduling into account. [10] presents a verification result for time
partitioning in the DEOS scheduling kernel by using the SPIN model checker. It is the closest to our
case study in its use of the SPIN model checker and model translation from the kernel code. How-
ever, its verification is focused on one property regarding the scheduling algorithm, so that aggressive
abstraction techniques specialized for the property can be applied to effectively avoid the state-space
explosion problem.
The Verisoft project [30] verifies run-time environment layers including OSEKtime and FlexRay,
and application processes using the Isabelle theorem prover [9, 31]. The L4.verified project aims
at providing consistency proofs for different layers of abstractions for the seL4 micro-kernel [11].
It takes the model-driven approach to develop a high-performance, low-complexity micro-kernel
named seL4 from scratch, where the high-level micro-kernel model is specified in Haskell and
is refined down to actual C code. The theorem prover Isabelle/HOL is used to verify the func-
tional correctness of the micro-kernel and the consistency of the inter-layer functionality. However,
the use of a theorem prover requires extensive knowledge about both the technique itself and the
verification domain.
There have been a couple of approaches for model checking embedded software in general. [8]
developed a special-purpose, domain-specific model checker named [mc]square that verifies C
code after it is compiled into assembly code. [7] verifies embedded C code by using an abstract
memory model and then verifies software and hardware together by using only the return values
from independently running hardware modules for static verification, which is similar, in principle,
to the method supported by embedded C in Promela [25].
8. DISCUSSION
This work demonstrated an application of the model checking technique on the safety analysis of
automotive software by using the Trampoline operating system as a case example. We provided
an approach for converting the Trampoline kernel, written in C, into a formal model in Promela,
a generic task-modelling method for exercising all possible interactions between tasks and the
operating system, and a modelling approach for simulating a context-switching mechanism on a
single processor. We believe that the suggested approaches are general enough to be applied to
other embedded software running on a single-processor machine. Nevertheless, one can still argue
against the effectiveness of using formal methods in safety analysis for embedded software. This
section discusses several related issues on the basis of our experience from this study.
For example, the generic task model can be replaced by an application-specific task design
without affecting the kernel model, so that model checking safety properties for a specific
application is quite straightforward.
One can argue that the model extracted from the source code may miss important errors such
as de-referencing null pointers, accessing out of array bounds, and failures in memory allocation
that might reside in the source code. We suggest handling behavioural safety separately and inde-
pendently from code safety; code safety, which includes the safe use of array indices and pointer
arithmetic, can be treated using static code analysis tools before handling behavioural safety issues.
9. CONCLUSION
We have presented our experience on model checking the Trampoline operating system for the
purpose of safety analysis. To the best of our knowledge, this is the first extensive analysis of the
Trampoline OS that has found an actual problem in the system.
Our approach can be differentiated from existing approaches in that the conversion and modelling
of Trampoline into Promela faithfully preserve the original code, except for some restructuring
and simplification of hardware-dependent code. This reduces the discrepancy between the original
code and the model, thus minimizing accidental errors that might arise from the modelling pro-
cess and providing straightforward counterexample replay in the actual code. However, this faithful
translation naturally results in high complexity during verification. We anticipate that we will have to
trade off either the accuracy of the kernel model or the generality of its environment for verification
comprehensiveness. Experiments show that constraining the task model, which is the environment of
the operating system, provides comprehensiveness up to a certain point, which is believed sufficient
for automotive ECUs.
Although this work shows promising results that model checking may be engineered in such a
way that it can be routinely applied for the safety analysis of automotive operating systems, it also
includes pitfalls and room for improvement. First, the conversion from the Trampoline kernel to
Promela is carried out manually with the aid of a code analysis tool and thus may include
human mistakes. To eliminate such mistakes, the manual model construction was thoroughly
tested using the SPIN simulator and model checker, which in fact took more time than the model con-
struction itself.¶ The best way to avoid such manual construction and validation costs is to have the
conversion process automated. We plan to develop a domain-specific model extraction tool on the
basis of our experience.
Second, model checking the Trampoline kernel involves several parameters other than the number
of API calls per task, such as the number of activated tasks, the number of maximum activations
per task and the number of maximum resources/events. The model checking experiments in this
work used fixed values for these parameters: four tasks, with a maximum of two activations
and two resources/events per task. More refined experiments with varying parameters are necessary
for a complete analysis. Because we expect that varying such parameters will result in increasing
model checking complexity, future work would involve an investigation into developing a system-
atic method for reducing the complexity of the kernel model itself without requiring knowledge
about implementation details.
ACKNOWLEDGEMENTS
This work largely benefited from prior work carried out by Seunghyun Yoon, Yeonjoon Kim and
Minkyu Park, who performed SFTA for automotive operating systems in a related project and built the
code extractor on top of the Understand tool. This work was supported by the Engineering Research Cen-
ter of Excellence Program of the Korean Ministry of Education, Science and Technology (MEST)/National
Research Foundation of Korea (NRF), Grant 2011-0000978 and the National Research Foundation of Korea
Grant funded by the Korean Government (2010-0017156).
¶
It took about 2 person-months to construct the initial model and about 3 person-months to validate the model in this
case study.
REFERENCES
1. Choi Y. Safety analysis of the Trampoline OS using model checking: an experience report. In Proceedings of 22nd
IEEE International Symposium on Software Reliability Engineering, Hiroshima, Japan, 2011; 200–209.
2. Broy M. Challenges in automotive software engineering. In Proceedings of the 28th International Conference on
Software Engineering, New York, USA, 2006; 33–42.
3. Mössinger J. Software in automotive systems. IEEE Software 2010; 27(2):92–94.
4. Lutz RR, Shaw HY. Applying adaptive safety analysis techniques. In 10th International Symposium on Software
Reliability Engineering, Boca Raton, USA, 1999; 42–49.
5. Oh Y, Yoo J, Cha S, Son HS. Software safety analysis of function block diagrams using fault trees. Reliability
Engineering and System Safety 2005; 88(3):215–228.
6. Dingel J, Liang H. Automated comprehensive safety analysis of concurrent programs using Verisoft and TXL. In
ACM/SIGSOFT International Conference on Foundations of Software Engineering, Newport Beach, USA, 2004;
13–22.
7. Cordeiro L, Fischer B, Chen H, Marques-Silva J. Semiformal verification of embedded software in medical
devices considering stringent hardware constraints. In International Conference on Embedded Software and Systems,
HangZhou, China, 2009.
8. Schlich B, Kowalewski S. Model checking C source code for embedded systems. International Journal of Software
Tools and Technology Transfer 2009; 11:187–202.
9. Endres E, Müller C, Shadrin A, Tverdyshev S. Towards the formal verification of a distributed real-time automotive
system. In Proceedings of the NASA Formal Methods Symposium, Washington DC, USA, 2010; 212–216.
10. Penix J, Visser W, Park S, Pasareanu C, Engstrom E, Larson A, Weininger N. Verifying time partitioning in the
DEOS scheduling kernel. Formal Methods in Systems Design Journal 2005; 26(2):103–135.
11. Klein G, Elphinstone K, Heiser G, Andronick J, Cock D, Derrin P, Elkaduwe D, Engelhardt K, Kolanski R,
Norrish M, Sewell T, Tuch H, Winwood S. seL4: formal verification of an OS kernel. Communications of the ACM
2010; 53(6):107–115.
12. Li P, Ravindran B, Suhaib S, Feizabadi S. A formally verified application-level framework for real-time scheduling
on posix real-time operating systems. IEEE Transactions on Software Engineering 2004; 30(9):613–629.
13. Trampoline – opensource RTOS project. http://trampoline.rts-software.org.
14. OSEK/VDX operating system specification 2.2.3. http://portal.osek-vdx.org/files/pdf/specs/os223.pdf.
15. Holzmann GJ. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Publishing Company:
Boston, MA, 2003.
16. Holzmann GJ, Ruys TC. Effective bug hunting with Spin and Modex. In Model checking software: The SPIN
Workshop, San Francisco, USA, 2005; 24.
17. Understand: source code analysis and metrics. http://www.scitools.com/.
18. Leveson NG. Safeware: System Safety and Computers. Addison Wesley: Reading, MA, 1995.
19. Choi Y, Yoon S, Kim Y, Lee S. Safety certification for automotive realtime operating systems. Technical Report,
Electronics and Telecommunications Research Institute, South Korea, 2009.
20. Clarke EM, Grumberg O, Peled D. Model Checking. MIT Press: Cambridge, MA, 1999.
21. Zaks A, Joshi R. Verifying multi-threaded C programs with SPIN. In 15th International Workshop on Model
Checking Software, Los Angeles, USA, 2008; 325–342.
22. Holzmann GJ. Logic verification of ANSI-C code with Spin. In Proc. of the 7th international SPIN Workshop on
Model Checking of Software, Stanford, USA, 2000; 131–147.
23. Holzmann GJ, Smith MH. An automated verification method for distributed systems software based on model
extraction. IEEE Transactions on Software Engineering 2002; 28(4):364–377.
24. Holzmann GJ. An analysis of bitstate hashing. Formal Methods in System Design 1998; 13(3):289–307.
25. Holzmann GJ, Joshi R, Groce A. Model driven code checking. Automated Software Engineering 2008; 15(3–4):
283–297.
26. Balarin F, Petty K, Sangiovanni-Vincenteli AL, Varaiya P. Formal verification of the PATHO real-time operating
system. In Proceedings of the 33rd Conference on Decision and Control, Lake Buena Vista, USA, 1994; 2459–2465.
27. Waszniowski L, Hanzálek Z. Formal verification of multitasking applications based on timed automata model.
Real-Time Systems 2008; 38:39–65.
28. Waszniowski L, Krákora J, Hanzálek Z. Case study on distributed and fault tolerant system modeling based on
timed automata. Journal of Systems and Software 2009; 82(10):1678–1694.
29. Bucur D, Kwiatkowska MZ. Poster abstract: software verification for TinyOS. In 9th ACM/IEEE International
Conference on Information Processing in Sensor Networks, Stockholm, Sweden, 2010; 400–401.
30. The Verisoft project homepage. http://www.verisoft.de.
31. der Riden TI, Kanpp S. An approach to the pervasive formal specification and verification of an automotive system.
In Proceedings of the International Workshop on Formal Methods in Industrial Critical Systems, Lisbon, Portugal,
2005; 115–124.
32. Groce A, Joshi R. Extending model checking with dynamic analysis. In Verification, Model Checking, and Abstract
Interpretation, San Francisco, USA, 2008; 142–156.
33. Holzmann GJ, Florian M. Model checking with bounded context switching. Formal Aspects of Computing
2011; 23(3):365–389.
34. Byun Y, Sanders BA, Keum C-S. Design of communication protocols using a message transfer pattern. International
Journal of Communication Systems 2005; 15(5):465–485.
35. Dong Y, Du X, Holzmann GJ, Smolka SA. Fighting livelock in the GNU i-protocol: a case study in explicit-state
model checking. International Journal on Software Tools for Technology Transfer 2003; 4:505–528.
36. Choi Y. From NuSMV to SPIN: experiences with model checking flight guidance systems. Formal Methods in
System Design 2007; 30(3):199–216.
37. Kim M, Kim Y, Kim H. A comparative study of software model checkers as unit testing tools: an industrial case
study. IEEE Transactions on Software Engineering 2011; 37(2):146–160.
38. Clarke E, Kroening D, Lerda F. A tool for checking ANSI-C programs. In 10th International Conference on Tools
and Algorithms for the Construction and Analysis of Systems, Barcelona, Spain, 2004; 168–176.
39. Beyer D, Henzinger TA, Jhala R, Majumdar R. The software model checker Blast: applications to software
engineering. International Journal on Software Tools for Technology Transfer 2007; 9(5):505–525.
40. Choi Y. Model Checking an OSEK/VDX-based Operating System for Automobile Safety Analysis, submitted to
IEICE Transactions on Information and Systems (Letter).