Reliability Analysis of the SPARC V8 Architecture via Fault Trees and UPPAAL-SMC
Abstract: This paper proposes a system-level dynamic fault tree approach to model, analyze, and estimate the vulnerability to soft-faults of the 7-stage pipelined SPARC V8 integer unit. A preliminary analysis of the architecture is used to derive a dynamic fault tree diagram which is modeled as a priced-timed automaton. The assessment of the architecture's vulnerability to radiation effects is obtained through fault tree analysis, showing consistent results with radiation and simulation tests.

Index Terms: Fault-tree, analysis, UPPAAL, model-checking

I. INTRODUCTION

The cost and the complexity involved in the development of safety-critical systems are a prime motivator for the use of reliability assessment techniques as early in the design cycle as possible. Existing techniques often lack the capacity to perform a comprehensive and exhaustive analysis on complex architectures exposed to Single Event Upsets (SEUs), leading to the necessity of conducting expensive ground tests that are not able to fully characterize the system's vulnerabilities. An SEU is characterized by an unforeseen change of state in one or more elements within the system memory. That change of state is known as a soft-fault, and it can often be detected and corrected by safety measures designed into the system. However, failing to detect the presence of soft-faults may have catastrophic consequences, especially in environments where safety is paramount.

In recent years, as the demand for fast, flexible and reliable techniques has increased, many alternative methods for soft-fault estimation have emerged. Among the most popular techniques in the literature is fault tree analysis (FTA). The FTA method is widely used for risk assessment, mainly in the avionics, nuclear and chemical industries [1]. Dynamic Fault Trees (DFTs) [1, 2] are an extension of static fault trees that cater to complex functional dependencies between system components, such as priority, order of occurrence, triggering, and spare component management. Recent literature such as the work in [3] proposes extensions to the conventional boolean FTA in order to take sequence dependencies into account for qualitative and probabilistic analyses without state-space transformations. This allows for modelling of event sequences at all levels within a fault tree. The analysis of uncertainty over time is the focus in [4], in which an FTA technique has been introduced to accurately model the effects of SEUs in non-deterministic environments, with the use of probabilistic model-checking and Markov Decision Processes (MDPs).

This paper introduces a new DFT approach to compute an accurate estimation of a system's vulnerability to soft-faults, using the 32-bit SPARC V8 integer pipeline as a case-study.

The fault propagation paths in the targeted architecture are obtained from the SPARC V8 architecture manual [5] by applying the technique introduced in [6]. Subsequently, the fault propagation paths have been combined in order to obtain a representative FT of the SPARC V8 integer pipeline, through the use of the Behaviour-Based Method [7]. Next, a new modeling of FT gates is proposed, using the Priced-Timed Automata theory. Finally, a fully automatic FTA is performed with the use of UPPAAL-SMC model checking. The analysis performed consists of estimating the probability of each type of Trap Exception (TE) occurring in the targeted architecture, the impact of individual registers on the overall reliability of the processor, and the probability of failures over time. The obtained results are compared with radiation and simulation tests, showing remarkable consistency.

II. EXISTING FAULT ANALYSIS OF SPARC V8

Extensive literature exists on the verification of the SPARC V8 architecture, which is mainly performed through simulation and radiation testing. Radiation testing, such as the work in [8], is very important in characterizing this architecture's vulnerability when exposed to total ionizing dose. In [9] the authors use software handlers that enable the classification of the types of crashes, and the measured crash cross-sections are compared with those predicted by fault injection simulation. This demonstrates that the data extracted from radiation tests may be used to conduct predictability experiments at a higher level. However, these techniques are extremely costly and rather limited in their coverage, since it is virtually impossible to test or simulate all possible scenarios. On the other hand, regular system-level approaches (such as FTs) are often bound to simplistic analyses due to a lack of expressiveness resulting from the high levels of abstraction employed. To address these shortcomings, we propose a new time-enhanced modeling approach to FT gates. The proposed modeling enhances each of the gates with multiple clocks that may keep track of the duration of each fault individually, as well as global clocks that enable the analysis of the impacts of faults over time.
Authorized licensed use limited to: Concordia University Library. Downloaded on January 26,2022 at 15:28:14 UTC from IEEE Xplore. Restrictions apply.
III. MODELING THE SPARC V8 PIPELINE AS DFT

In order to conduct the proposed analysis, we must first generate a fault tree of the SPARC V8 integer pipeline. We have adopted an analytical approach for the generation of fault trees for complex systems, known as the Behaviour-Based Method [7]. This approach considers faults as behaviours, and fault-tree gates as operations on those behaviours. The behavioural patterns of the system under the effects of SEUs have been obtained by applying the method previously introduced in [6]. By applying these techniques to the SPARC V8 7-stage pipeline, and based on the structural information available in the SPARC V8 architecture manual [5, 10], the fault tree of the integer pipeline is constructed, as shown in Fig. 1. The fault tree is divided into 7 levels, each representing a stage of the pipeline. A more detailed explanation of how the FT was obtained, as well as the reasoning for the FT gates used, is given in the following subsections.

[Fig. 1: fault tree of the SPARC V8 integer pipeline. Node labels recoverable from the figure: xc_result, G7, d_cache, G6, result, G5, op_2, op_1.]
different fault propagation path. Register imm may propagate to the Execute stage. Register rd may propagate to the RA stage and register rfa can be non-deterministically propagated to adr_1 or to adr_2.

Register Access (RA): During the register access stage, operands are read from the register file or from internal data bypasses. In our model, a soft-fault at this stage (either direct or propagated) may first affect the registers adr_1 or adr_2. These registers, along with the rd register from the DE stage, are inputs to the OR gate G2. A soft-fault in the inputs of gate G2 may propagate to the Reg. File component. It is important to note that the Reg. File component may propagate the soft-fault to either dt_1 or dt_2. According to [5], a soft-fault in adr_1 may only propagate to dt_1. Similarly, a soft-fault in adr_2 may only propagate to dt_2. Therefore, the Reg. File component and its children are considered special cases, deviating from the previously mentioned general assumption, since the choice of propagation path is always deterministic.

Execute (EX): During this pipeline stage, the logical and shift operations are performed. For memory operations (such as JMPL), the address is generated. In this stage, soft-faults originating from imm, dt_2, or branch addr. may propagate to component op_1 through the OR gate G3. Similarly, soft-faults originating from dt_1 or branch addr. may propagate to component op_2 through the OR gate G4. Furthermore, soft-faults in op_1, op_2, or pc may propagate to the result register.

Memory Access (MA): The data cache is read or written in memory at this stage. In the proposed FT, an error in this stage may arise from a soft-fault in either the result or the branch addr. register. This is modeled with OR gate G6, which may propagate the soft-fault from either of those registers.

Exception (XC): In this pipeline stage, traps and interrupts are resolved. In the proposed FT, OR gate G7 may propagate soft-faults coming from the d_cache or from the result. At this level of the FT, our model computes the probabilities of each of the different traps, based on the probability of soft-faults in the instructions performed and the paths taken.

Write-Back (WB): The result of any ALU, logical, shift, or cache operation is written back to the register file. The value of the xc_result register is generated during this pipeline stage.

C. PTA Model Composition

The modeling formalism of UPPAAL-SMC is based on a stochastic interpretation and extension of the Timed Automata (TA) formalism used in the classical model checking version of UPPAAL. For individual TA components, the stochastic interpretation replaces the non-deterministic choices between multiple enabled transitions by probabilistic choices. Similarly, the non-deterministic choices of time delays are refined by probability distributions, which at the component level are given either by uniform distributions in cases with time-bounded delays or by exponential distributions (with user-defined rates) in cases of unbounded delays. These structures are defined as Priced-Timed Automata (PTA).

In order to accurately model the fault dependencies in the proposed DFT, our approach focuses on the formalization and modeling of the probabilistic behavior of FT gates and events over time. To achieve this, we define each FT gate as a PTA model, where a state is a pair $(l, \nu) \in L \times \mathbb{R}^{\chi}_{\geq 0}$ such that $\nu \models inv(l)$. In any state $(l, \nu)$, there is a non-deterministic choice of either making a discrete transition or letting time pass. A discrete transition can be made according to any $(l, g, p) \in P$ that is enabled in the current location $l$, that is, whose zone $g$ is satisfied by the current clock valuation $\nu$. The probability of moving to location $l'$ and resetting all clocks in $X$ to 0 is given by $p(X, l')$. The option of letting time pass is available only if the invariant condition $inv(l)$ is satisfied while time elapses. The complete model of the desired FT can be obtained by synchronizing the inputs and outputs of the required FT gates, therefore composing the full FT diagram. As an example, Fig. 2 shows the PTA of a possible configuration of the OR gate with two inputs. The x input may fail at any time with probability px, at which point the variable x becomes 1. The fault in x then propagates to the output of the gate, setting out to 1. The same logic also applies to the y input.

Fig. 2: Sample of an OR Gate PTA

IV. STOCHASTIC SOFT-FAULT ANALYSIS WITH UPPAAL

In this section, we present and discuss the results of our analysis of the fault tree of the SPARC V8 pipeline (Fig. 1). For this analysis, the proposed FT has been modeled in UPPAAL-SMC, where a PTA model of each of the FT gates has been generated. The model of the FT is then obtained through the parallel composition of all the gate models. The proposed PTA models in UPPAAL rely heavily on two aspects: 1) As previously mentioned, the probabilities used for the failure rates in the model are derived from the cross-section values reported in [9]; 2) The proposed DFT models are constructed specifically for the verification of a microchip. This means that the PTAs of each gate of the FT have certain configurations that accommodate the particularities of such a system. For example, exit rates in the models are set to 1. This forces the tool to evaluate each state at every unit of time (clock cycle). Furthermore, each register has an internal clock that tracks the amount of time that the soft-fault is active, as well as a global clock that estimates the average propagation times in the FT. This feature is extremely valuable for the modeling of soft-faults, since it allows the model-checker to verify the model with different time parameters in each verification instance. For example, the expected propagation time of a fault greatly impacts its probability of causing a failure in the system.
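The two-input OR gate described for Fig. 2, evaluated once per clock cycle and analyzed by repeated simulation, can be illustrated with a small stand-alone sketch. This is not the authors' UPPAAL-SMC model: the per-cycle independent failure semantics, the values of `px`, `py` and `horizon`, and all function names are assumptions made for illustration. The run count of 3800 and the 95% confidence level are the figures reported in this section. The Hoeffding bound used for the interval half-width is one common choice and is not necessarily the method the tool applies.

```python
import math
import random


def simulate_or_gate(px, py, horizon, rng):
    """One run: return the first clock cycle at which out becomes 1, or None."""
    x = y = 0
    for cycle in range(1, horizon + 1):
        # Assumption: each healthy input fails independently in a given cycle.
        if x == 0 and rng.random() < px:
            x = 1
        if y == 0 and rng.random() < py:
            y = 1
        if x or y:  # OR gate: a fault on either input propagates to out
            return cycle
    return None


def estimate_failure_probability(px, py, horizon, runs, seed=0):
    """Fraction of simulated runs in which out is set within the horizon."""
    rng = random.Random(seed)
    failures = sum(
        simulate_or_gate(px, py, horizon, rng) is not None for _ in range(runs)
    )
    return failures / runs


def exact_failure_probability(px, py, horizon):
    """Closed form for the same toy model, used as a sanity check."""
    return 1.0 - ((1.0 - px) * (1.0 - py)) ** horizon


def hoeffding_half_width(runs, confidence):
    """Half-width eps such that P(|estimate - p| >= eps) <= 1 - confidence."""
    delta = 1.0 - confidence
    return math.sqrt(math.log(2.0 / delta) / (2.0 * runs))


if __name__ == "__main__":
    # Hypothetical per-cycle fault probabilities and time bound.
    px, py, horizon, runs = 0.01, 0.02, 20, 3800
    est = estimate_failure_probability(px, py, horizon, runs)
    exact = exact_failure_probability(px, py, horizon)
    eps = hoeffding_half_width(runs, 0.95)
    print(f"estimate={est:.3f} exact={exact:.3f} half-width={eps:.3f}")
```

Under these assumptions the estimate can be checked against the closed form, which is not possible for the full fault tree. Changing `horizon` shows how the time bound alone shifts the failure probability for fixed input probabilities.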
This experiment aims to determine the probability of an SEU raising a trap exception in the pipeline of the processor. Moreover, the proposed analysis can identify the type of trap exception that has occurred. With the FT proposed in Fig. 1 and following the behavior detailed in the SPARC V8 manual [5], the proposed model is able to estimate the probability of occurrence of the following trap exceptions: instruction_access_exception, data_store_error, illegal_instruction, data_access_exception, and mem_address_not_aligned. All other types of trap exception are classified under Others. The probabilities used in the proposed analysis are derived from cross-section metrics published in [9]. The analysis has been evaluated over 3800 iterations, with a confidence level of 95%. The results show that approximately 41% of all injected faults have been captured as trap exceptions. Fig. 3 shows a breakdown of each of the identified errors. For validation, the obtained results (shown under Proposed FTA) are compared with the results reported in [9] (shown under Radiation Test and Simulation). It can be observed that the results estimated by the proposed analysis show remarkable similarity with the values obtained by the Radiation Test and Simulation techniques. This shows that the proposed analysis can be a viable option for early analysis of microprocessor designs, as it is an inexpensive, fast, and highly customizable alternative to other adopted methods.

Fig. 3: Probability of Trap Exceptions in different approaches. Simulation and radiation test results are reproduced from [9].

The proposed FTA can be customized to evaluate other metrics, such as the estimated time before a failure, the failure rate over time, and the impact of soft-errors on different components. As an example, Table I shows the probability of trap exceptions over time, the probability of detected errors over time, and the probability of undetected errors over time. Although relatively small, the probability of undetected errors in the system may represent a serious issue in certain conditions, where the system is expected to operate without maintenance. Our analysis shows that the biggest contributors to the occurrence of undetected errors are the nPC register (27.2% of cases), the rfa register (22.3% of cases), and the d_cache (19.3% of cases). Access to this kind of information early in the development cycle means that possible points of vulnerability are easier to detect and quicker to fix, resulting in increased productivity at lower costs.

TABLE I: SPARC V8 Probability of Failure Over Time

                     Prob. of TE        Prob. of Error     Prob. of Undetected Errors
                     (million hours)    (million hours)    (million hours)
SPARC V8 Pipeline    0.133              0.268              0.097

V. CONCLUSION AND FUTURE WORK

Vulnerability to soft-errors is a major concern for microelectronic systems exposed to radiation. This paper proposes a probabilistic verification approach for vulnerability estimation of the SPARC V8 architecture when it is exposed to soft-errors. This approach seeks to overcome the limitations of emulation and simulation-based techniques by performing fault tree analysis through stochastic model checking, which provides an accurate and exhaustive estimation of the effects of soft-errors in the system. The modeling experiments that were conducted produced results that are very consistent with previously reported radiation ground testing. The proposed approach is also able to accurately estimate metrics such as the impact of different bit flips on the system's dependability measurement, and availability over time. Future work will seek to expand the analysis domain by integrating multiple layers of abstraction in order to produce even more accurate results.

REFERENCES

[1] W.E. Vesely et al. Fault tree handbook, 1981. In http://www.hq.nasa.gov/office/codeq/doctree/fthb.pdf.
[2] J. B. Dugan et al. Fault trees and sequence dependencies. In Reliability and Maintainability Symposium, pages 286–293, 1990.
[3] S. J. Schilling. Contribution to temporal fault tree analysis without modularization and transformation into the state space. arXiv preprint arXiv:1505.04511, 2015.
[4] M. Ammar et al. Efficient probabilistic fault tree analysis of safety critical systems via probabilistic model checking. In Forum on Specification and Design Languages (FDL), pages 1–8, 2016.
[5] SPARC International. The sparc architecture manual version 8. SPARC International Inc, 1998.
[6] M. Ammar et al. System-level analysis of the vulnerability of processors exposed to single-event upsets via probabilistic model checking. IEEE Transactions on Nuclear Science, 64(9):2523–2530, 2017.
[7] A. Rae et al. A behaviour-based method for fault tree generation. In Int. System Safety Conference, System Safety Society, pages 289–298, 2004.
[8] F. Sturesson et al. Radiation characterization of a dual core leon3-ft processor. In European Conference on Radiation and Its Effects on Components and Systems, 2011, pages 938–944.
[9] C. Bottoni et al. Heavy ions test result on a 65nm sparc-v8 radiation-hard microprocessor. In IEEE International Reliability Physics Symposium, pages 5F–5, 2014.
[10] M. Daněk et al. The leon3 processor. In UTLEON3: Exploring Fine-Grain Multi-Threading in FPGAs, pages 9–14. Springer, 2013.