Reliability Analysis of The SPARC V8 Architecture Via Fault Trees and UPPAL-SMC

This paper introduces a new dynamic fault tree approach to model and analyze the vulnerability of the 32-bit SPARC V8 integer pipeline to soft-faults. The fault propagation paths in the SPARC V8 architecture are obtained and combined to generate a fault tree. A modeling of fault tree gates is proposed using priced-timed automata theory. Finally, a fully automatic fault tree analysis is performed to estimate failure probabilities and analyze the impact of individual components.

Reliability Analysis of the SPARC V8 Architecture via Fault Trees and UPPAL-SMC


Marwan Ammar, Ghaith Bany Hamad, Otmane Ait Mohamed
Concordia University, Montreal, Canada

Yvon Savaria
Polytechnique Montreal, Montreal, Canada

Abstract—This paper proposes a system-level dynamic fault tree approach to model, analyze, and estimate the vulnerability to soft-faults of the 7-stage pipelined SPARC V8 integer unit. A preliminary analysis of the architecture is used to derive a dynamic fault tree diagram, which is modeled as a priced-timed automaton. The assessment of the architecture's vulnerability to radiation effects is obtained through fault tree analysis, showing results consistent with radiation and simulation tests.

Index Terms—Fault-tree, analysis, UPPAAL, model-checking

I. INTRODUCTION

The cost and complexity involved in the development of safety-critical systems are a prime motivation for using reliability assessment techniques as early in the design cycle as possible. Existing techniques often lack the capacity to perform a comprehensive and exhaustive analysis of complex architectures exposed to Single Event Upsets (SEUs), which makes it necessary to conduct expensive ground tests that are not able to fully characterize the system's vulnerabilities. An SEU is characterized by an unforeseen change of state in one or more elements within the system memory. That change of state is known as a soft-fault, and it can often be detected and corrected by safety measures designed into the system. However, failing to detect the presence of soft-faults may have catastrophic consequences, especially in environments where safety is paramount.

In recent years, as the demand for fast, flexible, and reliable techniques has increased, many alternative methods for soft-fault estimation have emerged. Among the most popular techniques in the literature is fault tree analysis (FTA). The FTA method is widely used for risk assessment, mainly in the avionics, nuclear, and chemical industries [1]. Dynamic Fault Trees (DFTs) [1, 2] are an extension of static fault trees that cater to complex functional dependencies between system components, such as priority, order of occurrence, triggering, and spare component management. Recent literature, such as the work in [3], proposes extensions to the conventional Boolean FTA in order to take sequence dependencies into account for qualitative and probabilistic analyses without state-space transformations. This allows for the modelling of event sequences at all levels within a fault tree. The analysis of uncertainty over time is the focus of [4], in which an FTA technique has been introduced to accurately model the effects of SEUs in non-deterministic environments, with the use of probabilistic model-checking and Markov Decision Processes (MDPs).

This paper introduces a new DFT approach to compute an accurate estimation of a system's vulnerability to soft-faults, using the 32-bit SPARC V8 integer pipeline as a case study. The fault propagation paths in the targeted architecture are obtained from the SPARC V8 architecture manual [5] by applying the technique introduced in [6]. Subsequently, the fault propagation paths are combined to obtain a representative FT of the SPARC V8 integer pipeline, through the use of the Behaviour-Based Method [7]. Next, a new modeling of FT gates is proposed, using Priced-Timed Automata theory. Finally, a fully automatic FTA is performed with the use of UPPAAL-SMC model checking. The analysis estimates the probability of each type of Trap Exception (TE) occurring in the targeted architecture, the impact of individual registers on the overall reliability of the processor, and the probability of failures over time. The obtained results are compared with radiation and simulation tests and show remarkable consistency with them.

II. EXISTING FAULT ANALYSIS OF SPARC V8

Extensive literature exists on the verification of the SPARC V8 architecture, which is mainly performed through simulation and radiation testing. Radiation tests, such as the work in [8], are very important in characterizing this architecture's vulnerability when exposed to total ionizing dose. In [9], the authors use software handlers that enable the classification of the types of crashes, and the measured crash cross-sections are compared with those predicted by fault injection simulation. This demonstrates that the data extracted from radiation tests may be used to conduct predictability experiments at a higher level. However, these techniques are extremely costly and rather limited in their coverage, since it is virtually impossible to test or simulate all possible scenarios. On the other hand, regular system-level approaches (such as FTs) are often bound to simplistic analyses due to a lack of expressiveness resulting from the high levels of abstraction employed. To address these shortcomings, we propose a new time-enhanced modeling approach to FT gates. The proposed modeling enhances each of the gates with multiple clocks that may keep track of the duration of each fault individually, as well as global clocks that enable the analysis of the impact of faults over time.


Authorized licensed use limited to: Concordia University Library. Downloaded on January 26,2022 at 15:28:14 UTC from IEEE Xplore. Restrictions apply.
III. MODELING THE SPARC V8 PIPELINE AS DFT

In order to conduct the proposed analysis, we must first generate a fault tree of the SPARC V8 integer pipeline. We have adopted the Behaviour-Based Method [7], an analytical approach for the generation of fault trees for complex systems. This approach considers faults as behaviours, and fault-tree gates as operations on those behaviours. The behavioural patterns of the system under the effects of SEUs have been obtained by applying the method previously introduced in [6]. By applying these techniques to the SPARC V8 7-stage pipeline, and based on the structural information available in the SPARC V8 architecture manual [5, 10], the fault tree of the integer pipeline is constructed, as shown in Fig. 1. The fault tree is divided into 7 levels, each representing a stage of the pipeline. A more detailed explanation of how the FT was obtained, as well as the reasoning for the FT gates used, is given in the following subsections.

A. General Considerations and Assumptions

Before discussing how the FT mapping has been done, it is important to present some of the general considerations and assumptions used in this work. Firstly, it is assumed that the outputs of the fault tree gates are also susceptible to soft-faults. For example, the probability of failure ($\lambda$) of "inst" is given by ($\lambda_{iCache}$ OR $\lambda_{PC}$ OR $\lambda_{inst}$). Secondly, we assume that the events in the fault tree (circles and rectangles) do not represent components but rather the occurrence of a soft-fault on the component [7]. For example, the element Reg. File means that a soft-fault has occurred in the register file component. Therefore, each event in the fault tree has a probability of failing due to soft-faults. This probability is estimated through the equation $\sigma_{device} = \sigma_{ff} \times N_{ff} \times (1 - \alpha)$, where $\sigma_{device}$ is the cross section of the device, $\sigma_{ff}$ is the intrinsic cross section of the flip-flop mapped on the design, $N_{ff}$ is the number of flip-flops, and $\alpha$ is the masking of the device [9]. Unlike conventional logic gates, the inputs and outputs of FT gates are probabilities related to the set operations of Boolean logic. For example, the AND gate represents the assumption of the combination of independent events (i.e., the intersection of the input event sets). On the other hand, the OR gate represents the assumption that the inputs are mutually exclusive events (i.e., the union of the input event sets) [1].
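As a rough illustration of the relations in this subsection, the sketch below evaluates the cross-section equation and combines event probabilities at AND and OR gates. The function names and all numeric inputs are hypothetical placeholders, not values taken from [9]. The OR gate is shown both under the mutually exclusive assumption stated above (probabilities add) and under an independent-inputs assumption, which is a common textbook alternative and not something stated in the text.

```python
from math import prod


def device_cross_section(sigma_ff, n_ff, alpha):
    """sigma_device = sigma_ff * N_ff * (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("masking factor alpha must lie in [0, 1]")
    return sigma_ff * n_ff * (1.0 - alpha)


def and_gate(probs):
    """Intersection of independent input events: probabilities multiply."""
    return prod(probs)


def or_gate_exclusive(probs):
    """Union of mutually exclusive input events: probabilities add."""
    return min(1.0, sum(probs))


def or_gate_independent(probs):
    """Union of independent input events: 1 - prod(1 - p). (Assumed variant.)"""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical inputs, chosen only to exercise the functions.
sigma = device_cross_section(sigma_ff=2.0e-9, n_ff=1000, alpha=0.75)
# Failure of "inst" as (iCache OR PC OR inst), with made-up probabilities.
p_inst = or_gate_exclusive([0.01, 0.02, 0.005])
```

For small input probabilities the two OR variants give nearly the same value, which is why the additive form is often used as an approximation.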
Fig. 1: DFT of the 7-stage integer pipeline of the SPARC V8
An important consideration is that some FT behavior has been slightly altered. For example, the register inst connects directly to three other registers (i.e., imm, rfa, or rd) through different transitions. Depending on which bit of register inst is affected by the SEU, a different fault propagation path may take place. This decision is taken probabilistically.

B. System Level Fault Abstraction

This subsection details how each of the SPARC V8 pipeline stages has been mapped into an FT. This mapping is based on extensive research on fault injection and propagation experiments in the literature, as well as on technical reports and the SPARC V8 architecture manual.

Instruction Fetch (FE): In this stage, the PC register is read and the instruction is fetched from the instruction cache (iCache). Therefore, in this stage, a soft-fault may originate from the PC or from the iCache. This relationship is represented in the proposed FT as gate G1, which is an OR gate connected to the basic events iCache and PC. Additionally, soft-faults in the nPC component may propagate to the EX stage of the pipeline.

Decode (DE): In this stage, the instruction (inst) is decoded. A soft-fault that occurs in the inst register can non-deterministically propagate the error to one of three registers: imm, rfa, or rd. Each of these registers will, in turn, activate a different fault propagation path. Register imm may propagate to the Execute stage. Register rd may propagate to the RA stage, and register rfa can be non-deterministically propagated to adr_1 or to adr_2.
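The probabilistic choice of propagation path described for the Decode stage can be sketched as a weighted random draw. This is only an illustration: the branching weights below are hypothetical (the text does not give them), and the helper names are made up for this example.

```python
import random

# Hypothetical branching weights; the paper does not report these values.
INST_TARGETS = {"imm": 0.4, "rfa": 0.3, "rd": 0.3}
RFA_TARGETS = {"adr_1": 0.5, "adr_2": 0.5}


def pick(targets, rng):
    """Draw one target register according to its weight."""
    names = list(targets)
    weights = [targets[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def propagate_from_inst(rng):
    """Return the path taken by a soft-fault in inst through the DE stage."""
    path = ["inst"]
    target = pick(INST_TARGETS, rng)
    path.append(target)
    if target == "rfa":
        # rfa propagates onward to one of the two address registers.
        path.append(pick(RFA_TARGETS, rng))
    return path


def estimate_path_frequencies(n_runs, seed=0):
    """Estimate how often each propagation path is taken over n_runs draws."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_runs):
        key = " -> ".join(propagate_from_inst(rng))
        counts[key] = counts.get(key, 0) + 1
    return {key: count / n_runs for key, count in counts.items()}
```

Calling `estimate_path_frequencies(10000)` returns a dictionary keyed by paths such as `inst -> rfa -> adr_1`, with relative frequencies that approach the chosen weights as the number of runs grows.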
Register Access (RA): During the register access stage, operands are read from the register file or from internal data bypasses. In our model, a soft-fault at this stage (either direct or propagated) may first affect the registers adr_1 or adr_2. These registers, along with the rd register from the DE stage, are inputs to the OR gate G2. A soft-fault in the inputs of gate G2 may propagate to the Reg. File component. It is important to note that the Reg. File component may propagate the soft-fault to either dt_1 or dt_2. According to [5], a soft-fault in adr_1 may only propagate to dt_1. Similarly, a soft-fault in adr_2 may only propagate to dt_2. Therefore, the Reg. File component and its children are considered special cases, deviating from the previously mentioned general assumption, since the choice of propagation path is always deterministic.

Execute (EX): During this pipeline stage, the logical and shift operations are performed. For memory operations (such as JMPL), the address is generated. In this stage, soft-faults originating from imm, dt_2, or branch addr. may propagate to component op_1 through the OR gate G3. Similarly, soft-faults originating from dt_1 or branch addr. may propagate to component op_2 through the OR gate G4. Furthermore, soft-faults in op_1, op_2, or pc may propagate to the result register.

Memory Access (MA): The data cache is read or written at this stage. In the proposed FT, an error in this stage may arise from a soft-fault in either the result or the branch addr. registers. This is modeled with OR gate G6, which may propagate the soft-fault from either of those registers.

Exception (XC): In this pipeline stage, traps and interrupts are resolved. In the proposed FT, OR gate G7 may propagate soft-faults coming from the d_cache or from the result. At this level of the FT, our model computes the probabilities of each of the different traps, based on the probability of soft-faults in the instructions performed and the paths taken.

Write-Back (WB): The results of any ALU, logical, shift, or cache operations are written back to the register file. The value of the xc_result register is generated during this pipeline stage.

C. PTA Model Composition

The modeling formalism of UPPAAL-SMC is based on a stochastic interpretation and extension of the Timed Automata (TA) formalism used in the classical model checking version of UPPAAL. For individual TA components, the stochastic interpretation replaces the non-deterministic choices between multiple enabled transitions by probabilistic choices. Similarly, the non-deterministic choices of time delays are refined by probability distributions, which at the component level are given either by uniform distributions in cases with time-bounded delays or by exponential distributions (with user-defined rates) in cases of unbounded delays. These structures are defined as Priced-Timed Automata (PTA).

In order to accurately model the fault dependencies in the proposed DFT, our approach focuses on the formalization and modeling of the probabilistic behavior of FT gates and events over time. To achieve this, we define each FT gate as a PTA model, where a state is a pair $(l, \nu) \in L \times \mathbb{R}_{\geq 0}^{\mathcal{X}}$ such that $\nu \models inv(l)$. In any state $(l, \nu)$, there is a non-deterministic choice of either making a discrete transition or letting time pass. A discrete transition can be made according to any $(l, g, p) \in P$, where $l$ is the current location and the zone $g$ is satisfied by the current clock valuation $\nu$. The probability of moving to location $l'$ and resetting all clocks in $X$ to 0 is given by $p(X, l')$. The option of letting time pass is available only if the invariant condition $inv(l)$ is satisfied while time elapses. The complete model of the desired FT can be obtained by synchronizing the inputs and outputs of the required FT gates, thereby composing the full FT diagram. As an example, Fig. 2 shows the PTA of a possible configuration of the OR gate with two inputs. The x input may fail at any time with probability px, at which point the variable x becomes 1. The fault in x then propagates to the output of the gate, setting out to 1. The same logic also applies to the y input.

Fig. 2: Sample of an OR Gate PTA

IV. STOCHASTIC SOFT-FAULT ANALYSIS WITH UPPAAL

In this section, we present and discuss the results of our analysis of the fault tree of the SPARC V8 pipeline (Fig. 1). For this analysis, the proposed FT has been modeled in UPPAAL-SMC, where a PTA model of each of the FT gates has been generated. The model of the FT is then obtained through the parallel composition of all the gate models. The proposed PTA models in UPPAAL rely heavily on two aspects: 1) As previously mentioned, the probabilities used for the failure rates in the model are derived from the cross-section values reported in [9]; 2) The proposed DFT models are constructed specifically for the verification of a microchip. This means that the PTAs of each gate of the FT have certain configurations that accommodate the particularities of such a system. For example, exit rates in the models are set to 1. This forces the tool to evaluate each state at every unit of time (clock cycle). Furthermore, each register has an internal clock that tracks the amount of time that the soft-fault is active, as well as a global clock that estimates the average propagation times in the FT. This feature is extremely valuable for the modeling of soft-faults, since it allows the model-checker to verify the model with different time parameters in each verification instance. For example, the expected propagation time of a fault greatly impacts its probability of causing a failure in the system.
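To make the per-cycle evaluation of a gate more concrete, the following is a minimal Monte Carlo sketch of the two-input OR gate of Fig. 2, written in plain Python rather than in UPPAAL-SMC's own modeling language. It assumes that each input fails independently with a fixed probability per clock cycle and that a fault reaches the output in the same cycle; the per-register clocks described above are omitted. The function names, probabilities, and horizon are hypothetical.

```python
import random


def simulate_or_gate(px, py, horizon, rng):
    """Run one trace of a two-input OR gate, evaluated once per clock cycle.

    Returns the cycle at which `out` becomes 1, or None if the gate has
    not failed within `horizon` cycles.
    """
    x = y = 0
    for global_clock in range(1, horizon + 1):
        if rng.random() < px:
            x = 1  # soft-fault on input x
        if rng.random() < py:
            y = 1  # soft-fault on input y
        if x or y:
            return global_clock  # the fault propagates and sets out to 1
    return None


def estimate_failure_probability(px, py, horizon, runs, seed=0):
    """Estimate P(out = 1 within `horizon` cycles) from `runs` random traces."""
    rng = random.Random(seed)
    failures = sum(
        simulate_or_gate(px, py, horizon, rng) is not None for _ in range(runs)
    )
    return failures / runs


def analytic_failure_probability(px, py, horizon):
    """Closed form under the same independence assumption, for comparison."""
    return 1.0 - ((1.0 - px) * (1.0 - py)) ** horizon
```

Comparing `estimate_failure_probability(0.01, 0.02, 50, 20000)` with `analytic_failure_probability(0.01, 0.02, 50)` gives a quick sanity check of the simulation; a statistical model checker performs a similar sampling of traces, with a confidence level attached to the estimate.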



This experiment aims to determine the probability of an SEU raising a trap exception in the pipeline of the processor. Moreover, the proposed analysis can identify the type of trap exception that has occurred. With the FT proposed in Fig. 1 and following the behavior detailed in the SPARC V8 manual [5], the proposed model is able to estimate the probability of occurrence of the following trap exceptions: instruction_access_exception, data_store_error, illegal_instruction, data_access_exception, and mem_address_not_aligned. All other types of trap exception are classified under Others. The probabilities used in the proposed analysis are derived from cross-section metrics published in [9]. The analysis has been evaluated over 3800 iterations, with a confidence level of 95%. The results show that approximately 41% of all injected faults have been captured as trap exceptions. Fig. 3 shows a breakdown of each of the identified errors. For validation, the obtained results (shown under Proposed FTA) are compared with the results reported in [9] (shown under Radiation Test and Simulation). It can be observed that the results estimated by the proposed analysis show remarkable similarity with the values obtained by the Radiation Test and Simulation techniques. This shows that the proposed analysis can be a viable option for early analysis of microprocessor designs, as it is an inexpensive, fast, and highly customizable alternative to other adopted methods.

Fig. 3: Probability of Trap Exceptions in different approaches. Simulation and radiation test results are reproduced from [9].

The proposed FTA can be customized to evaluate other metrics, such as the estimated time before a failure, the failure rate over time, and the impact of soft-errors on different components. As an example, Table I shows the probability of trap exceptions over time, the probability of detected errors over time, and the probability of undetected errors over time. Although relatively small, the probability of undetected errors in the system may represent a serious issue in certain conditions, where the system is expected to operate without maintenance. Our analysis shows that the biggest contributors to the occurrence of undetected errors are the nPC register (27.2% of cases), the rfa register (22.3% of cases), and the d_cache (19.3% of cases). Access to this kind of information early in the development cycle means that possible points of vulnerability are easier to detect and quicker to fix, resulting in increased productivity at lower costs.

TABLE I: SPARC V8 Probability of Failure Over Time

                     Prob. of TE        Prob. of Error     Prob. of Undetected Errors
                     (million hours)    (million hours)    (million hours)
SPARC V8 Pipeline    0.133              0.268              0.097

V. CONCLUSION AND FUTURE WORK

Vulnerability to soft-errors is a major concern for microelectronic systems exposed to radiation. This paper proposes a probabilistic verification approach for vulnerability estimation of the SPARC V8 architecture when it is exposed to soft-errors. This approach seeks to overcome the limitations of emulation and simulation-based techniques by performing fault tree analysis through stochastic model checking, which provides an accurate and exhaustive estimation of the effects of soft-errors in the system. The modeling experiments that were conducted produced results that are very consistent with previously reported radiation ground testing. The proposed approach is also able to accurately estimate metrics such as the impact of different bit flips on the system's dependability measurement, and availability over time. Future work will seek to expand the analysis domain by integrating multiple layers of abstraction in order to produce even more accurate results.

REFERENCES

[1] W.E. Vesely et al. Fault tree handbook, 1981. In http://www.hq.nasa.gov/office/codeq/doctree/fthb.pdf.
[2] J. B. Dugan et al. Fault trees and sequence dependencies. In Reliability and Maintainability Symposium, pages 286–293, 1990.
[3] S. J. Schilling. Contribution to temporal fault tree analysis without modularization and transformation into the state space. arXiv preprint arXiv:1505.04511, 2015.
[4] M. Ammar et al. Efficient probabilistic fault tree analysis of safety critical systems via probabilistic model checking. In Forum on Specification and Design Languages (FDL), pages 1–8, 2016.
[5] SPARC International. The sparc architecture manual version 8. SPARC International Inc, 1998.
[6] M. Ammar et al. System-level analysis of the vulnerability of processors exposed to single-event upsets via probabilistic model checking. IEEE Transactions on Nuclear Science, 64(9):2523–2530, 2017.
[7] A. Rae et al. A behaviour-based method for fault tree generation. In Int. System Safety Conference, System Safety Society, pages 289–298, 2004.
[8] F. Sturesson et al. Radiation characterization of a dual core leon3-ft processor. In European Conference on Radiation and Its Effects on Components and Systems, pages 938–944, 2011.
[9] C. Bottoni et al. Heavy ions test result on a 65nm sparc-v8 radiation-hard microprocessor. In IEEE International Reliability Physics Symposium, pages 5F–5, 2014.
[10] M. Daněk et al. The leon3 processor. In UTLEON3: Exploring Fine-Grain Multi-Threading in FPGAs, pages 9–14. Springer, 2013.


