LECT-7A-Software Reliability metrics

The document outlines the concepts of software reliability, including functional and non-functional requirements, and various reliability metrics such as Probability of Failure on Demand (POFOD) and Mean Time to Failure (MTTF). It emphasizes the importance of specifying reliability quantitatively and discusses the need for safety and security specifications in system design. Additionally, it covers hazard analysis, risk assessment, and the distinction between safety and security requirements.

Uploaded by

mtulivukidd
Copyright
© All Rights Reserved

Software Reliability

Teresa Abuya
Functional and Non-functional Requirements
• System functional requirements may
specify error checking, recovery features,
and system failure protection
• System reliability and availability are
specified as part of the non-functional
requirements for the system.
System Reliability Specification
• Hardware reliability
– probability a hardware component fails
• Software reliability
– probability a software component will produce an
incorrect output
– software does not wear out
– software can continue to operate after a bad result
• Operator reliability
– probability system user makes an error
Failure Probabilities
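• If a system has two independent components and
its operation depends on both of them, then the
probability of system failure is
P(S) = P(A) + P(B) - P(A)P(B)
which is approximately P(A) + P(B) when both
probabilities are small
• If a component is replicated n times, then the
probability of system failure is
P(S) = P(A)^n
since the system fails only if all n replicas fail at once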
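The two cases above can be checked numerically. This is a minimal sketch assuming independent failures; the function names and the sample probabilities are illustrative and do not come from the slides. The series case uses the exact form 1 - (1 - P(A))(1 - P(B)), which is close to P(A) + P(B) when both values are small.

```python
def series_failure_probability(p_a: float, p_b: float) -> float:
    """Failure probability when the system needs both independent components."""
    # The system survives only if A and B both survive.
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def replicated_failure_probability(p_a: float, n: int) -> float:
    """Failure probability when all n independent replicas must fail together."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return p_a ** n


# Hypothetical component failure probabilities.
p_series = series_failure_probability(0.01, 0.02)
p_replicated = replicated_failure_probability(0.01, 3)

print(f"series:     {p_series:.4f}")
print(f"replicated: {p_replicated:.1e}")
```

Replication only helps this much if the replicas really do fail independently; a design fault shared by all copies would defeat the assumption.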
Functional Reliability Requirements

• The system will check all operator
inputs to see that they fall within their
required ranges.
• The system will check all disks for bad
blocks each time it is booted.
• The system must be implemented using a
standard implementation of Ada.
Non-functional Reliability Specification
• The required level of reliability must be
expressed quantitatively.
• Reliability is a dynamic system attribute.
• Source code reliability specifications are
meaningless (e.g. N faults/1000 LOC)
• An appropriate metric should be chosen to
specify the overall system reliability.
Hardware Reliability Metrics
• Hardware metrics are not suitable for
software since they are based on the
notion of component failure
• Software failures are often design failures
• A software system often remains available
after a failure has occurred
• Hardware components can wear out
Software Reliability Metrics
• Reliability metrics are units of measure for system
reliability
• System reliability is measured by counting the
number of operational failures and relating these
to demands made on the system at the time of
failure
• A long-term measurement program is required to
assess the reliability of critical systems
Reliability Metrics - part 1
• Probability of Failure on Demand (POFOD)
– POFOD = 0.001
– The service fails for one in every 1000 requests
• Rate of Occurrence of Failures (ROCOF)
– ROCOF = 0.02
– Two failures for each 100 operational time units
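Both metrics are simple ratios of observed failures to exposure. The sketch below reproduces the two figures from the slide; the function names are illustrative assumptions, not an established API.

```python
def pofod(failed_demands: int, total_demands: int) -> float:
    """Probability of failure on demand: failed requests / all requests."""
    if total_demands <= 0:
        raise ValueError("total_demands must be positive")
    return failed_demands / total_demands


def rocof(failures: int, operational_time_units: float) -> float:
    """Rate of occurrence of failures: failures per operational time unit."""
    if operational_time_units <= 0:
        raise ValueError("operational_time_units must be positive")
    return failures / operational_time_units


# The slide's figures: 1 failure in 1000 requests, 2 failures in 100 time units.
example_pofod = pofod(1, 1000)
example_rocof = rocof(2, 100)

print(example_pofod, example_rocof)
```

POFOD is dimensionless (per request), while ROCOF only has meaning once the time unit is stated.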
Reliability Metrics - part 2
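• Mean Time to Failure (MTTF)
– average time between observed failures (these
slides use MTBF as a synonym; strictly,
MTBF = MTTF + MTTR)
• Availability = MTBF / (MTBF+MTTR)
– MTBF = Mean Time Between Failures, used here
as the mean operating time before a failure
– MTTR = Mean Time to Repair
• Reliability = MTBF / (1+MTBF)
– a rough index only: its value changes with the
time unit chosen for MTBF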
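A short numerical sketch of the two formulas above. The MTBF and MTTR figures are hypothetical, and the function names are illustrative. The second function is included only to show that the slide's reliability index gives different values for the same system when MTBF is expressed in hours rather than days.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the system is usable: MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("mtbf and mttr must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


def slide_reliability_index(mtbf: float) -> float:
    """The slide's Reliability = MTBF / (1 + MTBF); depends on the time unit."""
    if mtbf < 0:
        raise ValueError("mtbf must be non-negative")
    return mtbf / (1.0 + mtbf)


# Hypothetical system: 998 hours of operation between failures, 2 hours to repair.
a = availability(mtbf=998.0, mttr=2.0)
index_hours = slide_reliability_index(998.0)         # MTBF in hours
index_days = slide_reliability_index(998.0 / 24.0)   # same MTBF in days

print(f"availability = {a:.3f}")
print(f"index (hours) = {index_hours:.4f}, index (days) = {index_days:.4f}")
```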
Time Units
• Raw Execution Time
– non-stop system
• Calendar Time
– If the system has regular usage patterns
• Number of Transactions
– demand type transaction systems
Availability
• Measures the fraction of time system is
really available for use
• Takes repair and restart times into account
• Relevant for non-stop continuously running
systems (e.g. traffic signal)
Probability of Failure on Demand
• Probability system will fail when a service request
is made
• Useful when requests are made on an intermittent
or infrequent basis
• Appropriate for protection systems, where service
requests may be rare and consequences can be
serious if service is not delivered
• Relevant for many safety-critical systems with
exception handlers
Rate of Occurrence of Failures
• Reflects rate of failure in the system
• Useful when system has to process a large
number of similar requests that are
relatively frequent
• Relevant for operating systems and
transaction processing systems
Mean Time to Failure
• Measures time between observable system
failures
• For stable systems MTTF = 1/ROCOF
• Relevant for systems when individual
transactions take lots of processing time
(e.g. CAD or WP systems)
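MTTF can be estimated from a log of failure times, and for a stable system the reciprocal gives ROCOF. This is a minimal sketch; the failure times are hypothetical and the function name is illustrative.

```python
from statistics import mean
from typing import Sequence


def mean_time_to_failure(failure_times: Sequence[float]) -> float:
    """Average gap between consecutive observed failures.

    failure_times must be sorted in ascending order and share one time unit.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return mean(gaps)


# Hypothetical log: failures observed at these operational time units.
observed = [50.0, 100.0, 150.0, 200.0]

mttf = mean_time_to_failure(observed)
rocof_estimate = 1.0 / mttf  # valid only if the failure rate is stable

print(mttf, rocof_estimate)
```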
Failure Consequences - part 1

• Reliability does not take consequences into
account
• Transient faults have no real consequences
but other faults might cause data loss or
corruption
• May be worthwhile to identify different
classes of failure, and use different metrics
for each
Failure Consequences - part 2
• When specifying reliability both the number of
failures and the consequences of each matter
• Failures with serious consequences are more
damaging than those where repair and recovery is
straightforward
• In some cases, different reliability specifications
may be defined for different failure types
Failure Classification
• Transient - only occurs with certain inputs
• Permanent - occurs on all inputs
• Recoverable - system can recover without
operator help
• Unrecoverable - operator has to help
• Non-corrupting - failure does not corrupt system
state or data
• Corrupting - system state or data are altered
Building Reliability Specification
• For each sub-system analyze consequences
of possible system failures
• From system failure analysis partition
failure into appropriate classes
• For each class set out the appropriate
reliability metric
Examples
Failure class | Example | Metric
Permanent, non-corrupting | ATM fails to operate with any card, must restart to correct | ROCOF = .0001, time unit = days
Transient, non-corrupting | Magnetic stripe can't be read on undamaged card | POFOD = .0001, time unit = transactions
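The two example rows above can be written down as data and compared with observed behaviour. This is a sketch under stated assumptions: the class name, the `meets` helper, and the observed failure counts are hypothetical; only the metric limits and time units come from the examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityRequirement:
    failure_class: str
    metric: str       # "ROCOF" or "POFOD"
    limit: float      # maximum acceptable failures per unit of exposure
    time_unit: str    # unit in which exposure is counted


def meets(req: ReliabilityRequirement, failures: int, exposure: float) -> bool:
    """True if the observed failure ratio is within the specified limit."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return failures / exposure <= req.limit


permanent = ReliabilityRequirement(
    "permanent, non-corrupting", "ROCOF", 0.0001, "days")
transient = ReliabilityRequirement(
    "transient, non-corrupting", "POFOD", 0.0001, "transactions")

# Hypothetical observations: 1 failure in 365 days, 3 failures in 50,000 transactions.
permanent_ok = meets(permanent, failures=1, exposure=365)
transient_ok = meets(transient, failures=3, exposure=50_000)

print(permanent_ok, transient_ok)
```

Keeping the time unit next to the limit makes it harder to compare a per-day rate with a per-transaction probability by mistake.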
Specification Validation
• It is impractical to validate very high
reliability specifications empirically
• No database corruption really means a POFOD
for this failure class of less than 1 in 200 million
• If each transaction takes 1 second to verify,
simulation of one day’s transactions (about
300,000) takes 3.5 days
Statistical Reliability Testing
• Test data needs to follow typical
software usage patterns
• Measuring numbers of errors needs to be
based on errors of omission (failing to do
the right thing) and errors of commission
(doing the wrong thing)
Difficulties with Statistical Reliability Testing
• Uncertainty when creating the operational
profile
• High cost of generating the operational
profile
• Statistical uncertainty problems when high
reliabilities are specified
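The statistical uncertainty problem can be made concrete with a standard result that is not stated in the slides: if n independent demands drawn from the operational profile all succeed, a POFOD of at most p is supported with confidence C once (1 - p)^n <= 1 - C. The sketch below solves that for n; the function name and the 95% confidence level are assumptions.

```python
import math


def failure_free_demands_needed(target_pofod: float, confidence: float) -> int:
    """Smallest n with (1 - target_pofod) ** n <= 1 - confidence."""
    if not 0.0 < target_pofod < 1.0:
        raise ValueError("target_pofod must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # log1p keeps precision when target_pofod is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_pofod))


modest = failure_free_demands_needed(0.001, 0.95)
extreme = failure_free_demands_needed(1 / 200_000_000, 0.95)

print(f"POFOD 1 in 1000:        {modest:,} failure-free demands")
print(f"POFOD 1 in 200 million: {extreme:,} failure-free demands")
```

Under these assumptions the first target needs roughly three thousand failure-free demands, while the second needs hundreds of millions, which is why very high reliability figures are so hard to demonstrate by testing alone.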
Safety Specification
• Each safety requirement should be specified
separately
• These requirements should be based on hazard and
risk analysis
• Safety requirements usually apply to the system as
a whole rather than individual components
• System safety is an emergent system property
Safety Life Cycle - part 1
• Concept and scope definition
• Hazard and risk analysis
• Safety requirements specification
– safety requirements derivation
– safety requirements allocation
• Planning and development
– safety related systems development
– external risk reduction facilities
Safety Life Cycle - part 2
• Deployment
– safety validation
– installation and commissioning
• Operation and maintenance
• System decommissioning
Safety Processes
• Hazard and risk analysis
– assess the hazards and risks associated with the system
• Safety requirements specification
– specify system safety requirements
• Designation of safety-critical systems
– identify sub-systems whose incorrect operation can
compromise entire system safety
• Safety validation
– check overall system safety
Hazard Analysis Stages
• Hazard identification
– identify potential hazards that may arise
• Risk analysis and hazard classification
– assess risk associated with each hazard
• Hazard decomposition
– seek to discover potential root causes for each hazard
• Risk reduction assessment
– describe how each hazard is to be taken into account
when system is designed
Fault-tree Analysis
• Hazard analysis method that starts with an
identified fault and works backwards to the
cause of the fault
• Can be used at all stages of hazard analysis
• It is a top-down technique that may be
combined with bottom-up hazard analysis
techniques, which start with system failures
that lead to hazards
Fault-tree Analysis Steps
• Identify hazard
• Identify potential causes of hazards
• Link combinations of alternative causes
using “or” or “and” symbols as appropriate
• Continue the process until “root” causes are
identified; the result is an and/or tree (or a
logic circuit) whose “leaves” are the causes
How does it work?
• What would a fault tree look like that
describes the causes of a hazard like
“data deleted”?
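One possible answer, sketched as an and/or tree. The causes chosen for "data deleted" are hypothetical illustrations rather than the lecturer's solution, and the class and function names are assumptions. Leaves are root causes; gates combine them with "and" or "or".

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Event:
    """A leaf of the tree: a root cause."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An inner node combining its children with 'and' or 'or'."""
    kind: str
    children: Tuple["Node", ...]


Node = Union[Event, Gate]


def occurs(node: Node, active_causes: Set[str]) -> bool:
    """True if the given set of root causes triggers this node."""
    if isinstance(node, Event):
        return node.name in active_causes
    results = [occurs(child, active_causes) for child in node.children]
    if node.kind == "and":
        return all(results)
    if node.kind == "or":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree for the hazard "data deleted".
data_deleted = Gate("or", (
    Event("operator deletes wrong file"),
    Event("software fault in delete routine"),
    Gate("and", (Event("disk failure"), Event("no recent backup"))),
))

print(occurs(data_deleted, {"disk failure"}))
print(occurs(data_deleted, {"disk failure", "no recent backup"}))
```

In this sketch a disk failure alone does not produce the hazard, because the "and" gate also requires the backup to be missing.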
Risk Assessment
• Assess the hazard severity, hazard probability, and
accident probability
• Outcome of risk assessment is a statement of
acceptability
– Intolerable (must never occur)
– ALARP (as low as reasonably practicable, given cost
and schedule constraints)
– Acceptable (consequences are acceptable and no extra
cost should be incurred to reduce it further)
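The three outcomes can be illustrated with a toy classifier. Everything numeric here is a hypothetical assumption: the 1 to 5 severity scale, the probability-times-severity score, and both thresholds. In practice the boundaries come from the human, social, and political judgement that the slides go on to describe.

```python
def classify_risk(probability: float, severity: int,
                  intolerable_above: float = 1e-2,
                  acceptable_below: float = 1e-4) -> str:
    """Map a hazard to 'intolerable', 'ALARP', or 'acceptable'.

    probability: estimated chance of the hazard per year (hypothetical scale)
    severity: 1 (minor) to 5 (catastrophic) (hypothetical scale)
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    score = probability * severity
    if score > intolerable_above:
        return "intolerable"
    if score < acceptable_below:
        return "acceptable"
    return "ALARP"


print(classify_risk(0.01, 5))
print(classify_risk(1e-3, 3))
print(classify_risk(1e-5, 2))
```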
Risk Acceptability
• Determined by human, social, and political
considerations
• In most societies, the boundaries between
regions are pushed upwards with time
(meaning risk becomes less acceptable)
• Risk assessment is always subjective (what
is acceptable to one person is ALARP to
another)
Risk Reduction
• System should be specified so that hazards do not
arise or result in an accident
• Hazard avoidance
– system designed so hazard can never arise during normal
operation
• Hazard detection and removal
– system designed so that hazards are detected and
neutralized before an accident can occur
• Damage limitation
– system designed to minimize accident consequences
Security Specification
• Similar to safety specification
– not possible to specify quantitatively
– usually stated in “system shall not” terms rather
than “system shall” terms
• Differences
– no well-defined security life cycle yet
– security deals with generic threats rather than
system specific hazards
Security Specification Stages - part 1
• Asset identification and evaluation
– data and programs identified with their level of
protection
– degree of protection depends on asset value
• Threat analysis and risk assessment
– security threats are identified and the risks associated
with each are estimated
• Threat assignment
– identified threats are related to assets so that each asset
has a list of associated threats
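Threat assignment amounts to grouping threats by the asset they affect. The sketch below shows one way to build that list; the asset names, threat names, and function name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_threats(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (threat, asset) pairs so each asset lists its associated threats."""
    threats_by_asset: Dict[str, List[str]] = defaultdict(list)
    for threat, asset in pairs:
        if threat not in threats_by_asset[asset]:
            threats_by_asset[asset].append(threat)
    return dict(threats_by_asset)


# Hypothetical output of threat analysis: which threat affects which asset.
identified = [
    ("unauthorized read", "customer database"),
    ("data tampering", "customer database"),
    ("denial of service", "payment service"),
    ("unauthorized read", "customer database"),  # duplicate finding
]

threat_lists = assign_threats(identified)

for asset, threats in threat_lists.items():
    print(asset, "->", threats)
```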
Security Specification Stages - part 2
• Technology analysis
– available security technologies and their applicability
against the threats
• Security requirements specification
– where appropriate these will identify the security
technologies that may be used to protect against different
threats to the system
