Measurement
CSE 4495 - Lecture 2 - 22/10/2022
Quality Attributes
• Availability
• Ability to carry out a task when needed, to minimize
“downtime”, and to recover from failures.
• Modifiability
• Ability to enhance software by fixing issues, adding
features, and adapting to new environments.
• Testability
• Ability to easily identify faults in a system.
• Probability that a fault will result in a visible failure.
Quality Attributes
• Performance
• Ability to meet timing requirements. When events occur,
the system must respond quickly.
• Security
• Ability to protect information from unauthorized access
while providing service to authorized users.
• Scalability
• Ability to “grow” the system to process more concurrent
requests.
Quality Attributes
• Interoperability
• Ability to exchange information with and provide
functionality to other systems.
• Usability
• Ability to enable users to perform tasks and provide support to users.
• How easy it is to use the system, learn features, adapt to meet user needs, and increase confidence and satisfaction in usage.
Quality Attributes
• Resilience
• Supportability
• Portability
• Development Efficiency
• Time to Deliver
• Tool Support
• Geographic Distribution
Quality Attributes
• These qualities often conflict.
• Fewer subsystems improve performance, but hurt modifiability.
• Redundant data helps availability, but weakens security.
• Localizing safety-critical features ensures safety, but degrades performance.
• It is important to decide which qualities matter most, and to set a threshold for when each is “good enough”.
Our Focus
• Dependability
• Availability
• Performance
• Scalability
• Security
• (Others important - but not enough time for all!)
Dependability
When is Software Ready for Release?
• That it is correct.
• That it is reliable.
• That it is safe.
• That it is robust.
Correctness
• A program is correct if it is always consistent with
its specification.
• Depends on quality and detail of requirements.
• Easy to show with respect to a weak specification.
• Often impossible to prove with a detailed specification.
• Correctness is rarely provably achieved.
Reliability
• Statistical approximation of correctness.
• The likelihood of correct behavior over some period of observed behavior.
• e.g., a time period or a number of system executions.
• Measured relative to a specification and usage
profile (expected pattern of interaction).
• Dependent on how the system is used by a type of user.
Dependence on Specifications
• Correctness and reliability:
• Success relative to the strength of the specification.
• Hard to meaningfully prove anything against a strong specification.
• Severity of a failure is not considered.
• Some failures are worse than others.
• Safety revolves around a restricted specification.
• Robustness focuses on everything not specified.
Safety
• Safety is the ability to avoid hazards.
• Hazard = defined undesirable situation.
• Generally serious problems.
• Relies on a specification of hazards.
• Defines what the hazard is and how it will be avoided in the software.
• We prove or show evidence that the hazard is avoided.
• Only concerned with hazards, so proofs often possible.
Robustness
• Software that is “correct” may fail when the
assumptions of its design are violated.
• How it fails matters.
• Software that fails “gracefully” is robust.
• Design the software to counteract unforeseen issues or
perform graceful degradation of services.
• Look at how a program could fail and handle those situations.
• Cannot be proved, but is a goal to aspire to.
Dependability Property Relations
Measuring Dependability
• Must establish criteria for when the system is
dependable enough to release.
• Correctness hard to prove conclusively.
• Robustness/Safety important, but do not demonstrate
functional correctness.
• Reliability is the basis for arguing
dependability.
• Can be measured.
• Can be demonstrated through testing.
Let’s take a break!
Measuring Reliability
What is Reliability?
• Probability of failure-free operation for a specified
time in a specified environment for a given
purpose.
• Depends on system and type of user.
• How well users think the system provides services
they require.
Metric 1: Availability
• Can the software carry out a task when needed?
• Encompasses reliability and repair.
• Does the system tend to show correct behavior?
• Can the system recover from an error?
• The ability to mask or repair faults such that cumulative
outages do not exceed a required value over a time
interval.
• Both a reliability measurement AND an independent
quality attribute.
Metric 1: Availability
• Measured as (uptime) / (total time observed).
• Takes repair and restart time into account.
• Does not consider incorrect computations.
• Only considers crashes/freezing.
• 0.9 = down for 144 minutes a day.
• 0.99 = 14.4 minutes.
• 0.999 = ~86 seconds.
• 0.9999 = ~8.6 seconds.
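As a sanity check on these figures, here is a minimal Python sketch that converts a target availability into allowed downtime per day. The 24-hour window and the helper name are illustrative assumptions, not from the lecture.

```python
# Allowed downtime per day for a target availability.
def downtime_per_day(availability: float) -> float:
    """Seconds of allowed downtime out of one 24-hour day."""
    seconds_per_day = 24 * 60 * 60  # 86,400 s
    return (1.0 - availability) * seconds_per_day

for a in (0.9, 0.99, 0.999, 0.9999):
    print(f"{a}: {downtime_per_day(a):8.1f} s/day")
# 0.9    -> 8640.0 s/day (144 minutes)
# 0.99   ->  864.0 s/day (14.4 minutes)
# 0.999  ->   86.4 s/day (~86 seconds)
# 0.9999 ->    8.6 s/day (~8.6 seconds)
```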
Availability Considerations
• Time to repair is the time until the failure is no
longer observable.
• Can be hard to define. Stuxnet caused problems for
months. How does that impact availability?
• Software can remain partially available more easily
than hardware.
• If code containing a fault is executed but the system is able to recover, there was no failure.
Metric 2: Probability of Failure on Demand (POFOD)
• The likelihood that a request made of the system results in a failure.
• Measured as (failures) / (requests) over an observation period.
Probabilistic Availability
• An alternate definition of availability.
• The probability that the system will provide a service within required bounds over a specified time interval.
• Availability = MTBF / (MTBF + MTTR)
• MTBF: Mean time between failures.
• MTTR: Mean time to repair.
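• Worked example (illustrative numbers, not from a real system): MTBF = 100 hours, MTTR = 2 hours.
• Availability = 100 / (100 + 2) ≈ 0.98, i.e., about 98% available.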
Reliability Metrics
• Availability: (uptime) / (total time observed).
• POFOD: (failures) / (requests over period).
• ROCOF: (failures) / (total time observed).
• MTBF: Average time between observed failures.
• MTTR: Average time to recover from a failure.
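A minimal sketch of these metrics in Python. The function names and argument formats are illustrative assumptions, not from the lecture.

```python
def availability(uptime: float, total_time: float) -> float:
    return uptime / total_time

def pofod(failures: int, requests: int) -> float:
    return failures / requests

def rocof(failures: int, total_time: float) -> float:
    return failures / total_time

def mtbf(failure_times: list[float]) -> float:
    # Average gap between consecutive failure timestamps
    # (assumes at least two recorded failures).
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)

def mttr(repair_durations: list[float]) -> float:
    return sum(repair_durations) / len(repair_durations)

print(rocof(6, 144))      # ~0.042 failures per hour
print(pofod(40, 10_000))  # 0.004
```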
Reliability Examples
• Provide software with 10,000 requests.
• Wrong result on 35 requests, crash on 5 requests.
• What is the POFOD?
• POFOD = 40 / 10,000 = 0.004 (all 40 bad responses count as failures).
• Run the software for 144 hours (6 million requests). Software failed on 6 requests.
• What is the ROCOF? The POFOD?
• ROCOF = 6 / 144 = 1/24 ≈ 0.042 failures per hour.
• POFOD = 6 / 6,000,000 = 10⁻⁶.
Additional Examples
• Want availability of at least 99%, POFOD of less than 0.1, and ROCOF of less than 2 failures per 8 hours.
• After 7 full days (168 hours), 972 requests were made.
• Product failed 64 times (37 crashes, 27 bad output).
• Average of 2 minutes to restart after each crash (incorrect output did not require a restart).
• What is the availability, POFOD, and ROCOF?
• Can we calculate MTBF?
• Is the product ready to ship? If not, why not?
Additional Examples
• ROCOF: 64 failures / 168 hours ≈ 0.38 per hour ≈ 3.05 per 8-hour work day.
• POFOD: 64 / 972 ≈ 0.066.
• Availability: down for (37 crashes × 2 minutes) = 74 minutes out of 168 hours (10,080 minutes).
• 74 / 10,080 ≈ 0.7% downtime, so availability ≈ 99.3%.
• Can we calculate MTBF?
• No - we need timestamps. We know how long each outage lasted (on average), but not when each crash occurred.
• Is the product ready to ship?
• No. Availability and POFOD meet their targets, but ROCOF is too high (about 3.05 failures per 8 hours, against a target of fewer than 2).
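A hedged Python check of this example's shipping criteria. The numbers come from the slides; the variable names are ad hoc.

```python
total_hours = 7 * 24                  # 168 hours observed
requests, failures = 972, 64
crashes, restart_min = 37, 2

pofod = failures / requests                    # ~0.066
rocof_8h = (failures / total_hours) * 8        # ~3.05 per 8 hours
downtime_min = crashes * restart_min           # 74 minutes
avail = 1 - downtime_min / (total_hours * 60)  # ~0.993

print(pofod < 0.1, avail >= 0.99, rocof_8h < 2)
# True True False -> ROCOF misses its target, so do not ship.
```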
Exam problem practice
• You manage an online service that sells downloadable video recordings of classic movies. A typical download takes one hour, and an interrupted download must be restarted from the beginning. The number of customers engaged in a download at any given time ranges from about 10 to about 150 during peak hours. On average, your system goes down (dropping all connections) about two times per week, for an average of three minutes each time.
• If you can increase availability by reducing downtime or by doubling the mean time between failures, but not both, which will you choose? Why?
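One way to compare the options numerically, as a sketch. The problem does not say how much downtime is reduced, so the halving assumption below is mine.

```python
week_min = 7 * 24 * 60  # 10,080 minutes in a week

def avail(failures_per_week: float, minutes_down_each: float) -> float:
    return 1 - failures_per_week * minutes_down_each / week_min

print(avail(2, 3))    # status quo:      ~0.99940
print(avail(2, 1.5))  # halved downtime: ~0.99970
print(avail(1, 3))    # doubled MTBF:    ~0.99970

# Availability improves identically either way, but every crash drops
# all in-progress downloads (10-150 customers, each losing up to an
# hour of work), so halving the failure *rate* (doubling MTBF) wastes
# far less customer time than trimming an already-short restart.
```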
Exam problem practice
• You manage an online service that sells downloadable video recordings of classic movies. If the system crashes on average once per day, and it usually takes about an hour to restart, what is the probabilistic availability of this system?
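• One possible reading (assumptions: exactly one crash per 24 hours, and the hour of restart counts as repair time): the system runs about 23 hours between failures, so MTBF ≈ 23 hours and MTTR ≈ 1 hour.
• Availability = MTBF / (MTBF + MTTR) = 23 / (23 + 1) ≈ 0.958, i.e., roughly 96%.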
Quality Attributes: Performance, Scalability, and Security
Performance
• Ability to meet timing requirements.
• Characterize pattern of input events and responses
• Requests served per minute.
• Variation in output time.
• Driving factor in software design.
• Often at the expense of other quality attributes.
• All systems have performance requirements.
Performance Measurements
• Latency: The time between the arrival of a stimulus and the system’s response to it.
• Response Jitter: The allowable variation in latency.
• Throughput: Usually the number of transactions the system can process in a unit of time.
• Deadlines in processing: Points where processing must have reached a particular stage.
• Number of events not processed because the system was too busy to respond.
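A small sketch of how these measurements might be derived from an observation log of (arrival, completion) timestamps in seconds. The log format is an illustrative assumption.

```python
log = [(0.0, 0.12), (1.0, 1.31), (2.0, 2.18), (3.0, 3.25)]

latencies = [done - arrived for arrived, done in log]
jitter = max(latencies) - min(latencies)  # observed latency spread

window = log[-1][0] - log[0][0]           # observation window: 3.0 s
throughput = len(log) / window            # ~1.33 completions/second

print(latencies, jitter, throughput)
```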
Measurements - Latency
• Time it takes to complete an interaction.
• Responsiveness - how quickly the system responds to routine tasks.
• Key consideration: user productivity.
• How responsive is the user’s device? The system?
• Measured probabilistically (“... 95% of the time”).
• “Under load of 350 updates per minute, 90% of ‘open account’ requests should complete within 10 seconds.”
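A sketch of checking a probabilistic latency requirement like the one quoted above, using the nearest-rank percentile method. The sample latencies are made up.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """p-th percentile of samples, by the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * p / 100)  # 1-based nearest rank
    return ordered[rank - 1]

open_account_s = [4.2, 5.1, 6.0, 6.3, 7.8, 8.0, 8.9, 9.1, 9.8, 12.5]

p90 = percentile(open_account_s, 90)
print(p90, p90 <= 10.0)  # 9.8 True -> 90% complete within 10 seconds
```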
Measurements - Latency
• Turnaround time = time to complete larger tasks.
• Can task be completed in available time?
• Impact on system while running?
• Can partial results be produced?
• Ex: “With a daily throughput of 850,000 requests, processing should take < 4 hours, including writing to a database.”
• Ex: “It must be possible to resynchronize monitoring
stations and reset database within 5 minutes.”
Measurements - Response Jitter
Security
• Confidentiality
• Data and services protected from unauthorized access.
• A hacker cannot access your tax returns on an IRS server.
• Integrity
• Data/services not subject to unauthorized manipulation.
• Your grade has not changed since assigned.
• Availability
• The system will be available for legitimate use.
• A DDoS attack will not prevent your purchase.
Supporting CIA
• Authentication - Verifies identities of all parties.
• Nonrepudiation - Guarantees that a sender cannot deny sending, and a recipient cannot deny receiving.
• Authorization - Grants the privilege of performing a task.
Security Approaches