An Automated Approach for Recommending When to Stop Performance Tests
Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan
1
Failures in ultra-large-scale systems are often due to performance issues rather than functional issues
2
A 25-minute service outage in 2013 cost Amazon approximately $1.7M
3
4
Performance testing is essential to prevent these failures
[Diagram: a pre-defined workload drives requests against the system under test in the performance testing environment, producing performance counters, e.g., CPU, memory, I/O, and response time.]
5
Determining the length of a performance test is challenging
[Timeline: beyond the optimal stopping time, the test only generates repetitive data.]
6
Determining the length of a performance test is challenging
[Timeline: stopping too early misses performance issues; stopping too late delays the release and wastes testing resources; the optimal stopping time lies in between.]
7
Our approach for recommending when to stop a performance test
[Flow: 1) Collect the already-generated data → collected data → 2) Measure the likelihood of repetitiveness → 3) Extrapolate the likelihood of repetitiveness → first derivatives → 4) Determine whether to stop the test → Yes: STOP / No: keep collecting data.]
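To make the four-step loop concrete, here is a minimal Python sketch of the recommendation loop. It is an illustration, not the authors' implementation: the step functions are simplified placeholders (Steps 2–4 are sketched in more detail under the corresponding slides below), and the check interval and stabilization threshold are assumed values.

```python
# High-level sketch of the recommendation loop (placeholder step functions;
# the check interval and threshold are assumptions, not the authors' settings).

def first_derivative(series):
    """Slope of the last segment of the likelihood-of-repetitiveness curve."""
    return None if len(series) < 2 else series[-1] - series[-2]

def run_recommendation_loop(collect_counters, likelihood_of_repetitiveness,
                            max_checks=144, threshold=0.01):
    likelihoods = []
    for check in range(max_checks):               # e.g., one check every 10 min over 24 h
        data = collect_counters()                 # Step 1: the already-generated data
        likelihoods.append(likelihood_of_repetitiveness(data))  # Step 2
        slope = first_derivative(likelihoods)     # Step 3: trend of the likelihood curve
        if slope is not None and abs(slope) <= threshold:       # Step 4
            return "STOP", check                  # likelihood has stabilized
    return "RAN TO END", max_checks

# Toy usage with dummy step functions:
data = []
def collect(): data.append(1); return data       # the test keeps generating data
def likelihood(d): return 1.0 - 1.0 / len(d)     # saturates toward 100%
print(run_recommendation_loop(collect, likelihood))
```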
11
Step 1: Collect the data that the test generates
Performance counters, e.g., CPU, memory, I/O, and response time
[Timeline up to the current time; the already-generated data is collected as the test runs.]
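As a hedged illustration of Step 1, the sketch below samples a few counters once per second with the psutil library. The specific counters, the one-second interval, and the use of psutil are assumptions for illustration; in practice the response time would come from the load driver rather than from machine-level counters.

```python
# Minimal sketch: periodically sample performance counters during the test
# (assumed tooling and counter set, not the authors' test harness).
import time
import psutil

def sample_counters(duration_s=60, interval_s=1.0):
    """Collect one row of machine-level counters per interval."""
    rows = []
    for _ in range(int(duration_s / interval_s)):
        rows.append({
            "cpu_percent": psutil.cpu_percent(interval=None),        # CPU utilization
            "mem_percent": psutil.virtual_memory().percent,          # memory utilization
            "io_read_bytes": psutil.disk_io_counters().read_bytes,   # cumulative disk reads
        })
        time.sleep(interval_s)
    return rows

if __name__ == "__main__":
    print(sample_counters(duration_s=5))
```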
12
Step 2: Measure the likelihood of repetitiveness
Select a random time period A (e.g., 30 min) from the collected data.
[Timeline up to the current time with period A highlighted.]
13
Step 2: Measure the likelihood of repetitiveness
Search for another, non-overlapping time period B that is NOT statistically significantly different from A.
[Timeline up to the current time with candidate periods … B … A highlighted.]
14
Step 2: Measure the likelihood of repetitiveness
Run a Wilcoxon test between the distributions of every performance counter across both periods.
[Timeline up to the current time with periods … B … A highlighted.]
15
Step 2: Measure the likelihood of repetitiveness
Wilcoxon test between every performance counter from both periods:

          Response time   CPU     Memory   IO
p-values  0.0258          0.313   0.687    0.645

Statistically significantly different in response time!
[Timeline up to the current time with periods … B … A highlighted.]
16
Step 2: Measure the likelihood of repetitiveness
Wilcoxon test between every performance counter from both periods:

          Response time   CPU     Memory   IO
p-values  0.67            0.313   0.687    0.645

Find a time period that is NOT statistically significantly different in ALL performance metrics!
[Timeline up to the current time with periods … B … A highlighted.]
17
Step 2: Measure the likelihood of repetitiveness
Did we find a period that is NOT statistically significantly different?
Yes → Repetitive!
No → Not repetitive!
[Timeline up to the current time; Wilcoxon test between every performance counter from both periods.]
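A minimal sketch of the period-to-period comparison is shown below, using SciPy's Wilcoxon rank-sum test. The 0.05 significance level, the dictionary layout of the counter data, and the toy numbers are assumptions for illustration.

```python
# Sketch: are two non-overlapping periods statistically indistinguishable
# on every performance counter? (Illustrative assumptions.)
from scipy.stats import ranksums   # Wilcoxon rank-sum test, two independent samples

def periods_are_repetitive(period_a, period_b, alpha=0.05):
    """period_a / period_b: dict mapping counter name -> list of samples.
    Returns True only if NO counter is statistically significantly different."""
    for counter in period_a:
        _, p_value = ranksums(period_a[counter], period_b[counter])
        if p_value < alpha:         # significantly different on this counter
            return False            # e.g., response time with p = 0.0258
    return True                     # not significantly different on any counter

# Toy usage with made-up samples:
a = {"response_time": [10, 11, 12, 10, 11], "cpu": [40, 42, 41, 43, 40]}
b = {"response_time": [10, 12, 11, 11, 10], "cpu": [41, 40, 42, 43, 41]}
print(periods_are_repetitive(a, b))
```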
18
Step 2: Measure the likelihood of repetitiveness
Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness.
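The repeated-sampling step could be sketched as follows: the likelihood of repetitiveness is estimated as the fraction of randomly selected periods for which a non-overlapping, statistically indistinguishable period exists. The period length, the number of candidate periods searched, and the 0.05 level are assumptions; this code only illustrates the idea, it is not the authors' implementation.

```python
# Sketch: estimate the likelihood of repetitiveness by repeated random sampling
# (illustrative assumptions; not the authors' implementation).
import random
from scipy.stats import ranksums

def slice_period(data, start, length):
    """data: dict counter -> list of samples; return samples [start, start+length)."""
    return {c: v[start:start + length] for c, v in data.items()}

def has_repetitive_period(data, start_a, length, alpha=0.05, candidates=50):
    """Search for a non-overlapping period that is NOT significantly different from A."""
    n = len(next(iter(data.values())))
    period_a = slice_period(data, start_a, length)
    for _ in range(candidates):
        start_b = random.randrange(0, n - length)
        if abs(start_b - start_a) < length:     # periods must not overlap
            continue
        period_b = slice_period(data, start_b, length)
        p_values = [ranksums(period_a[c], period_b[c])[1] for c in data]
        if all(p >= alpha for p in p_values):   # indistinguishable on ALL counters
            return True
    return False

def likelihood_of_repetitiveness(data, length, repetitions=1000):
    """Fraction of randomly chosen periods that have a repetitive counterpart."""
    n = len(next(iter(data.values())))
    hits = sum(has_repetitive_period(data, random.randrange(0, n - length), length)
               for _ in range(repetitions))
    return hits / repetitions
```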
19
Step 2: Measure the likelihood of repetitiveness
A new likelihood of repetitiveness is measured periodically, e.g., every 10 min, in order to get more frequent feedback on the repetitiveness.
[Timeline: measurements at 30 min, 40 min, …, 1 h 10 min, and so on.]
20
Step 2: Measure the likelihood of repetitiveness
The likelihood of repetitiveness eventually starts stabilizing.
[Plot: likelihood of repetitiveness (1%–100%) over time (00:00–24:00), with the stabilization region marked as providing little new information.]
21
Step 3: Extrapolate the likelihood of repetitiveness
To know when the repetitiveness stabilizes, we calculate the first derivative.
[Plot: likelihood of repetitiveness (1%–100%) over time (00:00–24:00).]
22
Step 4: Determine whether to stop the test
To know when the repetitiveness stabilizes, we calculate the first derivative.
Stop the test if the first derivative is close to 0.
[Plot: likelihood of repetitiveness (1%–100%) over time (00:00–24:00).]
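Steps 3 and 4 could be sketched as follows: track the likelihood curve over the periodic measurements and stop once its first derivative is close to zero. The derivative estimator (numpy.gradient) and the "close to 0" threshold are assumptions for illustration; the slides do not give the authors' exact extrapolation model or threshold.

```python
# Sketch: stop once the likelihood of repetitiveness stabilizes
# (assumed derivative estimator and threshold).
import numpy as np

def should_stop(likelihoods, times_min, threshold=0.001):
    """likelihoods: likelihood of repetitiveness (0..1) at each measurement;
    times_min: measurement times in minutes. Stop when the latest slope is ~0."""
    if len(likelihoods) < 3:
        return False                       # not enough points to judge the trend
    slopes = np.gradient(np.asarray(likelihoods, dtype=float),
                         np.asarray(times_min, dtype=float))   # first derivative
    return abs(slopes[-1]) <= threshold    # close to 0 => the curve has stabilized

# Toy usage: likelihood rising, then flattening; measured every 10 minutes.
lik = [0.05, 0.20, 0.45, 0.70, 0.85, 0.92, 0.94, 0.945, 0.946]
t = [10 * (i + 1) for i in range(len(lik))]
print(should_stop(lik, t))                 # True once the curve has flattened
```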
24
We conduct 24-hour performance tests on three systems:
PetClinic, Dell DVD Store, CloudStore
25
We evaluate whether our approach:
1) Stops the test too early?
2) Stops the test too late?
[Timeline with the optimal stopping time marked.]
26
RQ1: Does our approach stop the test too early?
[Timeline 00:00–24:00 split at the STOP point into pre-stopping and post-stopping data.]
1) Select a random time period from the post-stopping data.
2) Check if the random time period has a repetitive one in the pre-stopping data.
Repeat 1,000 times.
The test is likely to generate little new data after the stopping times (preserving more than 91.9% of the information).
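The RQ1 check could be sketched as below: repeatedly draw a random period from the post-stopping data and test whether a statistically indistinguishable period exists in the pre-stopping data; the resulting fraction is read as the information preserved by stopping. The period length, numbers of repetitions and candidates, and the 0.05 level are illustrative assumptions, not the authors' evaluation harness.

```python
# Sketch of the RQ1 evaluation: how much of the post-stopping data is already
# covered by the pre-stopping data? (Illustrative assumptions.)
import random
from scipy.stats import ranksums

def covered(period, pre_data, length, alpha=0.05, candidates=200):
    """True if some pre-stopping period is NOT significantly different on all counters."""
    n = len(next(iter(pre_data.values())))
    for _ in range(candidates):
        s = random.randrange(0, n - length)
        cand = {c: v[s:s + length] for c, v in pre_data.items()}
        if all(ranksums(period[c], cand[c])[1] >= alpha for c in period):
            return True
    return False

def preserved_information(pre_data, post_data, length, repetitions=1000):
    """Fraction of random post-stopping periods that are repetitive w.r.t. pre-stopping data."""
    n_post = len(next(iter(post_data.values())))
    hits = 0
    for _ in range(repetitions):
        s = random.randrange(0, n_post - length)
        period = {c: v[s:s + length] for c, v in post_data.items()}
        hits += covered(period, pre_data, length)
    return hits / repetitions               # e.g., > 0.919 in the reported results
```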
27
RQ2: Does our approach stop the test too late?
We apply our evaluation approach from RQ1 at the end of every hour during the test to find the most cost-effective stopping time.
[Timeline: 1 h, 2 h, …, 10 h, 20 h, 24 h.]
28
RQ2: Does our approach stop the test too late?
The most cost-effective stopping time has:
1. A big difference to the previous hour
2. A small difference to the next hour
[Plot: likelihood of repetitiveness (1%–100%) over time (00:00–24:00), with 04:00, 05:00, and 06:00 marked.]
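One way to operationalize "a big difference to the previous hour and a small difference to the next hour" is sketched below, scoring each hour by the gain it adds minus the gain the next hour would still add. The scoring formula and the toy numbers are assumptions for illustration, not the authors' exact definition.

```python
# Sketch: pick the most cost-effective stopping hour from hourly measurements
# (assumed scoring of "big difference to the previous hour, small to the next").
def most_cost_effective_hour(hourly_values):
    """hourly_values[i]: evaluation measure after hour i+1 (e.g., information preserved)."""
    best_hour, best_score = None, float("-inf")
    for i in range(1, len(hourly_values) - 1):
        gain_from_prev = hourly_values[i] - hourly_values[i - 1]   # should be big
        gain_to_next = hourly_values[i + 1] - hourly_values[i]     # should be small
        score = gain_from_prev - gain_to_next
        if score > best_score:
            best_hour, best_score = i + 1, score
    return best_hour

# Toy usage: the measure rises quickly until hour 5, then flattens.
print(most_cost_effective_hour([0.20, 0.35, 0.50, 0.65, 0.90, 0.91, 0.915, 0.92]))  # -> 5
```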
29
RQ2: Does our approach stop the test too late?
There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay).
[Figure: short delay between recommended and most cost-effective stopping times.]
Summary: Determining the length of a performance test is challenging; stopping too early misses performance issues, while stopping too late delays the release and wastes testing resources.
Our approach collects the already-generated data, measures the likelihood of repetitiveness, extrapolates it, and recommends stopping the test once the likelihood stabilizes (first derivative close to 0).
The test is likely to generate little new data after the recommended stopping times (preserving more than 91.9% of the information).
There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay).
[The closing slides repeat the challenge, the approach overview, and the RQ1/RQ2 results shown above.]