Elbow Metrics
Abstract: Computer systems often reach a point at which the relative cost to increase some tunable parameter is no longer worth the corresponding performance benefit. These “knees” typically represent beneficial points that system designers have long selected to best balance inherent trade-offs. While prior work largely uses ad hoc, system-specific approaches to detect knees, we present Kneedle, a general approach to online and offline knee detection that is applicable to a wide range of systems. We define a knee formally for continuous functions using the mathematical concept of curvature and compare our definition against alternatives. We then evaluate Kneedle’s accuracy against existing algorithms on both synthetic and real data sets, and evaluate its performance in two different applications.

I. INTRODUCTION

Selecting the “right” operating point for a given system is often thought of as an art form, since the direct and indirect costs and benefits of changing different system parameters are difficult or even impossible to quantify. For example, an important operating point in a large MapReduce job occurs when the job should no longer wait for “slow” tasks to finish, but instead speculatively re-execute work on other nodes in hopes of finishing the job sooner [1]. Since MapReduce’s goal is to finish all tasks as fast as possible, it must decide when the cost, in terms of a job’s running time and cluster utilization, is worth the corresponding performance benefit, in terms of task completion percentage. Congestion-responsive network protocols face a related challenge when setting a sending rate: a protocol must decide a rate that maximizes performance without exceeding its fair share and causing congestion.

In prior work, the issue has frequently been couched as identifying one or more “knees”: operating points, based on recent trends, where the perceived cost to alter a system parameter is no longer worth the expected performance benefit. For MapReduce, triggering speculative execution after observing a knee in the task completion percentage ensures that the system re-executes tasks that are significantly slower than other similar tasks that have finished execution. In the case of a network protocol, successive increases to the sending rate should cease if delay signals congestion by increasing steeply, forming a knee. However, while the problem of knee detection (finding “good” operating points in system behavior) seems straightforward, to the best of our knowledge there exists neither an accepted definition of a knee nor a general systematic approach for detecting one.

Numerous researchers in widely disparate areas frequently encounter knee detection problems similar to those we describe [1], [2], [3], [4], [5]. In these systems, researchers either use ad hoc or system-specific approaches to detect knees, or defer the problem to future work. While a finely-crafted system-specific approach will perform better than a general knee detection approach, a designer may not take the time to design one. Thus, our aim is not to improve or optimize a specific system or protocol, but to provide system designers a general tool for improving the parts of their system they generally do not take the time to optimize. In network protocol and system design, rules-of-thumb often serve researchers and operators well in the absence of an optimal solution. We believe that a tool for knee detection adds to their problem solving arsenal. Our hypothesis is that a knee detection algorithm that does not require tuning for a specific system or operational characteristics is applicable in a wide range of settings where developers do not take the time to design, test, and optimize a system-specific algorithm.

II. DEFINING AND DETECTING KNEES

While the notion of a knee is well-known, we are not aware of a broadly accepted definition in prior literature. The confusion stems from the fact that researchers, in many cases unknowingly, use knees as a substitute for a more comprehensive cost-benefit analysis that is either difficult or impossible to perform. Performing a direct cost-benefit analysis is often complex, since it is inherently system-, platform-, and workload-specific. Further, many systems are not predictable due to volatile operating conditions.

For example, unpredictable failure rates in large clusters, which may change over time, are the root cause of stragglers in MapReduce jobs [1]. Likewise, since multiple flows share network links in the Internet, network protocols cannot predict in advance the rapidly changing level of TCP-friendly bandwidth available, but must instead continuously adapt to the indirect signals of packet loss and delay [6]. In lieu of a complex system-specific analysis, operators tend to select operating points, or knees, that are “good enough” by observing where performance improvements start to level off as a function of one or more tunable system parameters. Note that we focus on knee detection for complex systems that change their behavior according to volatile, and potentially unpredictable, operating conditions, and not for simple systems that permit standard closed-form models, e.g., M/M/1 queues [7].
The difficulty with defining a knee formally is that “good enough” in one system may not be “good enough” in another. Since knees only serve as an approximation, operators interpret them differently in different situations. Thus, knee detection is an inherently heuristic process. However, to design a general application-independent knee detection algorithm, we require a consistent definition applicable to any system. In this work, as in [8], we use the mathematical definition of curvature for a continuous function as the basis for our knee definition. For any continuous function f, there exists a standard closed-form K_f(x) that defines the curvature of f at any point as a function of its first and second derivative:

\[ K_f(x) = \frac{f''(x)}{\left(1 + f'(x)^2\right)^{1.5}} \]

The point of maximum curvature is well-matched to the ad hoc methods operators use to select a knee, since curvature is a mathematical measure of how much a function differs from a straight line. As a result, maximum curvature captures the leveling off effect operators use to identify knees. Importantly, unlike other common definitions, curvature is application-independent: it (i) does not depend on the relationship between system parameters and performance, and (ii) does not require setting system-specific thresholds. Note that knee detection does depend on the selection of proper adjustable system parameters and performance metrics, as we show for our examples in Section V.

It is important to realize why a knee definition based only on the first derivative is not enough to identify a knee. Consider the simple example in Figure 1, where the y-axis represents some performance metric, the x-axis represents a tunable system parameter, and the vertical bar represents the point of maximum curvature. The maximum of the first derivative is the inflection point of the curve, which occurs at x = 0 in Figure 1. The inflection point is not representative of the knee since performance continues to improve significantly beyond it. Instead, the inflection point only captures where the rate of performance increase reaches a maximum. In contrast, the curvature definition precisely matches the concept of a knee. [8] includes a survey of a range of other knee definitions from prior work, primarily in the context of clustering algorithms [7], [9], [10], [11], [12]. We discuss alternative definitions below.

[Figure 1 (labeled “Knee Detection - Spring 2010”; panel title “Gaussian Curvature”; axis labels “Arrivals” and “K”; x-axis from -4 to 4): The top graph represents the CDF and the bottom graph is the associated curvature. The vertical line indicates the maximum curvature, i.e., the knee. This seems to match the intuitive definition of a knee very precisely.]

While curvature is well-defined for continuous functions, it is not well-defined for discrete data sets. In the discrete case, we could determine curvature by fitting a continuous function to the data and using the function’s point of maximum [...] a circle; end-points in a data set do not have curvature values by definition. Thus, using the closed-form formulation as a direct basis for knee detection on discrete data is not possible.

B. Knee Detection in Discrete Data Sets

Researchers have previously proposed multiple approaches to detecting knees in discrete data. Before formulating our curvature-inspired algorithm in Section III, we present two existing approaches from prior research, Angle-based and EWMA, for comparison, as well as another approach we formulate based on Menger curvature, a direct discrete equivalent of continuous curvature. Note that the Angle-based and Menger algorithms are designed specifically for offline cases, where the entire data set is known in advance, while EWMA is designed to detect knees online as data points become known.

Angle-based. The geometric “angle-based” approach of Zhao et al. [13] is an extension of the L-method for detecting knees in clustering applications [8]. The Angle-based approach first finds the local minima of the successive differences (y_1 + y_3 - 2y_2) for each consecutive triple of points. For example, consider a straight line that goes through the consecutive points (x_1, y_1), (x_2, y_2), and (x_3, y_3). Assuming x-values are evenly spaced, y_1 + y_3 - 2y_2 = 0 for any straight segment. However, if these three points form a knee, (x_2, y_2) must be above the straight line that goes through (x_1, y_1) and (x_3, y_3). In this case y_1 + y_3 - 2y_2 < 0. “Sharper” knees have more negative difference values.

Next, since successive differences are local measures and ignore the overall trend of the curve, the algorithm combines the differences with an angle value. After obtaining the local minima of the successive differences, the algorithm sorts the minima, and, starting from the point with the largest difference value, calculates the two angles formed by the y-axis and the line going through each successive pair of points associated with the corresponding difference value. The sum of these two angles is the angle value. Knees are detected at the local maxima of these angle values.

Menger Curvature. While curvature is not well-defined for arbitrary discrete data sets, Menger curvature defines the curvature for three discrete points as the curvature of the circle circumscribed about those points [14]. Thus, we define the Menger curvature for each point p_i = (x_i, y_i) in an n point data set as being equal to 1/r for the circle of radius r circumscribed about p_1, p_i, and p_n. The curvature of the circumscribed circle is straightforward to compute and
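
As a rough illustration of the Menger curvature definition above, the following Python sketch computes 1/r for the circle through p_1, p_i, and p_n using the standard identity 1/r = 4A / (|ab| |bc| |ca|), where A is the area of the triangle formed by the three points. The function names `menger_curvature` and `menger_knee`, the sample data, and the choice to skip the two end-points are assumptions made for illustration; this is not the implementation evaluated in this paper.

```python
import math


def menger_curvature(a, b, c):
    """Return 1/r for the circle through 2-D points a, b, c.

    Collinear points lie on a circle of infinite radius, so the result is 0.0.
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    # The 2-D cross product equals twice the signed area of triangle abc.
    cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    denom = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    if denom == 0.0:
        raise ValueError("the three points must be distinct")
    # 1/r = 4 * area / (|ab| |bc| |ca|) = 2 * |cross| / (|ab| |bc| |ca|)
    return 2.0 * abs(cross) / denom


def menger_knee(points):
    """Return (index, curvature) of the interior point with the largest
    Menger curvature, measured against the first and last points.

    Assumption: the end-points are skipped because the circle through
    p_1, p_i, p_n is undefined when p_i coincides with p_1 or p_n.
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    first, last = points[0], points[-1]
    best_index, best_curvature = None, -1.0
    for i in range(1, len(points) - 1):
        k = menger_curvature(first, points[i], last)
        if k > best_curvature:
            best_index, best_curvature = i, k
    return best_index, best_curvature


# Hypothetical data: performance rises quickly, then levels off.
sample = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
knee_index, knee_curvature = menger_knee(sample)
```

For the hypothetical `sample` above, the point where the curve stops rising has the largest curvature against the two end-points, so `menger_knee` selects it as the knee.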
[Figures: plot panels not recoverable from extraction. Surviving labels: “Difference”; “Definition”; “Sensitivity”; legend entries “Kneedle”, “Menger”, “Angle-based”, and “EWMA”; axis labels “Probability Density” and “F-Score”; “Popularity-to-size ratios”; “Without Kneedle” / “With Kneedle”; “UDP” / “TCP”.]