AIR NAVIGATION
EUROCONTROL
KPI Measurement,
Monitoring and Analysis
Guide
AIM/AEP/S-LEV/0008
Edition : 0.2
Edition Date : 19 Apr 2002
Status : Draft
Class : General Public
DOCUMENT DESCRIPTION
Document Title
KPI Measurement, Monitoring and Analysis Guide
This guide provides an introduction to Key Performance Indicator (KPI) measurement, monitoring and
analysis.
Keywords
Quality Management Service Level Performance Indicator
KPI Measurement Data Collection Analysis
Monitoring
DOCUMENT APPROVAL
The following table identifies all management authorities that have successively approved
the present issue of this document.
The following table records the complete history of the successive editions of the present
document.
EDITION   DATE          REASON FOR CHANGE   SECTIONS/PAGES AFFECTED
0.1       30 Nov 2001   Creation            All
TABLE OF CONTENTS
1. INTRODUCTION
1.1 Purpose and scope
1.2 References
1.3 Glossary
2. DATA COLLECTION
2.1 Data Collection Plan
2.2 Measurement Techniques
2.2.1 Event-Driven Measurement
2.2.2 Sampling-Based Measurement
2.2.3 Simulation
2.3 Measurement Types: Variable and Attribute Measures
2.4 Designing a Data Collection System
3. MONITORING
3.1 Operational Reports
3.2 Real-Time Reports
3.3 Executive Summaries
3.4 Customer Reports
4. ANALYSING AND INTERPRETING KPIS
4.1 Variation and Trend Analysis
4.2 Interpreting Charts for Variance and Trend Analysis
4.2.1 Interpreting Run Charts
4.2.2 Control Chart
4.2.3 Histogram
4.3 Root-Cause Analysis
4.3.1 Causal Table
4.3.2 Cause and Effect Diagram
4.3.3 Interrelations Digraph
4.4 Identifying Relationships
4.4.1 Scatter Diagrams
4.4.2 Stratification
4.5 Capability Analysis
4.6 Determining Baselines
4.6.1 Process Capability
4.6.2 Analysing Distribution
4.6.3 Interpreting Histogram
4.6.4 Capability Indices
4.6.5 Capacity Analysis
1. INTRODUCTION
1.1 Purpose and scope
This guide provides an introduction to Key Performance Indicator (KPI) measurement, monitoring and analysis. While reading the document, please keep in mind that you have to adapt the solutions it describes to your own environment. The guide covers:
• Data Collection
• Monitoring
1.2 References
[1] Foundations of Service Level Management, April 2000, Rick Sturm, Wayne
Morris and Mary Jander, SAMS Publications, ISBN 0-672-31743-5.
[2] Change Management: the 5-step action kit, C. Rye, ISBN 0749433809, Kogan
Page Limited, 2001.
[5] Basic Tools for Process Improvement, Module 7, Data Collection, 1996.
[6] The Quality Tools Cookbook, Sid Sytsma and Katherine Manley,
http://www.sytsma.com/tqmtools/tqmtoolmenu.html
[7] Quality Assurance Tools and Methods, Quality Assurance (QA) Project,
http://www.qaproject.org/RESOURCES.htm#Resources.
[8] The Six Sigma Way Team Fieldbook, P.S. Pande, R.P. Neuman, R.R.
Cavanagh, ISBN 0-07-137314-4, McGraw-Hill, 2002.
[9] Basic Tools for Process Improvement, Module 9, Run Chart, Navy Total Quality
Leadership Office, January 1996, http://www.odam.osd.mil/qmo/library.htm.
[11] Basic Tools for Process Improvement, Module 11, Histogram, Navy Total
Quality Leadership Office, January 1996, http://www.odam.osd.mil/qmo/library.htm.
[12] Event Management and Notification, White Paper by BMC Software Inc., 2002,
http://www.bmc.com.
1.3 Glossary
Term Description
Affinity Diagram A creative process, used with or by a group, to gather and
organise ideas, opinions, issues, etc.
Brainstorming A powerful, versatile and simple technique for generating
large numbers of ideas around a common theme from a
group of people in a very short period of time.
Cause A proven reason for the existence of a problem - not to be
confused with symptoms.
Check Sheet A systematic data-gathering and interpretation tool
Common Cause Variation A source of variation that is inherent in the system and
is predictable. It affects all the individual values of the process
output being studied; in control charts, it appears as part of
the random process variation. Common cause variation
can be eliminated only by altering the system.
Control Chart A display of data in the order that they occur with
statistically determined upper and lower limits of expected
common cause variation. It is used to indicate special
causes of process variation, to monitor a process for
maintenance, and to determine if process changes have
had the desired effect.
Control Limits Control limits define the area three standard deviations on
either side of the centreline, or mean, of data plotted on a
control chart. Do not confuse control limits with
specification limits.
Effect An observable action or evidence of a problem.
Interrelations Digraph A graphical representation of all the factors in a
complicated problem, system or situation.
LSL A lower specification limit (also known as a lower spec limit,
or LSL) is a value above which performance of a product or
process is acceptable.
Mean The average value of a set of numbers, equal to the sum of
all values divided by the number of values.
Median In a series of numbers, the median is a number which has
at least half the values greater than or equal to it and at
least half of them less than or equal to it.
Root Cause The basic reason creating an undesired condition or
problem. In many cases, the root cause may consist of
several smaller causes.
Root Cause Analysis Using one or more various tools to determine the root
cause of a specific failure.
Run Chart A chart used to analyse processes according to time or
order. They give a picture of a variation in some process
over time and help detect special (external) causes of that
variation.
Scatter Diagram A chart used to interpret data by graphically displaying the
relationship between two variables
σ The Greek letter used to designate a standard deviation.
Special Cause Cause not normally part of a process that creates process
variation, generally forcing the process out of control. Any
abnormal unpredictable variation.
Standard Deviation A mathematical term to express the variability in a data set
or process. It is commonly represented by the lowercase
Greek letter sigma (σ). Mathematically, a standard
deviation is equal to the square root of the average
squared differences between individual data values and the
data set average.
Stratification The process of dissecting an issue or problem and
examining each piece separately. The problem or issue in
question may only be present in one or more distinct pieces
and not the whole population.
Trend A gradual change in a process or output that varies from a
relatively constant average.
USL An upper specification limit, also known as an upper spec
limit, or USL, is a value below which performance of a
product or process is acceptable.
Variation The inevitable difference among individual outputs of a
process. It is the result of the combination of the five
elements of a process - people, machines, material,
methods and the environment. The sources of variation can
be grouped into two major classes, Normal or Common
causes and Abnormal or Special causes.
2. DATA COLLECTION
Data collection helps you to assess the health of your system and processes. To do
so, you must identify the Key Performance Indicators (KPIs) to be measured, how
they will be measured and what you will do with the data collected.
Every improvement effort relies on data to provide a factual basis for making
decisions for improvement. Data collection enables a team to formulate and test
working assumptions and develop information that will lead to the improvement of
the KPIs of the product, service or system. Data collection improves your decision-
making by helping you focus on objective information about what is happening,
rather than subjective opinions. In other words, “I think the problem is...” becomes
“The data indicate the problem is...”
To collect data uniformly, you will need to develop a data collection plan. The
elements of the plan must be clearly and unambiguously defined. Data collection
can involve a multitude of decisions by data collectors. When you prepare your data
collection plan, you should try to eliminate as many subjective choices as possible
by operationally defining the parameters needed to do the job correctly. Your data
collectors will then have a standard operating procedure to use during their data
collection activities [5].
The first step of a good data collection plan is to have clearly defined KPIs. The KPI
definition must include:
Then a data collection plan must be prepared for each KPI. While preparing data
collection plans you have to answer the following questions [3]:
• What data will be collected? For example, if availability is defined as

      Availability = MTBF / (MTBF + MTTR)

  where MTBF and MTTR stand for Mean Time Between Failures and Mean
  Time To Repair respectively, you will need to collect data for MTBF and
  MTTR.
• How will the data be collected? You have to define an operating procedure
for data collection. The procedure must be unambiguous and should not
contain subjective choices.
• When will the data be collected? You have to specify the amount and
frequency of data collection. However you need to remember that you are
collecting data for the purpose of future improvement efforts. Therefore you
have to take into consideration the cost of obtaining the data, the availability
of data and the consequences of decisions made on the basis of the data
when determining how much data should be obtained and how frequently it
should be collected.
• Where will the data be collected? The location where data are collected
must be identified clearly.
• Who will collect the data? The answer is simple: Those closest to the data
(e.g., the process workers) should collect the data. These people have the
best opportunity to record the results. They also know the process best and
can easily detect when problems occur. But remember, the people who are
going to collect the data need training on how to do it and the resources
necessary to obtain the information such as time and measurement tools.
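The availability example above can be sketched in a few lines; the function name and figures below are illustrative, not from the guide:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Example: a mean of 990 h between failures and 10 h to repair
# gives 99% availability.
print(availability(990.0, 10.0))  # 0.99
```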
• Event-Driven Measurement
• Sampling-Based Measurement
• Simulation
In case of event-driven measurement the times at which certain events happen are
recorded and then desired statistics are computed by analysing data. Although the
event-driven measurement varies from organisation to organisation, there are three
distinct and common steps [12]:
• To record time and nature of events (e.g., to record time and nature of
computer failure or error in a publication)
• To take corrective actions by using a procedure that outlines how the event
should be managed (e.g., to get the computer up and running again or to
correct an error in a publication)
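The recording and analysis steps above can be sketched as follows; the event log and its field names are hypothetical:

```python
# Hypothetical event log for one computer: (hour, event) pairs,
# alternating failure and repair events.
events = [(0.0, "fail"), (2.0, "repair"), (50.0, "fail"),
          (53.0, "repair"), (100.0, "fail"), (101.0, "repair")]

fail_times = [t for t, kind in events if kind == "fail"]
repair_times = [t for t, kind in events if kind == "repair"]

# MTBF: mean interval between successive failures.
mtbf = sum(b - a for a, b in zip(fail_times, fail_times[1:])) / (len(fail_times) - 1)
# MTTR: mean duration from each failure to its matching repair.
mttr = sum(r - f for f, r in zip(fail_times, repair_times)) / len(fail_times)

print(mtbf, mttr)  # 50.0 2.0
```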
Events can be detected and recorded by:
• Agents
• Human beings
Agent. An agent is a piece of software designed to collect data about the status and
functionality of a device, system or application for reporting purposes. Among the
many tools available, popular examples are First Sense, Empire, OpenView (HP), etc.
Agents capture data directly from the hardware elements underlying the service
(network, bridges, routers, switches, hubs, etc.) or they gather input from software
programs that affect overall service availability (applications, databases,
middleware, etc.) They report events directly as they occur. Examples include
hardware and software failures, broken routers, etc.
Human Beings. In this case the event is detected by the people involved in the
process. The recording is generally done by using checksheets. Checksheets are
structured forms that enable people to collect and organise data systematically.
Checksheets may be computerised (e.g., similar forms are used in workflow
management or document management systems). Common types of Checksheets
include [8]:
Because each checksheet is used for collecting and recording data unique to a
specific process or system, it can be constructed in whatever shape, size and format
are appropriate for the data collection task at hand. There is no standardised format
that you can apply to all checksheets. Instead, each checksheet is a form tailored to
collect the required information. However, you may use the following guidelines
while developing useful checksheets [5]:
• Involve the process workers in developing the checksheet for their process.
• Label all columns clearly. Organise your form so that the data are recorded
in the sequence seen by the person viewing the process. This reduces the
possibility of data being recorded in the wrong column or not being recorded.
• Make the form user-friendly. Make sure the checksheet can be easily
understood and used by all of the workers who are recording data.
• Create a format that gives you the most information with the least amount of
effort. For example, design your checksheet so that data can be recorded
using only a check mark, slant mark, number or letter.
• Provide enough space for the collectors to record all of the data.
• Designate a place for recording the date and time the data were collected.
These elements are required when the data are used with Run Charts or
other tools which require the date and time of each observation.
• Provide a place to enter the name of the individual collecting the data.
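A computerised checksheet following these guidelines can be as simple as tallying categorised records; the collectors and fault categories below are invented for illustration:

```python
from collections import Counter

# Each record mirrors a checksheet row: date, collector, fault type.
records = [
    ("2002-04-01", "A. Smith", "stapling"),
    ("2002-04-01", "A. Smith", "misprint"),
    ("2002-04-02", "B. Jones", "stapling"),
    ("2002-04-02", "B. Jones", "stapling"),
]

# Tally occurrences of each fault type across the collection period.
tally = Counter(fault for _, _, fault in records)
print(tally.most_common())  # [('stapling', 3), ('misprint', 1)]
```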
(Figure: example checksheet recording tallies of stapling faults.)
2.2.2 Sampling-Based Measurement
This technique is particularly useful when you do not have enough tools or
manpower to collect and analyse all data. However, you should avoid sampling
biases in order to obtain useful data [8].
The following sampling techniques can be used to avoid biases in your sampling:
b) An equal number of elements is drawn from each stratum and the results
are weighted according to the stratum's size in relation to the entire
population.
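Technique b) above, equal allocation per stratum with weighting, can be sketched like this; the population and strata are made up:

```python
import random

random.seed(1)  # reproducible draws for the example

# Hypothetical population split into two strata.
strata = {
    "small airlines": list(range(100)),   # 100 elements
    "large airlines": list(range(400)),   # 400 elements
}
population_size = sum(len(members) for members in strata.values())

# Draw an equal number of elements from each stratum...
n = 10
samples = {name: random.sample(members, n) for name, members in strata.items()}

# ...then weight each stratum's mean by its share of the population.
weights = {name: len(members) / population_size for name, members in strata.items()}
estimate = sum(weights[name] * (sum(samples[name]) / n) for name in strata)

print(weights)  # {'small airlines': 0.2, 'large airlines': 0.8}
```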
2.2.3 Simulation
¹ A stratum is a subset of the population that shares at least one common characteristic.
• It does not require real-time agents that can be heavy and expensive in
terms of computing power and investment.
• Attribute (discrete) measures are those where you can sort items into
distinct, separate, non-overlapping categories. Examples: types of aircraft,
sex, types of vehicles, etc. Attribute measures include artificial scales like
the ones on surveys where people are asked to rate a product or service on
a given scale. They count items or incidences that have a particular
characteristic that sets them apart from things with a different attribute or
characteristic.
The confusion arises from the fact that attribute data are sometimes presented in
variable form. For example, if you find that 32.21% of your customers are airlines,
the presence of decimals does not make it a variable measure: you are still
counting things that share one common characteristic or attribute.
The half test can be used to distinguish between variable and attribute measures:
simply ask whether "half of that measure" makes sense. If yes, the measure is
variable; otherwise it is an attribute measure.
Errors per publication: "half an error" does not make sense, so this is an attribute
measure.
The second confusing point is that something that can be measured on a continuous
scale can also be represented as an attribute measure. For example, "time to publish"
can be measured as "on time" or "late". Similarly, "hold time per incoming call" can be
recorded as attribute data, such as "number of calls on hold past 30 seconds".
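The half test and the continuous-to-attribute conversion can be expressed directly; the hold-time figures are illustrative:

```python
# Half test: if "half of that measure" makes sense, the measure is
# variable (continuous); otherwise it is an attribute measure.
def half_test(halvable: bool) -> str:
    return "variable" if halvable else "attribute"

print(half_test(True))   # variable  (e.g., time to publish, in days)
print(half_test(False))  # attribute (e.g., errors per publication)

# A continuous measure recorded as attribute data:
hold_times = [12, 45, 31, 8, 60]  # hold time per incoming call, seconds
calls_on_hold_past_30s = sum(1 for t in hold_times if t > 30)
print(calls_on_hold_past_30s)  # 3
```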
If there are many unknowns about what data is required, it is probably best to start
with a manual data collection system. After the system has been used for some time
and the system design has been established, it can be automated.
The first problem is to get everyone to report reasonably accurate data. This will be
a challenge, no matter how much instruction is provided. Most people get into the
routine very quickly, but some will require considerable support and instruction.
Intensive follow-up and checking of detail is needed to be sure all problems are
reported and that they are reported correctly. During the first several weeks of
implementation, supervisors should review all data sheets and look into any entries
that seem questionable.
When KPIs are first implemented, there will be many questions about what they
mean, where they come from and how they should be interpreted, even if all this
was explained before starting. Managers and system developers should carefully
listen to any questions and objections because they may indicate where the system
needs to be improved.
If KPIs are not being used and analysed, there are only four possible explanations:
The following should be considered during the design of a Data Collection System:
Make Reporting Data Easy. Make it as easy as possible to record or enter data.
Don’t add steps and people to a production process to capture necessary data.
Build reporting into the process by modifying forms and procedures.
Do not Overkill. Don’t take the approach of collecting every bit of available data
and rearranging it into massive reports that no one can use. Instead, first determine
what information is needed and then develop the system to supply it.
Reports and graphs should be designed for specific users and purposes. Since
different managers must make different decisions, they need different information.
Give everyone the graphs and summary reports of all KPIs that are relevant to them.
• Systems customised for different functions are more efficient and more
effective than the more general solutions.
The best approach appears to be to take a decentralised approach, but keep closely
coupled functions under the same umbrella, so that data share a common structure
and can be easily interrelated.
Level of Detail. The amount of detail needed to identify the root causes of
problems is typically more than what is required to establish accountability. In
theory, anything (e.g., process) can be measured so extensively that the root cause
of any problem can be quickly isolated. However, in general, it is very expensive.
Therefore, you have to select the right level of detail, making the best trade-off
between how much data to collect and how often the detailed data is needed. Another
approach is to find the root cause iteratively: probable causes are identified in
suspected areas, and additional data are collected to finally determine the real
causes of problems.
Create a Single Composite Index. All managers would like to have one KPI that
would indicate whenever things were not in good shape and tell them what to do about
it. Unfortunately, this is not possible, since complex systems cannot be controlled with
simple measurement systems. However, it is still useful to construct composite
KPIs for a department or a process, since they can help keep the relative importance
of individual KPIs in view. The easiest way of constructing a composite KPI is to
assign a weighting factor to each component and calculate the weighted average.
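The weighted-average construction can be sketched as follows; the KPI names, values and weights are invented for illustration:

```python
# Component KPIs as (value, weighting factor) pairs, all on a 0-100 scale.
kpis = {
    "availability": (99.0, 0.5),
    "timeliness":   (90.0, 0.3),
    "accuracy":     (80.0, 0.2),
}

total_weight = sum(weight for _, weight in kpis.values())
composite = sum(value * weight for value, weight in kpis.values()) / total_weight
print(round(composite, 2))  # 92.5
```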
3. MONITORING
• Relevant to the person receiving it. This requirement has two aspects:
1. Making sure that managers get all information that is relevant to them.
2. Making sure that they get nothing that is not relevant to them. Information
not needed or not used is just another form of waste.
• Operational Reports
• Real-Time Reports
• Executive Summaries
• Customer Reports
The format and content of the operational reports vary considerably with the
purpose of the analysis to be done (please see the chapter on Analysis for details
and examples):
• Relationship Analysis. Such reports should contain the values of the KPIs in
question. They are generally supported with a scatter diagram or a
stratification table.
• Capability Analysis. Such reports are used to keep track of the KPI values
when you start making a change in order to improve the system or
processes. These reports will be very similar to the ones used in trend
analysis. However, in this case the KPI values are presented as a function of
change (i.e., before and after the change(s)), not time.
• Capacity Analysis. These reports are only applicable to KPIs that are used
to measure capacity (e.g., number of publications per month) and associated
quality KPIs (e.g., rate of errors in publications). Such reports are used to
find out where the capacity saturates and at what capacity you can still
produce high quality products.
Operational reports are more detailed than other types of reports. The following
sample report presents the cycle time of the process "MyProcess" and includes a
sample of 10 measurements. The report is supported by a Run Chart illustrating the
baseline, the objective and the sample.
• Scheduled outages
• Strikes
• Etc.
Such notifications should show which end customers, applications, locations
and lines of business are affected. The report should also show the nature of the
problem and its symptoms, along with the estimated time when service is anticipated
to return to normal [1].
You should keep in mind one thing when tailoring reports for executive managers:
they have no time!
These reports should provide customers with summarised reports on service and
product delivery. If there are KPIs for which problems have been experienced and
which are important to customers (e.g. service availability), they should be
highlighted. These reports should also cite any steps being taken to improve
customer service [1].
You may design KPIs very well and in a reliable manner. However, if the data and
information are not properly analysed and interpreted, the benefits will be limited.
• Determining priorities
Although rigid rules for analysing and interpreting KPIs and their related data cannot
be defined, some guidelines will help assure that the data is analysed correctly and
the right conclusions are drawn.
All KPIs will exhibit some variation. At the lower levels of detail, this variation can be
quite large even if everything is under control. The first rule to follow when
interpreting KPIs is to not react to short-term deviations until reasons for the
deviation are understood. If the deviation is within the normal range, there has been
no change in performance at all. If it is a very large deviation, something unusual
has happened and the cause should be determined. In most cases, special
problems or circumstances are known by those responsible for the KPI.
A simple Line Graph or Run Chart will provide a good example of the normal
variation. This is one reason why KPIs should be put on run charts instead of relying
solely on reports.
In order to explore how the variance can be analysed, let's assume that you measure
the time to reach your office each morning:
Day 1 2 3 4 5 6 7 8 9 10
Time 25.3 22.1 24.4 26.8 27.3 26.6 24.2 22.0 21.3 23.9
(Figure: run chart of the measurements for days 1 to 10.)
Although there is a fluctuation between 21.3 and 27.3 minutes, the values are quite
stable around the median value of 24.3, and the data points do not show a particular,
steady trend. Therefore there is no reason to try to find out why it took you
27.3 minutes on day 5.
Let's assume that you continue your measurements for the next ten days and obtain
the following values.
Day 11 12 13 14 15 16 17 18 19 20
Time 18.1 17.6 17.2 15.1 14.4 14.0 12.6 12.2 14.5 15.3
These new measurements show a descending trend. The trend is confirmed with a
reasonable number of consecutive data points. You can conclude that your time to
reach office has been reduced.
(Figure: run chart of days 11 to 20 showing the descending trend.)
The most popular charting techniques used for variance and trend analysis are Run
Chart and Control Chart. A Control Chart is a special case of a Run Chart. If the Run
Chart provides sufficient data, it is possible to calculate "control limits"; the addition
of these control limits creates a Control Chart. Control limits indicate the normal
level of variation that can be expected; this type of variation is referred to as
common cause variation. Points falling outside the control limits, however, indicate
unusual variation for the process; this type of variation is referred to as special
cause variation.
However, although a bit unusual, a histogram can also be used for variance
analysis. The following sub-sections will explain the interpretations of run chart,
control chart and histogram.
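Given enough run chart data, the control limits can be computed as the mean plus or minus three standard deviations; a minimal sketch with illustrative data:

```python
from statistics import mean, pstdev

# Run chart data, e.g., daily measurements of some KPI (illustrative).
data = [25.3, 22.1, 24.4, 26.8, 27.3, 26.6, 24.2, 22.0, 21.3, 23.9]

centre = mean(data)
sigma = pstdev(data)      # population standard deviation
ucl = centre + 3 * sigma  # upper control limit
lcl = centre - 3 * sigma  # lower control limit

# Points outside the limits would indicate special cause variation.
special = [x for x in data if x > ucl or x < lcl]
print(special)  # [] -- only common cause variation here
```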
The following provide some practical guidance in interpreting a run chart [6] [7] [9] [10]:
• Seven or more consecutive points above (or below) the centre line (mean or
median) suggest a shift in the process. This is a special cause and you have to
look for what was different during the time when shift appeared. The shift can be
caused due to changes in materials, procedures, types of services/products
being produced, etc.
(Figure: run chart illustrating a shift relative to the mean.)
• Six or more successive increasing (or decreasing) points suggest a trend. You
have to look for what changed in the process on or shortly before the time the
trend began –sometimes it takes a while for a process change to show up in the
data- The trend can be caused due to changes in materials, procedures, types
of services/products being produced, etc.
(Figure: run chart illustrating a trend relative to the mean.)
• Repeating patterns or cycles in the data also suggest a special cause.

(Figure: run chart over 24 periods illustrating repeating patterns.)
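The shift and trend rules above can be checked mechanically; a sketch using the commute-time figures from section 4.1 (function names are illustrative):

```python
def has_shift(points, centre, run=7):
    """Seven or more consecutive points on one side of the centre line."""
    count, side = 0, None
    for p in points:
        s = "above" if p > centre else "below" if p < centre else None
        count = count + 1 if (s is not None and s == side) else (1 if s is not None else 0)
        side = s
        if count >= run:
            return True
    return False

def has_trend(points, run=6):
    """Six or more successive increasing (or decreasing) points."""
    up = down = 1
    for a, b in zip(points, points[1:]):
        up = up + 1 if b > a else 1
        down = down + 1 if b < a else 1
        if up >= run or down >= run:
            return True
    return False

times = [18.1, 17.6, 17.2, 15.1, 14.4, 14.0, 12.6, 12.2, 14.5, 15.3]
print(has_trend(times))        # True -- eight successive decreasing points
print(has_shift(times, 24.3))  # True -- all points below the earlier median
```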
4.2.3 Histogram
You may also use histograms in variation analysis. In this case time ordered
histograms should be presented together [11].
(Figure: time-ordered histograms for Days 1 to 4, each marked with the target value.)
This is where you play the role of “problem detective”. You have an effect in hand,
i.e., an observable action or evidence of a problem, and try to identify possible
causes for this particular effect.
(Figure: the root-cause analysis cycle: analyse data/process, develop a causal
hypothesis, analyse data/process again, then refine or reject the hypothesis.)
The effect or problem should be clearly defined to produce the most relevant
hypotheses about cause. The first step is to develop as many hypotheses as
possible so that no potentially important root cause is ignored. In the second step
data must be collected and analysed to test these hypotheses. It should be noted
that these represent hypotheses about causes, not facts. Failure to test these
hypotheses (i.e., treating them as if they were facts) often leads to implementing
the wrong solutions and wasting time.
There are three popular tools and techniques that are used during the step "develop
causal hypothesis":
• Causal Table
• Cause-and-Effect Diagram
• Interrelations Digraph
A Causal Table, also known as the Why-Because Technique, allows you and your
team to analyse the root causes of a problem.
² Dr. Kaoru Ishikawa, a Japanese quality control statistician, invented the fishbone diagram.
³ The design of the diagram looks much like the skeleton of a fish; therefore, it is often referred to
as the fishbone diagram.
A service company wanted to find out reasons why its support office did not answer
customer phones in the allowed time limits. The following diagram shows their
cause-and-effect analysis.
Figure 11. Cause and Effect Diagram Example – Reason Phone Not Answered
(Figure: interrelations digraph fragment; causes such as lack of experience are
annotated with counts of incoming and outgoing arrows, e.g., In: 2, Out: 2.)
The main conclusion of the above graph is that, to solve the problem of "repeat
service calls", the drivers must be attacked first, since they are the root causes of
the problem.
While identifying relationships between two variables X and Y, there are three
possibilities that should be considered:
• X and Y are not related at all. The apparent relationship is the result of pure
coincidence.
• X and Y are related, but X does not cause Y or vice-versa. Instead, they are
both affected by another variable (or variables).
• X and Y are related, and X causes Y (or Y causes X).
There are two popular methods used for identifying relationships: scatter diagrams
and stratification.
(Figure 13. Correlation Examples with Scatter Diagrams for variables X and Y:
a) Positive Correlation, b) Negative Correlation, c) Weak Positive Correlation,
d) Weak Negative Correlation, e) No Correlation, f) J-Shaped Association.)
(Figure: scatter diagram plotting Height (cm) against Weight (kg).)
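The strength of the relationship shown in a scatter diagram can be quantified with a correlation coefficient; a sketch with invented height/weight pairs:

```python
from statistics import mean

weights = [55, 62, 70, 78, 85, 92]        # kg (illustrative)
heights = [158, 165, 171, 178, 186, 192]  # cm (illustrative)

mx, my = mean(weights), mean(heights)
cov = sum((x - mx) * (y - my) for x, y in zip(weights, heights))
var_x = sum((x - mx) ** 2 for x in weights)
var_y = sum((y - my) ** 2 for y in heights)

r = cov / (var_x * var_y) ** 0.5  # Pearson correlation coefficient
print(r > 0.9)  # True -- strong positive correlation
```

Note that, as this section warns, a high correlation alone does not establish that one variable causes the other.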
4.4.2 Stratification
(Table fragment: stratified results, including store chain W with value 2.1.)
The analysis indicates a possible relationship with model 117B, plant 3 and store
chain W but these relationships are not certain and require further investigation
since it is possible that:
• What is the amount of work that a process can do in a given period of time?
• How does the process capability match (or does not match) customer
requirements or process specifications?
Typically, processes follow the normal probability distribution. When this is true, a
high percentage of the process measurements fall between ±3σ of the process
mean or centre. That is, approximately 0.27% of the measurements would naturally
fall outside the ±3σ limits and the balance of them (approximately 99.73%) would be
within the ±3σ limits.
Since the process limits extend from -3σ to +3σ, the total spread amounts to about
6σ total variation. If process spread is compared with specification spread4, typically
one of the following three situations occurs:
6σ < (USL-LSL)
The process spread is well within the specification spread. When processes are
capable, we have an attractive situation for several reasons: We could tighten our
specification limits and claim our product is more uniform or consistent than our
competitors. We can rightfully claim that the customer should experience less
difficulty, less rework, more reliability, etc. This should translate into higher profits.
4
Specification spread is defined with Lower Specification Limit (LSL) and Upper Specification Limit
(USL).
Edition: 0.2 Draft Page 31
AIM/AEP/S-LEV/0008 KPI Measurement, Monitoring and Analysis
Guide
6σ = (USL-LSL)
When a process spread is just about equal to the specification spread, the process
is capable of meeting specifications, but barely so. This suggests that if the process
mean moves to the right or to the left just a little bit, a significant amount of the
output will exceed one of the specification limits. The process must be watched
closely to detect shifts from the mean. Control charts are excellent tools to do this.
6σ > (USL-LSL)
When the process spread is greater than the specification spread, a process is not capable of meeting specifications regardless of where the process mean or centre is located. This is indeed a sorry situation. It happens frequently, and the people responsible are often not even aware of it. Over-adjustment of the process is one consequence, resulting in even greater variability. Alternatives include:
• Live with the current process and sort 100% of the output.
• Re-centre the process to minimise the total losses outside the specification limits.
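The three situations above reduce to a comparison of the process spread (6σ) with the specification spread (USL − LSL). A minimal sketch; the function name and the example limits are ours, not from the guide:

```python
def classify_capability(sigma: float, lsl: float, usl: float) -> str:
    """Compare process spread (6*sigma) with specification spread (USL - LSL)."""
    spread = 6 * sigma
    spec = usl - lsl
    if spread < spec:
        return "capable: process spread well within specification spread"
    if spread == spec:
        return "marginal: barely capable, watch for mean shifts"
    return "not capable: spread exceeds specifications"

# Hypothetical process: sigma = 0.8, spec limits 18 to 24 (spread 4.8 < 6)
print(classify_capability(sigma=0.8, lsl=18, usl=24))
```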
Histograms are the easiest and most common tool for monitoring and analysing capability. A histogram displays a single variable in bar form to indicate how often some event is likely to occur by showing the pattern of variation (distribution) of the data. A pattern of variation has three aspects: the centre (average), the shape of the curve and the width of the curve. Histograms are constructed with variables such as time, weight, temperature, etc. and are not appropriate for attribute data.
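The three aspects of a pattern of variation can be computed directly from the raw measurements. A minimal sketch with hypothetical processing-time data (the values and bin width are illustrative):

```python
import statistics

# Hypothetical processing-time measurements (minutes)
data = [12, 14, 15, 15, 16, 16, 16, 17, 17, 18, 19, 21]

# Two of the three aspects of a pattern of variation:
centre = statistics.mean(data)   # the centre (average)
width = statistics.stdev(data)   # the width of the curve
print(f"centre = {centre:.1f}, width = {width:.2f}")

# A crude text histogram (one bin per 2-minute interval) shows the shape
for lo in range(12, 22, 2):
    count = sum(lo <= x < lo + 2 for x in data)
    print(f"{lo:2d}-{lo + 1}: {'#' * count}")
```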
You may support a histogram with the target value. The target value generally comes from customer requirements, process specifications or a KPI objective. For example, the target value helps you to illustrate how your process capability matches (or does not match) customer requirements [11].
[Figure: four histogram panels against a marked target value –
A: Most of the data were on target, with very little variation from it.
B: Although some data were on target, many others were dispersed away from the target.
C: Even when most of the data were close together, they were located off the target by a significant amount.
D: The data were off target and widely dispersed.]
Similarly, the upper and lower specification limits (e.g., from process specifications) can be marked on a histogram. This helps to identify whether the data lie within the specification limits [11].
Typical histogram shapes include Bell Shaped/Symmetrical, Truncated and Ragged Plateau. In a skewed shape, data points cluster around one end and tail off in the opposite direction. This may happen in any type of measure involving time – processing time, cycle time, days after due date – and in costs. You should find out what is different about the units represented by the values in the tails of the distribution. If they tail off in an undesirable direction, you should eliminate them. Otherwise you should copy them [7] [8].
There are times when a Histogram may look unusual to you (see the shapes above). In these circumstances, the people involved in the process should ask themselves whether it really is unusual. The Histogram may not be symmetrical, but you may find out that it should look the way it does. On the other hand, the shape may show you that something is wrong: data from several sources were mixed, for example, or different measurement devices were used, or operational definitions weren't applied. What is really important here is to avoid jumping to conclusions without properly examining the alternatives [11].
The equation for the simplest capability index, Cp, is the ratio of the specification
spread to the process spread, the latter represented by six standard deviations or
6σ.
Cp = (USL − LSL) / 6σ
Cp assumes that the normal distribution is the correct model for the process (i.e.,
assumes the process is centred on the midpoint between the specification limits). Cp
can be highly inaccurate and lead to misleading conclusions about the process
when the process data does not follow the normal distribution.
Cp      % Nonconforming
1.00    0.27%
1.33    0.0064%
1.67    0.000057%
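These percentages follow from the normal tail probability: for a centred process, the specification limits sit at ±3·Cp standard deviations from the mean, so the fraction outside them is erfc(3·Cp/√2). A quick check using only the standard library:

```python
import math

def pct_nonconforming(cp: float) -> float:
    """Percent of output outside the spec limits for a centred normal process."""
    z = 3 * cp  # spec limits sit at +/- 3*Cp sigma from the mean
    return math.erfc(z / math.sqrt(2)) * 100  # two-sided tail probability

for cp in (1.00, 4 / 3, 5 / 3):
    print(f"Cp = {cp:.2f}: {pct_nonconforming(cp):.6f}% nonconforming")
```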
Remember that the capability index Cp ignores the mean or target of the process. If
the process mean lined up exactly with one of the specification limits, half the output
would be nonconforming regardless of what the value of Cp was. Thus, Cp is a
measure of potential to meet specification but says little about current performance
in doing so.
Occasionally the inverse of the capability index Cp, the capability ratio CR is used to
describe the percentage of the specification spread that is occupied or used by the
process spread.
CR = (1 / Cp) × 100% = (6σ / (USL − LSL)) × 100%
The major weakness of Cp is that few, if any, processes remain centred between the specification limits. Thus, to get a better measure of the current performance of a process, one must consider where the process mean is located relative to the specification limits. The index Cpk was created to do exactly this. With Cpk, the location of the process centre compared to the USL and LSL is included in the computation: a worst-case value is obtained by computing the capability against the specification limit closest to the process mean.
Cpk = min{ (USL − μ) / 3σ , (μ − LSL) / 3σ }
We have the following situation: the process standard deviation is σ = 0.8, with USL = 24, LSL = 18 and the process mean μ = 22.
Cpk = min{ (24 − 22) / (3 × 0.8) , (22 − 18) / (3 × 0.8) } = min{ 0.83, 1.67 } = 0.83
If this process's mean were exactly centred between the specification limits, Cp = Cpk = 1.25.
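The worked example can be checked in a few lines of Python; the helper names are ours, not from the guide:

```python
def cpk(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Worst-case capability: distance from the mean to the nearest spec limit, in 3-sigma units."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

def cp(sigma: float, lsl: float, usl: float) -> float:
    """Potential capability, ignoring where the mean sits."""
    return (usl - lsl) / (6 * sigma)

print(f"Cpk = {cpk(22, 0.8, 18, 24):.2f}")  # 0.83, limited by the upper spec limit
print(f"Cp  = {cp(0.8, 18, 24):.2f}")       # 1.25, attainable if re-centred at 21
```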
Cpm is called the Taguchi capability index after the Japanese quality guru, Genichi
Taguchi whose work on the Taguchi Loss Function stressed the economic loss
incurred as processes departed from target values. This index was developed in the
late 1980's and takes into account the proximity of the process mean to a
designated target, T.
Cpm = (USL − LSL) / (6 × √(σ² + (μ − T)²))
When the process mean is centered between the specification limits and the
process mean is on the target, T, Cp = Cpk = Cpm.
When a process mean departs from the target value T, there is a substantive effect on the capability index. In the Cpk example above, if the target value were T = 21, Cpm would be calculated as:
Cpm = (24 − 18) / (6 × √(0.8² + (22 − 21)²)) = 6 / (6 × √1.64) = 0.78
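The Taguchi index for the same process (σ = 0.8, μ = 22, T = 21) can be sketched similarly; the helper name is ours:

```python
import math

def cpm(sigma: float, mean: float, target: float, lsl: float, usl: float) -> float:
    """Taguchi capability: penalises deviation of the mean from the target T."""
    return (usl - lsl) / (6 * math.sqrt(sigma ** 2 + (mean - target) ** 2))

print(f"Cpm = {cpm(0.8, 22, 21, 18, 24):.2f}")            # 0.78, mean is off target
print(f"Cpm on target = {cpm(0.8, 21, 21, 18, 24):.2f}")  # 1.25, equal to Cp
```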
Motorola bases much of its quality effort on what it calls its "6-Sigma" Program.
The goal of this program was to reduce the variation in every process to such an
extent that a spread of 12σ (6σ on each side of the mean) fits within the process
specification limits. Motorola allocates 1.5σ on either side of the process mean for
shifting of the mean, leaving 4.5σ between this safety zone and the respective
process specification limit.
Thus, even if the process mean strays as much as 1.5σ from the process centre, a full 4.5σ remains. This ensures a worst-case scenario of 3.4 ppm nonconforming on each side of the distribution (6.8 ppm total) and a best-case scenario of 1 nonconforming part per billion (ppb) for each side of the distribution (2 ppb total). If the process mean were centred, this would translate into Cp = 2.00.
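The 3.4 ppm and 1 ppb figures follow from the one-sided normal tail beyond 4.5σ and 6σ respectively, and can be verified with the standard library:

```python
import math

def one_sided_tail(z: float) -> float:
    """P(X > mu + z*sigma) for a normally distributed X."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Mean shifted 1.5 sigma toward a limit: 4.5 sigma remain -> about 3.4 ppm per side
print(f"{one_sided_tail(4.5) * 1e6:.1f} ppm")
# Mean perfectly centred: 6 sigma to each limit -> about 1 ppb per side
print(f"{one_sided_tail(6.0) * 1e9:.2f} ppb")
```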
• If the process is limited by equipment capacity, work will pile up in front of the
limiting steps of the process – production delay will increase.
• If the process is limited by labour capacity, work will pile up in front of the
bottleneck step(s) in the process, but the work may also get done while the
quality of the work suffers – rework and rejects will increase.
Identification of capacity for the first case is relatively straightforward. The saturation
point can be determined after plotting production data:
[Figure: Capacity Analysis (I) – Production (Units/Month) plotted against Time (Months), showing the range where production saturates]
For the second case an estimate of production capacity can be derived from quality
and production data:
[Figure 23. Capacity Analysis (II) – quality and production data plotted against Time (Months)]
KPIs do not exist in a vacuum. They are affected by anything that affects an
organisation or its production processes. Weather, strikes, supply line disruptions,
unusual customer requests, competitors’ actions and many others can cause large
deviations in the KPIs. That is why it is a good practice to note significant changes in
environmental factors or unusual circumstances on charts when they occur. Besides
explaining what caused particular behaviour, these notes can help managers predict
what will happen under similar conditions in the future.
Because there will always be more problems and opportunities than there are
resources available to pursue them, managers must always think in terms of
priorities. Priorities for improving performance or changes in these priorities should
be one of the regular outputs of analysing KPIs. Assuming a measurement system
has the capability of determining the relative impact of KPIs, priorities should be
relatively clear in terms of costs or profit opportunities. However, priority decisions
must be supported with the following:
• Potential risk
• Investment required
• Payback period
• Availability of resources
• Etc.
Priorities must be evaluated from the broader perspective of the total organisation to
avoid sub-optimisation and to assure resources are allocated to the areas of most
return.
One of the main questions asked during improvement efforts is the following: when has a process reached its practical limit for incremental improvement? The answer to this question is quite subjective and requires experience in the area where the improvement efforts take place. Let's take as an example the following table, which illustrates the improvements provided by each successive change in a production process:
Stage     Value
Start     10.0
Change 1  5.0
Change 2  4.1
Change 3  3.4
Change 4  2.8
[Figure: bar chart of the values above, from Start through Change 4, on a 0 to 10 scale]
It might seem that further significant improvements are not feasible after change 4. However, it should be noted that determining when a process has reached its practical limit for incremental improvement is subjective and a matter of judgement. If the person making that judgement understands how well the process is performing and its improvement history, that determination will probably be quite accurate.
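The diminishing returns in the table can be made explicit by computing each change's gain relative to the value before it (the values are those from the table above):

```python
values = [("Start", 10.0), ("Change 1", 5.0), ("Change 2", 4.1),
          ("Change 3", 3.4), ("Change 4", 2.8)]

# Each successive change yields a smaller absolute gain: 5.0, 0.9, 0.7, 0.6
for (_, before), (name, after) in zip(values, values[1:]):
    gain = (before - after) / before * 100
    print(f"{name}: {before} -> {after} ({gain:.0f}% improvement)")
```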
The process above is in apparent statistical control. Notice that all points lie between the upper control limit (UCL) and the lower control limit (LCL). This process exhibits only common cause variation.
The process above is out of statistical control. Notice that a single point can be found outside the control limits (above them). This means that a source of special cause variation is present. The likelihood of this happening by chance is only about 3 in 1,000. This small probability means that when a point is found outside the control limits it is very likely that a source of special cause variation is present and should be isolated and dealt with. Having a point outside the control limits is the most easily detectable out-of-control condition.
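Flagging a point outside the control limits can be sketched as follows; the sample data is hypothetical, with the limits set at the mean ± 3 standard deviations estimated from the stable history:

```python
import statistics

# Hypothetical measurements from a process, with one special-cause spike at the end
measurements = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 9.7, 10.0, 10.3, 12.5]

# Estimate the centreline and control limits from the stable history (first 9 points)
baseline = measurements[:9]
centre = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
ucl = centre + 3 * sigma
lcl = centre - 3 * sigma

# A point beyond UCL or LCL signals special cause variation
out_of_control = [x for x in measurements if x > ucl or x < lcl]
print(f"LCL={lcl:.2f}, UCL={ucl:.2f}, out of control: {out_of_control}")
```

In practice the limits on a control chart are computed from subgroup statistics with standard SPC constants, but the mean ± 3σ rule above captures the idea.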
The graphic above illustrates the typical cycle in statistical control. First, the measurements are highly variable and out of statistical control. Second, as special causes of variation are found, the measurements come into statistical control. Finally, through improvement, variation is reduced. This is seen in the narrowing of the control limits. Eliminating special cause variation keeps the process in control; process improvement reduces the process variation and moves the control limits in toward the centreline of the process.
This guide discusses issues and problems that can be encountered during measurement, monitoring and analysis of KPIs, including:
• Measurement techniques
• Different types of analysis that can be done via KPIs: trend analysis, cause-effect analysis, capability analysis, capacity planning, etc.
The provided information will assist AIS organisations in fulfilling the associated ISO 9001:2000 requirement (i.e., section 8). It will also be useful when implementing service level management.
It should be noted that the main objective is not only to measure but also to take
corrective and preventive actions by analysing performance levels achieved for
KPIs.
The document is still in its early stages and requires some detailed examples
especially for KPI monitoring and analysis.
End of Document