Measurement Systems Analysis
AESQRM003202210
SAE Industry Technologies Consortia provides that: “This AESQ Reference Manual is published by the AESQ Strategy
Group/SAE ITC to advance the state of technical and engineering sciences. The use of this reference manual is entirely
voluntary and its suitability for any particular use is the sole responsibility of the user.”
Copyright © 2022 AESQ Strategy Group, a Program of SAE ITC. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, distributed, or transmitted, in any form or by
any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of AESQ
Strategy Group/SAE ITC. For questions regarding licensing or to provide feedback, please contact [email protected].
Aerospace Engine Supplier Quality (AESQ) Strategy Group
The origins of the AESQ can be traced back to 2012. The Aerospace Industry was, and still is, facing many challenges.
The Aero Engine manufacturers Rolls-Royce, Pratt & Whitney, GE Aviation and Snecma (now Safran Aircraft
Engines) began a collaboration project with the aim of driving rapid change throughout the aerospace engine
supply chain, improving supply chain performance to meet the challenges faced by the industry and the need
to improve the Quality Performance of the supply chain.
Suppliers to these Engine Manufacturers wanted to see greater harmonisation of requirements between the
companies. Each Engine Manufacturer had Supplier Requirements that were similar in intent but quite different
in terms of language and detail.
This collaboration was formalized as the SAE G-22 Aerospace Engine Supplier Quality (AESQ) Standards
Committee formed under SAE International in 2013 to develop, specify, maintain and promote quality standards
specific to the aerospace engine supply chain. The Engine Manufacturers were joined by six major Aero Engine
suppliers including GKN, Honeywell, Howmet Aerospace, IHI, MTU and PCC Structurals. This collaboration
would harmonise the aerospace engine OEM supplier requirements while also raising the bar for quality
performance.
Subsequently, the Aerospace Engine Supplier Quality (AESQ) Strategy Group, a program of the SAE Industry
Technologies Consortia (ITC), was formed in 2015 to pursue activities beyond standards writing including
training, deployment, supply chain communication and value-add programs, products and services impacting
the aerospace engine supply chain.
AESQ Vision
To establish and maintain a common set of Quality Requirements that enable the Global Aero Engine Supply Chain to be truly competitive through lean, capable processes and a culture of Continuous Improvement.
The SAE G-22 AESQ Standards Committee published six standards between 2013 and 2019.
In 2021 the AESQ replaced these standards, except for AS13001, with a single standard, AS13100.
The AESQ continue to look for further opportunities to improve quality and create standards that will add value
throughout the supply chain.
Suppliers to the Aero Engine Manufacturers can get involved through the regional supplier forums held each
year or via the AESQ website http://aesq.saeitc.org/.
AESQ Reference Manuals
AESQ Reference Manuals can be found on the AESQ website at the following link:
https://aesq.sae-itc.com/content/aesq-documents
AESQ publishes several associated documents through the SAE G-22 AESQ Standards Committee supporting
deployment of AS13100. Their relationship with APQP and PPAP is shown in Figure 1.
Figure 1: AESQ Standards and Guidance Documents and the link to AS9145 APQP / PPAP
RM13003 - Measurement Systems Analysis
TABLE OF CONTENTS
1. SCOPE ....................................................................................................................................... 5
1.1 Purpose ...................................................................................................................................... 5
2. REFERENCES ........................................................................................................................... 5
2.1 Applicable Documents ................................................................................................................ 5
2.2 SAE Publications ........................................................................................................................ 5
2.3 Other Documents ....................................................................................................................... 5
2.4 Definitions ................................................................................................................................... 6
2.5 Applicability ................................................................................................................................. 7
LIST OF FIGURES
Figure 1 AESQ Standards and Guidance Documents and the Link to AS9145 APQP / PPAP ............. iii
Figure 2 Planning an MSA Study Tree Diagram .................................................................................... 11
Figure 3 MSA Quality Tests - Flowchart ................................................................................................. 12
Figure 4 Flow Diagram for Non-Capable Measurement Processes .......................................................12
Figure 5 Linearity and Bias Graph .......................................................................................................... 18
Figure 6 Case Study - Linearity Initial Study Data .................................................................................. 18
Figure 7 Case Study - Linearity improved Study Data ........................................................................... 19
Figure 8 EMP Process Flow Charts ........................................................................................................ 20
Figure 9 Consistency Chart .................................................................................................................... 21
Figure 10 Type 1 Gage Study ................................................................................................................... 22
Figure 11 Normal Distribution Plot ............................................................................................................ 22
Figure 12 Guard Bands............................................................................................................................. 23
Figure 13 Consistency Study .................................................................................................................... 24
Figure 14 Chunky Data Showing False Alarm .......................................................................................... 26
Figure 15 Short EMP Study ...................................................................................................................... 27
Figure 16 Average Chart by Operator ...................................................................................................... 31
Figure 17 Range Chart by Operator ......................................................................................................... 31
Figure 18 Interaction Effects Plot .............................................................................................................. 32
Figure 19 Analysis of Moving Range Plot ................................................................................................. 33
Figure 20 Analysis of Main Effects Plot .................................................................................................... 33
Figure 21 Chart of Stability ....................................................................................................................... 35
Figure 22 Viscometer Case Study ............................................................................................................ 36
Figure 23 Viscosity Stability Results ......................................................................................................... 37
Figure 24 Number of Distinct Categories.................................................................................................. 38
Figure 25 MSA for Position Tolerance ...................................................................................................... 40
Figure 26 Attribute Agreement Analysis Results Graph ........................................................................... 42
Figure 27 Attribute Acceptance Limits ...................................................................................................... 43
Figure 28 Acceptance Limits for Attribute Agreement Analysis................................................................ 44
Figure 29 Collected Data for Kappa Evaluation ....................................................................................... 45
Figure 30 Structured Data for Kappa Evaluation ...................................................................................... 46
Figure 31 Attribute Agreement Within Inspectors ..................................................................................... 46
Figure 32 Agreement Between Inspectors and Agreement to the Standard ............................................47
Figure 33 ICC Study Data ......................................................................................................................... 49
Figure 34 ICC Appraiser Chart ................................................................................................................. 49
Figure 35 Graph of Averaged Results ...................................................................................................... 52
Figure 36 Graph of Average Results with Confidence Interval................................................................. 53
Figure 37 Average and Range Six Pack Graph ....................................................................................... 54
Figure 38 Average and Range Data ......................................................................................................... 54
Figure 39 Guard Band Bias ...................................................................................................................... 56
INTRODUCTION
The aerospace industry is heavily reliant on inspection and verification to ensure that parts and assemblies delivered to the customer meet drawing requirements. Because requirements differ widely across the aero engine supply chain, this reference manual is intended to harmonize them and provide a common approach. It works through the essential requirements of AS13100 chapter 7.1.5 and illustrates, through examples, acceptable means of achieving those requirements for use on aero engine parts and assemblies.
1. SCOPE
This reference manual provides context for AS13100 chapter 7.1.5, Measurement Systems Analysis. As the document gives guidance, the word “should” is used throughout. All mandatory requirements are in AS13100 chapter 7.1.5; this document provides additional context and learning to help deliver those mandatory requirements.
1.1 Purpose
The purpose of this document is to provide guidance on the application of appropriate measurement systems analysis tools, working with the AS13100 Section 7.1.5 acceptance criteria to be applied by the Aero Engine Manufacturers' Supply Chain. It also provides guidance on efficient application and on mitigation strategies for measurement systems that do not achieve the acceptance levels.
The customer may require different acceptance guidance for specific applications.
There may be situations where alternative measurement systems analysis needs to be deployed. These should
be agreed between the organization and the specific customer prior to approval.
Case studies are included to provide some practical examples of the application of these methods.
2. REFERENCES
2.1 Applicable Documents
The latest issue of SAE publications should apply. Nothing in this document should supersede applicable laws and regulations unless a specific exemption has been obtained. The documents listed below are intended to support the requirements of this document and provide guidance on conducting MSA studies.
2.2 SAE Publications
Available from SAE International, 400 Commonwealth Drive, Warrendale, PA 15096-0001, Tel: 877-606-7323 (inside USA and Canada) or +1 724-776-4970 (outside USA), www.sae.org.
AS13003 Measurement Systems Analysis Requirements for the Aero Engine Supply Chain
AS13100 AESQ Quality Management System Requirements for Aero Engine Design and Production
Organizations
2.3 Other Documents
Automotive Industry Action Group (2010). Measurement Systems Analysis, 4th ed. Detroit, MI: AIAG.
Wheeler, D.J. and Lyday, R.W. (2006). Evaluating the Measurement Process. Knoxville, TN: SPC Press.
© National Physical Laboratory, London.
Wheeler, D.J. Evaluating the Measurement Process III: Using Imperfect Data. Knoxville, TN: SPC Press.
2.4 Definitions
ACCURACY: Accuracy of measurements is the average difference between a measured value and an accepted
reference value of a measurand. Actual values are determined by an independent measurement.
ACCURACY ERROR: Accuracy error is the difference between a measured value and the “accepted reference
value” of a measurand.
ACCEPTED REFERENCE VALUE: A value that serves as an agreed-upon reference for comparison and that
is derived as a theoretical or established value, based on scientific principles, an assigned value based on
experimental work from a national or international measurement body, or a consensus value based on
collaborative experimental work.
ACCURACY RATIO: The ratio between the total part tolerance and the measurement accuracy. Measurement
accuracy is sometimes approximated by the total calibration tolerance of the measurement equipment.
ARTEFACT: An object of known size, shape, chemical composition, etc., that is used to test the measurement
system condition by comparison of the known artefact characteristic against the measured result. Artefacts can
be a component which is measured in several ways to establish an acknowledged known measurement value.
Artefacts can also be calibration masters or an item which has been calibrated to accurately determine its
characteristics.
ATTRIBUTE: A qualitative measure of a property that is of interest. This may be binary (pass/fail, good/bad,
etc.) or ordinal if the values can be ranked (e.g., low, medium, and high).
BIAS: The difference between the observed average value of measurements and an accepted reference value.
Bias should be considered in the overall assessment of the measurement system.
NOTE: Observed Bias may be additive to MSA error, and the combined values should not violate acceptance requirements. For example, when inspecting a feature with a reference value of 0.006, an average observed value of 0.0075 represents a bias of 0.0015.
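As a minimal sketch (illustrative values only, not taken from the standard), the bias calculation in the note above can be expressed as:

```python
from statistics import mean

def bias(measurements, reference_value):
    """Bias = average of the observed values minus the accepted reference value."""
    return mean(measurements) - reference_value

# Illustrative readings averaging 0.0075 against a reference value of 0.006,
# matching the 0.0015 bias in the note above.
readings = [0.0074, 0.0075, 0.0076]
print(round(bias(readings, 0.006), 4))  # -> 0.0015
```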
CORRELATION: The degree to which two or more factors are statistically related to each other and vary together. This does not necessarily prove a cause-and-effect relationship, just that the factors appear to show a link.
CRITICAL FEATURES: Those characteristics of an item which, if nonconforming, may result in hazardous or
unsafe conditions for personnel using, maintaining, or depending on the product; or which may prevent or
seriously affect the satisfactory operation or functioning of the product. Specific Customer Feature
Classifications may exist.
CUSTOMER: The customer issuing the purchase order for the part that is subject to the measurement capability
acceptance.
GAGE REPEATABILITY AND REPRODUCIBILITY (GR&R): A study used to determine how much variation is
present in a measurement system by combining the equipment variation (repeatability) with the system variation
(reproducibility).
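The AIAG average-and-range and ANOVA methods are the usual ways to analyze such a study; as an illustration only (hypothetical data and simplified pooled-variance arithmetic, not the method mandated by AS13100), the idea of combining equipment and appraiser variation can be sketched as:

```python
from statistics import mean, pvariance

# Hypothetical crossed study: study[operator][part] = repeat readings
study = {
    "A": {1: [10.1, 10.2, 10.1], 2: [10.4, 10.5, 10.5]},
    "B": {1: [10.2, 10.3, 10.2], 2: [10.6, 10.6, 10.5]},
}

# Repeatability (equipment variation): pooled variance of the repeats
# within each operator/part cell.
cells = [reads for parts in study.values() for reads in parts.values()]
ev_var = mean(pvariance(c) for c in cells)

# Reproducibility (appraiser variation): variance between operator averages.
op_means = [mean([r for reads in parts.values() for r in reads])
            for parts in study.values()]
av_var = pvariance(op_means)

# Combined GR&R standard deviation.
grr_sd = (ev_var + av_var) ** 0.5
print(round(grr_sd, 4))
```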
LINEARITY: (1) The extent to which an output quantity (e.g., a measurement) is proportional to an input quantity
(e.g., a dimension) throughout a given range (e.g., the range of measurement of a gage or a part tolerance
envelope). (2) Lack of Linearity is typically seen in gages which inspect parts that change geometrically and
materially at the same time, with instruments at fixed locations and angles. It is also typical for process related
evaluations where process parameters may change throughout the evaluation cycle. (3) Linearity can be seen
in many measurement systems. It should be considered where a gage is used across a range of measurements,
or the gage is adjustable. For example, consider an adjustable torque wrench where its accuracy decreases
towards the top and bottom of its working range.
MAJOR FEATURES: Those characteristics of an item, other than critical, which if nonconforming, may result in
operational or functional failure of the item, or which materially reduce the usability, physical or functional
interchangeability or durability of the product for its intended purpose. Specific feature specifications will be
defined by the customer.
MINOR FEATURES: Those characteristics of an item which, if nonconforming, do not materially reduce the usability, physical or functional interchangeability, or durability of the product, or that are a departure from established guidance having no significant bearing on the effective use or operation of the product. Specific feature specifications will be defined by the customer.
NUMBER OF DISTINCT CATEGORIES: The number of categories within a set of measurement data that the
measurement system can resolve.
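The AIAG manual's rule of thumb computes this as ndc = 1.41 × (part variation / GR&R variation), truncated to an integer; a sketch with hypothetical standard deviations:

```python
import math

def distinct_categories(part_sd, grr_sd):
    """AIAG rule of thumb: ndc = 1.41 * (part variation / GR&R variation),
    truncated to an integer."""
    return math.trunc(1.41 * part_sd / grr_sd)

# Hypothetical: part-to-part sigma 0.05 against GR&R sigma 0.01.
print(distinct_categories(0.05, 0.01))  # -> 7
```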
PRECISION: How close the repeated measurement results are to each other.
REPEATABILITY: The ability of the measurement system to give the same result when measuring the same
feature multiple times using the same elements of the system, e.g., gage, operator, environment, fixture, etc.
REPRODUCIBILITY: The ability of the measurement system to give the same result when elements of the
system, such as operator or environment, are changed.
RESOLUTION: The smallest change in the characteristic being measured that can be detected by the measurement system, or the smallest difference in the characteristic between parts that can be discriminated.
STABILITY: A measure of how variation changes over time. This is classified as short-term or long-term stability depending on the time frame involved. RM13006 provides more detail on this definition, including how to determine stability.
VARIABLE DATA: Variable or continuous data can take any value within an interval; measurement results are any value within that interval, subject to the measurement system resolution.
VARIABLE INTERACTION: Where two or more variables combine to influence a measurement result, the effect on the result due to a given change in one variable depends on the current values of the other variables.
2.5 Applicability
This reference manual is intended for businesses that design and/or manufacture products in the Aero Engine
Supply Chain. The minimum requirements for both variable and attribute measurement systems are defined as
guidance aligned to AS13100 section 7.1.5.
A Measurement System is the combination of people, equipment, materials, methods, environment, analysis,
and decisions made on the measured results. All measurement systems have a level of uncertainty associated
with them because of variation in these factors. MSA is the method of identifying the level of uncertainty of the
whole system to determine if the Measurement system is ‘fit for its intended purpose,’ i.e., the level of
measurement variation is not significant.
There are several types of MSA; which type is required will depend on the type of data being measured and the
influences on the system.
The purpose of MSA is to identify the total variation and systematic error present in the measurement system and to identify the underlying contributors to these uncertainties, so that actions can be taken to control them effectively and ensure repeatable and accurate measurements. These studies are conducted to represent the ‘real world’ as much as possible, e.g., range of inspectors, parts that cover the whole specification, normal working environment, etc.
MSA should be conducted as part of New Product Introduction to validate the measurement system prior to production. There are also situations where MSA should be repeated; these include changes to gage design, refurbishment/repair, environment, product design change to the feature being measured, etc.
The table below is taken from AS13100 chapter 7.1.5 and describes situations where MSA is required. The latest version of this table is in AS13100.

No. | Event Description | Critical | Major | Minor | MSA Study
2 | Any significant change to the current verification method | 3 | 3 | 2 | Correlation to results prior to change
7 | To verify a verification system is adequate before sample and reduced inspection | 3 | 3 | 1 | Evaluate current or Perform MSA
8 | New Computer Driven Measurement System part program | 3 | 3 | 2 | Run Computer driven measurement system
8
RM13003 - Measurement Systems Analysis
1. The table uses the characteristic classification (Critical, Major, Minor) to determine the application of MSA.
2. Various scenarios are listed in the “Event Description” column to describe situations where MSA may be required.
3 = Mandated
o “Optional but best practice” indicates the organization uses engineering judgement and only rules out an MSA study if the risk of escape is insignificant.
o “At the discretion of the customer” indicates the organization checks whether the customer has specific requirements that may overrule the application of MSA. The customer may also have preferences for which MSA tests to run.
1. Example 1 (Use of Assessment Requirements for MSA): A component never made before in production by that organization is being measured using a carbon fiber gap gage to size a tight-tolerance outside diameter, which is determined by the customer to be a “Major” characteristic.
Being Level 3 means the MSA is mandated, so this equipment must have an MSA study performed on it.
The type of study is defined by review of the equipment, the data, and the process. As guidance, a comparator gage such as this needs a repeatability and reproducibility study to ensure it always gives the same answer across different components and operators. A bias study is also conducted to ensure the measured results are accurate.
2. Example 2 (Use of Assessment Requirements for MSA): A component in serial production is moved from an old CMM to a new CMM that uses the same CMM programming language. The component has 300 characteristics measured in one CMM program; some characteristics are Minor, but one characteristic is a Major.
A new Computer Driven Measurement System R&R is required with a bias study (see section 5.10). The bias study could be conducted by running the component program on the old CMM, and then measuring the same component on the new machine to evaluate the bias.
ISO 9001, AS9100, and AS13100 describe generic competence requirements. For MSA, these competence levels come from several industrial metrology standards but can be generalized as:
• The correct training of MSA practitioners is key to the successful outcome of the process. Each organization
should employ, or have access to, a practitioner who has appropriate experience/qualifications and can
demonstrate competence that includes all elements of MSA.
• The practitioner can drive the right behaviors and training within the organization. This may be through
process, procedure, training, etc.
• The practitioner can also periodically confirm the organization's compliance through oversight testing, etc.
• Individuals involved in the MSA study should be suitably trained and competent in the measurement task
to ensure the study is conducted correctly. They should be representative of the measurement system users
to ensure the study is representative of the inspection activities on the shop floor. See also 6.23 Inspector
Qualification.
• The MSA study should be led/facilitated by a person trained and competent in the methodology.
• The organization can nominate a suitably qualified and experienced person from within their own
organization as accountable for deployment of this reference manual and respective compliance.
• Competence can be considered as a combination of training, experience and working knowledge of the
subject.
• The supplier documents the person's competencies in this area, reflecting that combination of training, experience, and working knowledge of the subject matter.
4.1 Pre-Requisites
The pre-requisites and generic requirements for any type of MSA study are:
• The measurement equipment should be calibrated and traceable to a relevant national or international standard.
• The measurement equipment should be maintained in good condition and checked for evidence of damage
or wear, which may impact the measurement capability. It is always good practice to have operators
responsible for checking equipment and calibration status before every use.
• Production parts should be used for studies, except for circumstances where the use of representative parts or artefacts is authorized. The parts should preferably be representative of the full tolerance, and it is beneficial to include parts just outside the lower and upper manufacturing limits. This ensures the measurement system is tested through the range of sizes that would be seen in production. There may be issues with the application of the measurement system that were not planned for and so not considered; using production components at the top and bottom limits, with people representative of the skill sets on the shop floor, helps fully represent the range of variation in the measurement system.
• The parts should be as clean and burr free as would be seen by the production inspection method.
• Individuals involved in the study should be suitably trained and competent in the measurement task. They
should preferably be representative of the measurement system users.
• The measurement system analysis study should be led or facilitated by a person trained and competent in
the methodology covered in this reference manual.
• An environment representative of the production operation should be used for the MSA study.
• The method used for any study should replicate the conditions in the production process. Where alignment,
fixturing, and clamping could influence the measured value, the component should be removed and
reloaded between each measurement. This further represents the production process.
• During the study, the personnel performing the measurements should not have visibility of either their own
or other study participant's previous results. This has been shown to cause the inspectors to copy or bias
their inspection result. For the same reason, randomizing the measurement samples is also good practice.
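The blind, randomized conditions described in the last bullet can be supported by generating the run order in advance. The function below is a sketch (its name and layout are illustrative, not prescribed by this manual):

```python
import random

def run_order(parts, operators, trials, seed=None):
    """Blind, randomized measurement plan: every operator measures every
    part on every trial, with the part order shuffled per operator/trial."""
    rng = random.Random(seed)
    plan = []
    for operator in operators:
        for trial in range(1, trials + 1):
            shuffled = list(parts)
            rng.shuffle(shuffled)
            plan.extend((operator, trial, part) for part in shuffled)
    return plan

plan = run_order(parts=range(1, 11), operators=["A", "B", "C"], trials=2, seed=1)
print(len(plan))  # 10 parts x 3 operators x 2 trials = 60 measurements
```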
MSA Studies require careful planning to ensure that the results are truly representative of the measurement
system. The system should be fully evaluated to identify what could affect the results so that anything likely to
contribute to the measurement variation is included in the study. An MSA study is essentially an experiment to
determine the degree and causes of variation within a measurement system. As such, careful use of Design of
Experiments is recommended (see 6.34 DOE).
Factors to consider include:
• Part variation that will affect the measured value (surface finish, part flexibility, shape, size, etc.)
A useful way of visually expressing the factors is to use a tree diagram example such as the one shown below.
This shows how the top-level factors listed above are broken down into constituent parts, arriving at the test
variable in level 3. Once the tree diagram is completed, variables can be included in the MSA plan, or eliminated.
Quality Tests
Before the study is concluded, it is always good practice to ensure the quality of the tests conducted is fit for purpose. This is not difficult and can follow the steps shown below to ensure the study plan was valid and the data correctly gathered:
Can the measurement system tell the difference between components measured?
No measurement system is truly perfect, and the errors are identified by several tests. AS13100 section 7.1.5 defines the acceptable levels for measurement systems (measurement capability) based on the type of study used. The table in the standard lists the acceptance limits of the tests, nominally as a percentage of the specification tolerance. Where acceptance criteria are not achieved, the following steps are a logical order for improving the MSA; the order of the steps is somewhat flexible and will depend on the specific application:
NOTE: Depending on the specific case and the inspection device used, it is possible to switch steps 2 and 3. For example, if already using a highly accurate CMM, it may be reasonable to check with the design responsible whether the drawing requirements can be adapted before investing in a new CMM. Each stage of this diagram is explored in this reference manual.
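Acceptance limits expressed as a percentage of the specification tolerance are conventionally computed from the 6-sigma GR&R spread; the threshold itself comes from AS13100, so the values below are purely illustrative:

```python
def percent_tolerance(grr_sd, usl, lsl, k=6.0):
    """GR&R spread (k standard deviations, conventionally 6) as a
    percentage of the specification tolerance. The applicable acceptance
    limit itself comes from AS13100 section 7.1.5."""
    return 100.0 * k * grr_sd / (usl - lsl)

# Hypothetical gage: GR&R sigma of 0.002 against a 0.06-wide tolerance.
pct = percent_tolerance(grr_sd=0.002, usl=0.51, lsl=0.45)
print(round(pct, 1))  # -> 20.0 (percent of tolerance)
```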
The following sections are designed to provide guidance when conducting MSA studies. Where this guidance cannot be followed as written, alternatives may be appropriate and, in such cases, may be agreed with the customer.
MSA studies may be designed to find all the existing variation in the measurement process so that action can
be taken to mitigate it and provide a capable and repeatable measurement system.
• Criticality of dimensions - the degree of confidence required for critical dimensions is higher, therefore, more
data is required.
• Part configuration - large parts, inaccessible features and low numbers of available samples may dictate
low sample sizes which should be recognized in any reports.
• Customer requirements - specific requests regarding the selection of samples may be defined by
customers.
• The type of study being conducted, and the factors involved, determine the number of samples needed, e.g., an attribute study typically requires more samples than a Gage R&R study.
Sample parts should represent the entire production operating range and ideally the entire allowed tolerance
range.
The MSA analysis techniques assume that parts in the study are measured in a random order, to ensure the operators are not able to identify parts and to provide a level of independence between the individual data points. The order of measurements within each part may follow a documented sequence to standardize the approach.
In some instances, it will not be possible to obtain a fully representative sample of product. In these instances,
a feature that is representative of the manufacturing process and size can be used. Where a representative part
is used, this should be documented as part of the study. The customer may require this to be authorized.
As with any statistical technique, the larger the sample size, the more accurate the results. The number of
samples is, however, very dependent on the measurement process, the type of study being conducted,
availability of parts and the economics of undertaking the study.
Studies where the measured characteristic has significant variation will require a high number of samples to statistically describe the process; 10 samples are a good minimum to consider. Where this cannot be achieved, it should be declared as part of the MSA acceptance report.
Processes where the measurement system is subject to human intervention will require a higher number of repeat measurements per inspector and more than one inspector. To capture that variation, it is good practice for each inspector to measure each sample three times or more. Where samples show a good level of control, each person could measure each sample twice. Where inspectors have low skill or competence, three or more inspectors could be used to show whether the skill level affects the measurement outcome.
13
RM13003 - Measurement Systems Analysis
Attribute studies require a much larger sample size. For a pass/fail study typically 30 or more samples are used
with multiple inspectors. For studies with more complex categorization, more than 30 samples are needed as
well as more inspectors.
Where automated measurement systems are used the human influence might have a negligible effect. Other
factors may however continue to influence the results such as flexible parts or variation in part fixturing
influenced by the operator.
In any test, the number of repeats should only be reduced to a minimum of two when there is supporting
evidence that the influencing factors are controlled.
5.4 Operators
Where possible, a representative sample of operators (two or more) who normally use the measurement
equipment should be included in the study. It is good practice to use a minimum of two people, and important
factors to consider include:
• Any physiological factors that may affect the measurement process, e.g., left- and right-handed people,
different levels of eyesight acuity, height, strength, etc.
All participants in the measurement study should be trained and experienced in the inspection task. Do not
include people who do not normally carry out the measurement activities as part of their day-to-day duties. Do
not include 'experts' like Metrologists or Manufacturing Engineers unless they are representative of typical users.
It is usual to use at least 10 parts or features for the measurement study (customer requirements may dictate
specific sample sizes). Using three operators measuring each part two times would give 60 data points (10 x 3
x 2), which is enough to give a meaningful result.
Sometimes though there may not be enough parts available. There are several ways that this problem can be
overcome:
• Conduct the study over a longer period as parts become available. This also has the advantage of
incorporating any time-based factors into the study and additional part-to-part variation.
• Use multiple identical features within a single part. For example, a slotted disc may contain several identical
features which can be identified and used to conduct the study. Potential problems with using this method
include randomizing the measurements effectively and a lack of variation in the features, but with care these
can be managed. Be aware that this method may exclude sources of variation that could affect the
measurement values, and any results obtained may not be fully representative.
• Use scrap parts. So long as the part is representative of a finished part (all features in the study are present,
and the geometry and surface finish would be expected in a conforming product) then non-conforming parts
can be used.
• If no other options are available, then smaller sample sizes may be used but a note should be added to the
final report highlighting the small sample size and higher risk of the results being unrepresentative. Care
should be taken as a low sample can also lead to a reported low number of distinct categories which may
or may not be the true problem. It is worth looking at mitigations when low samples are used. Consider a
linearity study for instance to show gage variation is limited.
The organization FlyHigh manufactures parts for an aircraft engine. One of the existing inspection devices, used
to accept a major feature on one product, has not previously been evaluated per this reference manual. To
ensure that the measurement system is adequate, the manufacturing engineer decides to perform an MSA.
Ideally, 10 parts would be selected for the MSA. A large sample size is important to produce statistically useful
results and is also more representative of the population, limiting the influence of outliers or extreme
observations. In this case, the organization FlyHigh does not have enough parts available in house in a
reasonable time: the inspection device in question is used on a low volume part, manufactured six times per
year. As the manufacturing engineer plans the study, she goes through different strategies to evaluate the
measurement system despite the low sample size.
One alternative is the use of multiple features within a single part. A good example of this is using individual
slots from a disk or holes in a flange as replacements for parts. Instead of part number 1 it is possible to nominate
slot or hole number 1, and so on until enough samples are obtained. The key is that the feature is replicated so
the measurement system can be tested; however, this is not an option for FlyHigh since the feature being
measured (a variable diameter, categorized as a 'Major' feature) is not replicated on the part.
Unless the inspection device is part specific, or geometric constructions are needed, it is possible to use
surrogate parts to increase the sample size. It is important to remember that the measurement system is being
evaluated, not the part. Other parts with the same geometry can be used to evaluate the system if care is taken
to ensure that features are close enough in geometry, surface finish, accessibility, etc. The MSA study could
use three of one part number, five of another, and two of a third, giving the required 10 parts. The organization
FlyHigh does not manufacture any similar surrogate parts that could be used in the MSA study.
Finally, the manufacturing engineer at FlyHigh evaluates the possibility of using scrap parts in the study. They
find parts that have been scrapped due to other dimensional features being out of tolerance, but which have
finished diameters on the feature being used in the study. The conclusion is that there are two scrap parts that
are, in respect of the MSA feature, representative of a finished part. Furthermore, the organization FlyHigh only
has two operators that routinely use the gage in the study. No other operator will be part of the MSA unless they
are trained to use the gage and familiar with the part being measured.
Lowering the sample size will affect the reliability of the test. Mindful of the low sample size, the manufacturing
engineer starts the MSA by determining that the gage resolution and accuracy ratio both meet the acceptance
criteria. The ANOVA method is used to provide an approximation of the Gage R&R value. The following design
factors are used for the study:
• Two operators
• Three repeats
Gage Details:
Data:
Part    Operator A                      Operator B
        Trial 1   Trial 2   Trial 3    Trial 1   Trial 2   Trial 3
1       348.888   348.889   348.888    348.889   348.890   348.888
2       348.891   348.890   348.892    348.893   348.892   348.893
3       348.898   348.896   348.896    348.899   348.898   348.898
4       348.891   348.892   348.891    348.890   348.890   348.891
5       348.898   348.897   348.899    348.896   348.897   348.898
Results:
Conclusion:
The total gage repeatability and reproducibility (GRR) result of 35.07% of tolerance was unacceptable for a
feature classified as ‘Major’; ideally this should be no more than 20% of tolerance (see AS13100 MSA
acceptance Limits). The variation was almost equally split between repeatability and reproducibility indicating
that there were issues with both the suitability of the gage and the operators, although both factors could be
due to the fitness for purpose of the gage. The dial gage was removed as a suitable gage for this application
and was replaced by an alternative with more acceptable variation.
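The ANOVA calculation behind a Gage R&R result like the one above can be sketched as follows, using the five-part, two-operator, three-trial data from the table. This is a minimal illustration of the standard crossed-ANOVA variance-component method, not the output of the software used in the study; the tolerance needed for a %-of-tolerance figure is not reproduced here.

```python
# Sketch: crossed-ANOVA variance components for a Gage R&R study,
# using the FlyHigh table (5 parts x 2 operators x 3 trials).
# data[part][operator] -> list of trial readings (mm)
data = {
    1: {"A": [348.888, 348.889, 348.888], "B": [348.889, 348.890, 348.888]},
    2: {"A": [348.891, 348.890, 348.892], "B": [348.893, 348.892, 348.893]},
    3: {"A": [348.898, 348.896, 348.896], "B": [348.899, 348.898, 348.898]},
    4: {"A": [348.891, 348.892, 348.891], "B": [348.890, 348.890, 348.891]},
    5: {"A": [348.898, 348.897, 348.899], "B": [348.896, 348.897, 348.898]},
}
p, o, r = 5, 2, 3
ops = ["A", "B"]
all_vals = [x for i in data for j in ops for x in data[i][j]]
grand = sum(all_vals) / len(all_vals)

part_mean = {i: sum(x for j in ops for x in data[i][j]) / (o * r) for i in data}
op_mean = {j: sum(x for i in data for x in data[i][j]) / (p * r) for j in ops}
cell_mean = {(i, j): sum(data[i][j]) / r for i in data for j in ops}

# Sums of squares for parts, operators, interaction, and repeatability
ss_part = o * r * sum((part_mean[i] - grand) ** 2 for i in data)
ss_op = p * r * sum((op_mean[j] - grand) ** 2 for j in ops)
ss_int = r * sum((cell_mean[i, j] - part_mean[i] - op_mean[j] + grand) ** 2
                 for i in data for j in ops)
ss_rep = sum((x - cell_mean[i, j]) ** 2
             for i in data for j in ops for x in data[i][j])
ss_tot = sum((x - grand) ** 2 for x in all_vals)

# Mean squares and variance components (negative estimates clamped to zero)
ms_part = ss_part / (p - 1)
ms_op = ss_op / (o - 1)
ms_int = ss_int / ((p - 1) * (o - 1))
ms_rep = ss_rep / (p * o * (r - 1))
v_rep = ms_rep                                   # repeatability
v_int = max((ms_int - ms_rep) / r, 0.0)          # operator x part interaction
v_op = max((ms_op - ms_int) / (p * r), 0.0)      # reproducibility (operator)
v_part = max((ms_part - ms_int) / (o * r), 0.0)  # part-to-part variance
v_grr = v_rep + v_int + v_op                     # total gage R&R variance
```

In this data set the part-to-part variance dominates the gage R&R variance, as expected for a study whose samples span the production range.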
When conducting a measurement study, it is important to identify variation in the part that could affect the
measurement result.
As an example, consider measuring a shaft diameter where the shaft is not perfectly round. Running a
measurement study will capture both variation due to the measurement system and variation of the diameter
shape. It is advisable to identify the amount of in-part variation before the study is conducted. Marking the
component to ensure measurements are taken in the same place will limit the in-part variation, but the study
will not then be representative of the measurement process seen in production. Where in-part variation is
considerable compared to the size of the tolerance, this should be declared to the customer as part of the
measurement study.
Taking measurements at the same points will reduce measurement variation but will not replicate what actually
occurs in production. Marking components to indicate where to measure should not be the solution to improving
the Gage R&R results. If measurements need to be specifically positioned, build this into the measurement
process using a small 'jig' or tooling to ensure the measurement is always taken at the correct point. This would
then be tested as part of the MSA studies.
When a large portion of repeatability is due to in-part variation, this indicates that the measurement strategy
(e.g., speed, number of points, number of sections, feature calculation method, fixturing, etc.) has not been
properly defined and needs improvement.
For example, if part roundness has a big effect on the repeatability of a diameter measurement, then the probable
cause may be that not enough points have been specified on the circle or cylinder, or that the calculation method
used to compute the feature is affected by form (i.e., inscribed or circumscribed calculations per ASME Y14.5,
as opposed to least-squares average calculations for the same feature). Increase the number of points and
confirm the required calculation method with the customer.
Sometimes the measurement itself needs to be redefined. For example, if the roughness of a surface is
measured as the highest result at three random locations, this will be less repeatable than defining the
roughness as the highest reading at three specific locations on the surface.
5.7 Linearity
Linearity is evaluated through the entire range of values that the measurement system is used on. The minimum
number of reference values required to do this is three: minimum and maximum dimension plus a mid-point to
check for consistency of any error. Ideally five will be used to give more reliable results.
The key to evaluating linearity is to be confident that the reference values are known to a level of certainty which
is well within the tolerance of the gage. Generally, using an alternative proven method to establish reference
values is the best way to do this.
It is best practice for each point to be measured at least 10 times by one operator, ideally in random order so
that the operator does not influence the result. Using five points this would give a total of 50 measurements.
The bias observed at each of the measurement points can then be calculated by subtracting the reference value
from the recorded value and then plotted on a graph (the black dots in the example). The average bias at each
point may then be calculated (the circled X in the example). If there was no bias the average values would all
be near the zero dashed line.
The best fit line (the solid line on the chart) can be determined by Least Squared Error analysis which is common
in most software or spreadsheet applications. In summary this means the slope of the graph is the measure of
Linearity.
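The best-fit (least squares) calculation just described can be sketched as below. The reference points follow the 25 to 45 °C example in this section, but the individual bias values (apart from the 0.659 over-read at 45 °C) are hypothetical.

```python
# Sketch: linearity as the least-squares slope of bias vs. reference value.
# Reference points follow the 25-45 degree C example; bias values other
# than the 0.659 over-read at 45 are hypothetical.
refs = [25.0, 30.0, 35.0, 40.0, 45.0]
bias = [-0.30, -0.10, 0.00, 0.30, 0.659]   # measured minus reference

n = len(refs)
mean_x = sum(refs) / n
mean_y = sum(bias) / n
# least-squares (best fit) line through the bias points
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(refs, bias))
         / sum((x - mean_x) ** 2 for x in refs))
intercept = mean_y - slope * mean_x

def predicted_bias(ref: float) -> float:
    """Predict bias at a reference value inside the studied range only."""
    assert refs[0] <= ref <= refs[-1], "do not extrapolate outside the range"
    return slope * ref + intercept
```

A positive slope here corresponds to the behavior described in the example: under-reading at low reference values and over-reading at high ones.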
Example: Linearity:
If the range of measurements is 25 to 45, then measurements should be made at 25, 30, 35, 40, and 45 °C
reference points.
The bias at reference value = 35 °C is acceptable, but below this the temperature gage under-reads. As the
reference temperature value increases, the bias also increases up to a maximum over-read of 0.659 at reference
value = 45 °C. This demonstrates a lack of linearity and shows that full Gage R&R studies should be performed
at all reference values. The slope of the best fit line can be calculated so that bias levels can be predicted at
other reference values. Do not, however, extrapolate this line outside the range chosen, as the error may not be
consistent.
A casting organization is asked to perform an MSA and Linearity study for all inspection equipment that will be
used on a new turbine blade they are developing. One of the pieces of equipment is a transducer gage which
inspects 80% of the required features. The MSA study shows the gage results well within the acceptance criteria
for all features inspected, however, the Linearity study shows a Linearity problem on a feature where the gage
touches the airfoil.
A review of the gage showed the angle at which the transducer touches the part is based upon a nominal blade.
A review of the blades used showed varying material condition on the datum locators causing the location of
the inspection point to vary. A review of the process capability over three production lots showed the feature
nominally shifted. The Gage R&R results indicated the process variation was 0.0031.
[Gage linearity and bias study output (before rework): bias plotted against reference values from 0.0102 to
0.0139, with statistically significant bias at most reference values (P = 0.000) and a pronounced linearity error.]
As noted, while the results are repeatable, there is significant Bias (8.4%) as the part datums cause the airfoil
to deviate from nominal. This Linearity error is impacting the observed process variation, with a linearity error
of 162.2%.
Improvements and Next Step Results: The dimensional variation was causing the transducer to touch the
airfoil in different locations on each blade. The organization experimented with the transducer angle until they
found the angle least susceptible to material and geometry influences. Once the adjustments to the gage were
complete, the organization re-ran the Gage R&R and the Linearity studies. The resulting Gage R&R (not shown)
found the process variation decreased to 0.0015. While this did not eliminate the linearity error, it reduced it
such that it had no significant impact on the results.
[Gage linearity and bias study output (after rework): S = 0.0000376, R-Sq = 66.0%; Linearity = 0.0000604,
%Linearity = 4.0; average bias = 0.00000 (P = 1.000); bias at reference values 0.0102 to 0.0139 within
±0.00008.]
After rework, the linearity only has a 4% influence on process variation and the scale of impact is much smaller.
Even with a good repeatability result, an unacceptable error in the gage may be present. All potential sources
of error should be examined.
The following section is an abridged guide with additions, to the methodologies set out by Dr. Donald J. Wheeler
in his book "Evaluating the Measurement Process III - Using Imperfect Data." These techniques are based upon
well-established mathematical and statistical principles. They focus upon the Measurement System as a
process; they provide information that is not found in other established methods. They provide a check for the
Consistency of the system, the correct Precision of the system, Bias in the system and a true measure of the
Fitness for Purpose (Relative Utility) of that system.
These techniques can be applied to serial data, when several parts or a process flow exists; but also have a
particular benefit when serial data is not available, i.e., in low volume production or in the Maintenance Repair
and Overhaul (MRO) environment, where two parts or repairs are seldom the same.
Where serial data is available, the Fitness for Purpose (Relative Utility) of the measurement system can be
estimated using the concept of the Intraclass Correlation Coefficient (ICC). (The ICC is the true ratio of the
Product Variance to the Total Variance; one minus this ratio represents the proportion of the total variation due
to measurement error.) When used with serial production data, the ICC classifies the Measurement System by
its ability to detect a shift in the process, selectively applying the standard control chart detection rules. This
demonstrates that a measurement system that might be condemned by other MSA methods can be shown to
be entirely capable of monitoring and warning of a process shift.
The EMP and ANOVA approaches to measurement systems analysis produce very similar results; however,
whereas ANOVA produces a table of results, EMP uses a more graphical and easily interpretable presentation
of the analysis.
The flow chart below describes the process steps for the three EMP methods described in this guide. Wherever
possible a Consistency study should be carried out as a first step, as it provides the best estimate of the standard
deviation of measurement error.
[Flow chart: Consistent? If yes, are there detectible differences in repeatability? If yes, compute and compare
the repeatability for each operator/machine.]
Consistency and Short EMP studies can be used in the absence of serial data from a 'Process,' i.e., in very low
volume or maintenance and repair environments. Basic EMP studies can be applied where a 'Process' exists,
because a reasonable spread of data across a specification is required.
Bias is assessed using the standard Student's t-distribution method; and as with all studies, the measurement
process must be consistent before bias can be correctly assessed.
Consistency
Consistency is the first step in the characterization of any measurement system; however, this should not be a
'one time' study. All processes deteriorate over time, so periodically repeating the study will ensure the
measurement process remains consistent.
A consistency study involves a single operator repeatedly measuring the same part or sample using the same
instrument over a period; at least twenty measurements should be made and the results placed on an Individuals
and Moving Range (I-mR) chart. As these are repeat measurements, the chart is plotted with a moving range
size of 2, as shown below:
[I-mR chart of 25 repeat measurements: Individuals chart with mean X = 3.337 and LCL = 2.543; Moving Range
chart with MR-bar = 0.299 and UCL = 0.976. One point (observation 20) falls beyond the limits.]
* Bias can be determined from a Consistency study if a known sample, Calibration standard or Artefact is used.
The charts are plotted with ±3 standard deviation limits (red lines) about the mean value line (green). The
measurement system is displaying inconsistency, as point 20 in the chart above is beyond the three sigma limits
produced by the I-mR chart. The cause of this inconsistency must be identified and fixed before the
measurement system can be characterized further. A chart showing inconsistency demonstrates that the
measurement process is not acceptable; any further analysis will only provide an estimate, as the system cannot
be relied upon to continue performing as expected.
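The I-mR limits used in a consistency study can be computed with the standard moving-range constants (2.66 for the individuals limits and 3.268 for the range chart upper limit). A minimal sketch with hypothetical readings, including one deliberately inconsistent point:

```python
# Sketch: I-mR chart limits for a consistency study, using the standard
# constants for a moving range of size 2. The 25 readings are
# hypothetical and include one deliberately inconsistent point
# (observation 20).
readings = [3.2, 3.4, 3.1, 3.5, 3.3, 3.4, 3.2, 3.6, 3.3, 3.4,
            3.1, 3.5, 3.4, 3.2, 3.3, 3.6, 3.4, 3.3, 3.5, 4.2,
            3.3, 3.4, 3.2, 3.5, 3.3]
mrs = [abs(b - a) for a, b in zip(readings, readings[1:])]
x_bar = sum(readings) / len(readings)
mr_bar = sum(mrs) / len(mrs)

ucl_x = x_bar + 2.66 * mr_bar    # individuals chart limits
lcl_x = x_bar - 2.66 * mr_bar
ucl_mr = 3.268 * mr_bar          # moving range chart upper limit

# observations signalling inconsistency on the individuals chart
out_of_limits = [i + 1 for i, x in enumerate(readings)
                 if x > ucl_x or x < lcl_x]
```

With these readings the out-of-limit list flags observation 20, mirroring the kind of signal described above.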
It is important to deploy some quality checks on the data, as described in the section below on Chunky Data. If
the plotted data is not chunky (chunky data is defined as the presence of only three or fewer possible values
on the Range chart) but does contain several zeros on the Range chart, the calculation of the standard
deviation may artificially shrink the three standard deviation limits on the charts and cause false alarms on this
quality check.
The same data placed in a Type 1 study gives the impression that the system is capable, as the metrics for Cg
and Cgk are above 1.33 and %Var and %Var with bias are below 15%; however, it can now be seen that this is
a false impression (see figure below, which shows the same measurement data in a Type 1 study).
** Note: In situations where the sample cannot be repeatably measured, e.g., a destructive test, the Fitness for
Purpose of the measurement system can be obtained using the Intraclass Correlation from a consistency chart
and a process behavior chart. Refer to EMP III by Donald Wheeler for further details.
[Type 1 gage study chart of the same measurement data, giving a false impression of capability.]
The Estimated Standard Deviation is commonly used to characterize precision; however, there is an alternative
using the term "Probable Error." This uses the normal distribution of repeat measurements together with the
estimate of the standard deviation.
In EMP, the middle 50% of the distribution is defined as ±1 Probable Error, calculated as the mean value
±0.6745 x standard deviation (±one Probable Error). There is a 50% chance that the measured value will lie
within this central zone of the population (see figure below) and a 50% chance it will lie outside it. When a
measured value lies within the central zone, the error of the measurement will be less than one probable error.
[Figure: normal distribution with the central 50% zone marked at ±1 Probable Error (±0.6745 σ).]
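The relationship between the Probable Error and the middle 50% of a normal distribution can be checked numerically; a minimal sketch:

```python
import math

# Sketch: one Probable Error is 0.6745 standard deviations; about 50%
# of a normal distribution lies within +/- 1 PE of the mean.
def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

PE_FACTOR = 0.6745

def probable_error(sd_e: float) -> float:
    """Probable Error from the standard deviation of measurement error."""
    return PE_FACTOR * sd_e

# fraction of the population inside the central +/- 1 PE zone
central_fraction = normal_cdf(PE_FACTOR) - normal_cdf(-PE_FACTOR)
```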
The Probable Error is the effective resolution of the measurement system. This can now be used to determine
if the measurement system has the correct measurement increment.
• When measurement increments are greater than 2 Probable Errors, data will be lost due to the system
rounding off.
• When measurement increments smaller than 0.2 Probable Errors are reported, those reported numbers
will be pure noise.
As the Probable Error places a limit on the reporting precision of the measurement system, and has a fixed
scaled relationship to the standard deviation, it can be used to calculate Guard Bands; these are applied to the
tolerance bands to protect against the effect of measurement error. There are several ways to calculate guard
bands; below is a case study using Probable Errors.
[Figure: guard bands (GB) applied inside each end of the tolerance band; a measurement result falling within
the guard-banded zone is accepted.]
The worked examples below are of a batch of Pins with a very close tolerance (6.5626 mm/6.5748 mm)
measured on a SIP MUL1000 single axis measuring machine.
A single pin was measured 30 times and the data plotted on an I-mR chart below.
The data shows a consistent measurement system with all data points within the three sigma limits.
Consistency Study
Pin 1 Large Diameter
[I-mR chart of 30 repeat measurements of Pin 1: Individuals chart with mean X = 6.567903, UCL = 6.569105,
LCL = 6.566702; Moving Range chart with MR-bar = 0.000452 and UCL = 0.001476. All points within limits.]
Based on the consistency study above, the Standard Deviation of Measurement Error SD(E) is calculated by
dividing MRbar by the bias correction factor d2. The bias correction factor converts the average range into an
unbiased estimate of the standard deviation for a given subgroup size. The values are found in published
statistical tables, including the AIAG handbook. Because the consistency study consists of repeated individual
values, the smallest subgroup size of two is used.
n d2
2 1.128
3 1.693
4 2.059
5 2.326
6 2.534
7 2.704
The equation for the Standard Deviation of Measurement Error SD(E) is:
SD(E) = MRbar / d2
For a subgroup size of 2, d2 from the table is 1.128 and MRbar = 0.000452, giving SD(E) = 0.000452 / 1.128 =
0.0004 mm.
As discussed above, the correct precision should lie between 0.2 and 2 times the Probable Error.
Check for correct precision: max increment = PE x 2 = 0.00054 / min increment = PE x 0.2 = 0.00005
The measurement system reported data to 0.0001 mm, which is well within these limits, so the system's
reporting precision is correct.
The Guard Bands can be calculated using different multiples of the Probable Error to reflect different confidence
levels. If the tolerance band is reduced by one Probable Error at each end and the measured value falls within
this zone there is an 85% chance (0.6745 x SD(E)) that it is conforming, similarly if the tolerance band is reduced
by two Probable Errors there is a 96% chance (1.35 x SD(E)) that it is conforming and if the tolerance band is
reduced by three Probable Errors there is a 99% chance (2.025 x SD(E)) that it is conforming. The normal
confidence level for a measurement system is 96%, however 99% can be applied if desired.
For a 96% confidence Guard Band (GB) the Standard Deviation of Measurement error is multiplied by 1.35
As already established, the correct measurement increment introduces a level of rounding that must be included
when calculating the Guard Bands: the reported value could be up to half an increment larger at the upper end
and half an increment smaller at the lower end. These are termed the Watershed Limits.
To ensure that at the chosen confidence level there will be no non-conforming products due to measurement
error, the Guard Bands are applied to these Watershed Limits. The Pin specification limits are 6.5626
mm/6.5748 mm.
In the absence of multiple Parts or repeated features on a part, this method can be employed to ensure that,
depending on the confidence level chosen, no non-conforming parts or features will be accepted by the
measurement system.
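The pin calculations above (SD(E), Probable Error, increment check, and 96% guard bands) can be collected into a short sketch. The numbers follow the text; the placement of the guard bands inward from the watershed limits is as described, and the exact software output is not reproduced.

```python
# Sketch of the pin worked example: SD(E), Probable Error, reporting
# increment check, and a 96% guard band applied to the watershed limits.
mr_bar = 0.000452            # average moving range, consistency study (mm)
d2 = 1.128                   # bias correction factor, subgroup size 2
sd_e = mr_bar / d2           # standard deviation of measurement error
pe = 0.6745 * sd_e           # one Probable Error

increment = 0.0001           # reporting increment of the system (mm)
# correct precision: increment between 0.2 PE and 2 PE
increment_ok = 0.2 * pe <= increment <= 2 * pe

lsl, usl = 6.5626, 6.5748    # pin specification limits (mm)
half = increment / 2         # watershed allowance for rounding
watershed_lo = lsl - half
watershed_hi = usl + half
gb = 1.35 * sd_e             # 96% confidence guard band (2 Probable Errors)
accept_lo = watershed_lo + gb  # guard-banded acceptance zone
accept_hi = watershed_hi - gb
```

A reading inside [accept_lo, accept_hi] is then accepted with at least 96% confidence that the part conforms.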
Chunky Data
Chunky Data is defined as the presence of only three or fewer possible values on the Range chart of a
consistency study of individual values, or four or fewer values where the study is of grouped values. Chunky
data is a problem as it deflates the standard deviation and this in turn shrinks the control limits leading to false
alarms.
[I-mR chart of the chunky data example: Individuals chart with mean X = 4.005233 and LCL = 4.004454; Moving
Range chart with MR-bar = 0.000293 and UCL = 0.000958; one point is flagged above the upper limit.]
If chunky data is detected in the analysis, it is undesirable and generally means one of two things. Either the
measurement system needs an additional digit or there is insufficient variability between parts, possibly because
the serial production process has very good process control compared to the tolerance so there is little
variability.
If possible, employ a measurement system that can add an extra digit; alternatively try and increase the range
by sampling parts over a longer time frame. This is important when the process variation is very small compared
to the tolerance.
NOTE: There is no cost justification in improving a measurement system just to tell the difference between near
identical parts that display very little process variation compared to the process tolerance.
The example above demonstrates this well. There are only three values in the range data, and the consistency
chart shows a false alarm as one point is above the upper limit; but the calculation of that limit is based on a
standard deviation that has been reduced by the number of zeros in the range data. The tolerance to be
measured is 0.020 mm and the maximum range in the data is 0.001 mm, so there is no justification to improve
this measurement system; a Short EMP study using this instrument would demonstrate its Fitness for Purpose.
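A simple check for chunky data, counting the distinct values that would appear on the moving range chart, might look like this; the readings are hypothetical.

```python
# Sketch: detect chunky data by counting the distinct values that would
# appear on the moving range chart of a study of individual values
# (three or fewer distinct values indicates chunky data).
readings = [4.0050, 4.0051, 4.0051, 4.0052, 4.0052, 4.0051,
            4.0053, 4.0052, 4.0052, 4.0051, 4.0052, 4.0052]
# round to the reporting increment to avoid floating-point noise
moving_ranges = [round(abs(b - a), 4) for a, b in zip(readings, readings[1:])]
distinct_values = set(moving_ranges)
chunky = len(distinct_values) <= 3
zero_ranges = moving_ranges.count(0.0)  # zeros that deflate the limits
```

Here the range chart carries only three distinct values, so the data would be flagged as chunky, and the zero ranges show why the computed limits would shrink.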
A Short EMP Study uses several parts to estimate the usefulness (relative utility, or fitness for purpose) of a
measurement system in detecting differences between parts. One operator uses one instrument to repeatedly
measure a range of parts. The data is placed on an Average and Range chart, where the repeat measurements
of each part form a subgroup. Repeatability is shown within the subgroups and part variation is shown between
the subgroups.
The Relative Utility is illustrated in two ways, first by an initial graphical representation and then by a formal
calculation using the Intraclass Correlation Coefficient (ICC).
[Average and Range (Xbar-R) chart of six pins measured five times each: Sample Mean chart with mean =
6.56976, UCL = 6.570462, LCL = 6.569058, with several subgroup means outside the limits; Sample Range
chart with R-bar = 0.001217 and UCL = 0.002573, all ranges within limits.]
The study example measured the batch of six pins five times each. The consistency of the instrument was
established in the previous section; if that had not been done, the SD(E), Probable Error, and Precision can be
estimated from the average range (Rbar = 0.001217) divided by the bias correction factor d2. As the pins are
measured five times, the subgroup size for this study is five; the table in the previous section shows that for
n = 5, d2 is 2.326. Using the calculation for SD(E) for sub-grouped data:
SD(E) = Rbar / d2 = 0.001217 / 2.326 = 0.00052 mm
This is reasonably close to the better estimate of SD(E) of 0.0004 mm obtained from the larger amount of repeat
data in the consistency study. If a consistency study has been carried out, the SD(E) from that study should be
used.
This can now be substituted into the equations previously described. A better estimate is always obtained from
a Consistency study with many repeats.
From the range chart, the system is repeatable, as all the points lie within the range limits. These limits describe
the measurement error of the system, which is similarly portrayed by the limits on the Sample Mean chart. The
initial assessment of utility is that, as most parts are detected outside the bounds of the measurement error, the
system can detect the difference between parts. To formally assess the Relative Utility, the next step is to
calculate the ICC.
The pin measurements are tabulated, and the Means and Ranges established.
[Table: large diameter measurements for Pins 1 to 6, with the Means and Ranges of each pin's five repeat
measurements.]
The Product Variance VP = the range of the means of the parts, divided by the bias correction factor, squared:
VP = (Rp / d2)2, where the bias correction factor d2 for a subgroup of five repeats = 2.326
The Measurement Variance VE = the average range (Rbar), divided by the bias correction factor, squared:
VE = (Rbar / d2)2
VE = (0.001217 / 2.326)2
VE = 0.000000274
If the ICC value were 1 (100%), no measurement error would be detected; by contrast, if the ICC value were
0, the measurement process would be incapable of detecting any variation in the products being measured.
The EMP methodology gives guidance based on proven empirical data, as to how to interpret the ICC values.
The ICC values are classified into classes of monitor, being First, Second, Third and Fourth.
Each of these classes of monitor has been proven empirically to represent the ability of the measurement
process to track changes in the process, and thus its ability to track process improvements. The Rules
referred to are the standard Western Electric detection rules for process monitoring.
Our calculations above confirm that 93% of the variation is due to product variance and 7% is due to
measurement variance. With an ICC between 1.0 and 0.8, this process is a First-Class monitor: it is capable
of monitoring process control with a more than 99% chance of detecting a 3 standard deviation shift in the
process using Control Chart detection Rule 1 alone (one point more than 3 sigma from the center line).
Less than 10% of the process sensitivity is consumed by measurement error, and it is capable of tracking
process improvements up to a Cp of 80. Even if the ICC indicated that the process monitor was Second
Class, if the first four Control Chart detection rules below are used, it would be virtually certain to detect the
same shift.
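The ICC calculation and monitor classification can be sketched as follows. Rbar and d2 follow the pin study; the range of the part means (rp) is a hypothetical value chosen for illustration, since the tabulated pin means are not reproduced here.

```python
# Sketch: Intraclass Correlation Coefficient (ICC) and EMP monitor
# classification. r_bar and d2 follow the pin study; rp (range of the
# six part means) is a hypothetical value for illustration.
r_bar = 0.001217             # average within-part range (subgroups of 5)
d2 = 2.326                   # bias correction factor for n = 5
rp = 0.0043                  # range of the part means (hypothetical)

ve = (r_bar / d2) ** 2       # measurement error variance VE
vp = (rp / d2) ** 2          # product variance VP
icc = vp / (vp + ve)         # proportion of total variance from product

def monitor_class(icc_value: float) -> str:
    """Wheeler's four monitor classes by ICC band."""
    if icc_value >= 0.8:
        return "First"
    if icc_value >= 0.5:
        return "Second"
    if icc_value >= 0.2:
        return "Third"
    return "Fourth"
```

With these inputs the ICC falls in the 0.8 to 1.0 band, classifying the system as a First-Class monitor.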
The Control Chart Detection Rules were developed by the Western Electric Company and are known as the Western Electric Zone Tests. A process change is detected in either plot of an I-mR or Xbar-R chart if one of the following is detected:
Rule 1: One point more than 3 sigma from the center line.
Rule 2: Two out of three successive points on the same side of, and more than 2 sigma from, the center line.
Rule 3: Four out of five successive points on the same side of, and more than 1 sigma from, the center line.
Rule 4: Eight successive points on the same side of the center line.
These rules will be found in widely used analysis software, but they may not be in this order. In one widely
used software package they are in order 1, 5, 6 and 2, so care must be taken to select the correct tests when
analyzing the data.
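As an illustrative sketch (not part of the manual), the first four detection rules described above can be implemented as follows; the function name and interface are assumptions:

```python
def we_signals(x, center, sigma):
    """Return a set of (rule, index) pairs where the first four
    Western Electric rules fire on the series x.
    Rule 1: one point more than 3 sigma from the center line.
    Rule 2: two of three successive points beyond 2 sigma, same side.
    Rule 3: four of five successive points beyond 1 sigma, same side.
    Rule 4: eight successive points on the same side of the center line."""
    z = [(v - center) / sigma for v in x]    # distances in sigma units
    hits = set()
    for i in range(len(z)):
        if abs(z[i]) > 3:
            hits.add((1, i))
        for side in (1, -1):                 # check each side separately
            w = z[max(0, i - 2):i + 1]
            if len(w) == 3 and sum(1 for v in w if side * v > 2) >= 2:
                hits.add((2, i))
            w = z[max(0, i - 4):i + 1]
            if len(w) == 5 and sum(1 for v in w if side * v > 1) >= 4:
                hits.add((3, i))
            w = z[max(0, i - 7):i + 1]
            if len(w) == 8 and all(side * v > 0 for v in w):
                hits.add((4, i))
    return hits

# A 3 sigma shift is caught immediately by Rule 1:
signals = we_signals([0.1, -0.2, 3.5], center=0.0, sigma=1.0)
```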
A First-Class Monitor meets the Precision to Tolerance requirements for the Critical, Major, and Minor categories, as 10% or less of the measured process signal is consumed by the measurement system.
Therefore, if plotted on a process behavior chart, any signals will have come from the process not the
measurement system.
First Class Monitors: ICC 1.00 to 0.80 — tracks process improvements up to Cp 80; less than 10% of the process signal is consumed by measurement error; more than 99% chance of detecting a 3 sigma shift with Rule 1.
Fourth Class Monitors: ICC 0.20 to 0 — unable to track process improvements; more than 55% of the process signal is consumed by measurement error; the chance of detecting a 3 sigma shift rapidly vanishes.
In summary, from a Short EMP study: check for consistency; estimate the precision using the Probable Error; make an initial estimate of the ability to detect product variance; and make a formal computation of the relative utility using the ICC. Also use SD(E) to calculate the Guard Bands.
The traditional methods of stating a Precision to Tolerance ratio overstate the effects of measurement error. This method is based upon observations and experimental data to demonstrate the probability that a conforming product lies within the manufacturing specification (the Design Specification less the Guard Bands). This is based on one of the fundamental laws of probability theory that has been known since the eighteenth century. Again, the Honest Precision to Tolerance ratio can be calculated for various confidence levels, based on the Probable Error as described previously.
The tolerance for the pins from the previous example is 6.5626 mm to 6.5748 mm.
If this method is employed the AS13100 Table 4 acceptance bands are used.
A Basic EMP study builds upon a Short EMP study by introducing additional sources of variation; this could be additional operators, additional measurement instruments, or both. Just as for the Short EMP study, the data are plotted on an Average and Range chart, where the repeated measurements are subgrouped together and the differences between the various variables show up between the subgroups. With this study, parts covering a reasonable cross section of normal process variation are required.
This example compares four parts selected at random from a batch, with a bore measured using an air gage; three operators measured each bore three times, and the Averages and Ranges were plotted.
The Average chart represents the average value obtained by each operator for each part.
Operator  Part  Reading 1  Reading 2  Reading 3  Average  Range
A         1     3.174      3.1735     3.1735     3.1737   0.0005
A         2     3.1745     3.1745     3.1745     3.1745   0
A         3     3.174      3.174      3.174      3.1740   0
A         4     3.171      3.1715     3.172      3.1715   0.001
B         1     3.174      3.1735     3.1735     3.1737   0.0005
B         2     3.1745     3.1745     3.1745     3.1745   0
B         3     3.1745     3.174      3.1735     3.1740   0.001
B         4     3.171      3.1725     3.1725     3.1720   0.0015
C         1     3.1735     3.1745     3.1745     3.1742   0.001
C         2     3.1745     3.175      3.175      3.1748   0.0005
C         3     3.1735     3.174      3.1745     3.1740   0.001
C         4     3.172      3.1735     3.1725     3.1727   0.0015
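As an illustrative sketch (not part of the manual), the subgroup averages and ranges can be reproduced with a short script; the data is transcribed from the table above:

```python
# Repeated bore measurements (mm): three readings per operator per part,
# transcribed from the table above.
data = {
    ("A", 1): [3.174, 3.1735, 3.1735],  ("A", 2): [3.1745, 3.1745, 3.1745],
    ("A", 3): [3.174, 3.174, 3.174],    ("A", 4): [3.171, 3.1715, 3.172],
    ("B", 1): [3.174, 3.1735, 3.1735],  ("B", 2): [3.1745, 3.1745, 3.1745],
    ("B", 3): [3.1745, 3.174, 3.1735],  ("B", 4): [3.171, 3.1725, 3.1725],
    ("C", 1): [3.1735, 3.1745, 3.1745], ("C", 2): [3.1745, 3.175, 3.175],
    ("C", 3): [3.1735, 3.174, 3.1745],  ("C", 4): [3.172, 3.1735, 3.1725],
}
# Subgroup average and range for each operator/part combination.
averages = {k: sum(v) / len(v) for k, v in data.items()}
ranges = {k: max(v) - min(v) for k, v in data.items()}
```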
The Range chart represents the repeated measurements obtained for the four parts, and it also shows the consistency of the measurements taken by the three operators. The chart shows that there is consistency between the three operators, as all points are within the limits. Consistency must be established before further analysis is performed.
Next, look for Interaction Effects; to do this, look at the patterns of the plotted data. Interaction Effects show up as a lack of parallelism between the lines.
Looking at these charts, there are three lines to compare for parallelism. From the previous sections it is known that the area between the limits represents the measurement error, so care must be taken when making judgements about interactions for average data that falls in this area. Averages outside this area should be scrutinized for interactions. Based on the Average chart at the beginning of this section, where about 50% of the averages are within the control limits, it can be stated that there are no interaction effects present in the data.
The next step is to look for detectable differences in the Range chart; it has already been established that the measurements are consistent and that there are no interaction effects present. Now establish whether the three operators are detectably different from one another in terms of Repeatability and Test-Retest Errors. To do this, the Basic EMP study uses a Mean Range chart, termed an ANalysis Of Mean Ranges (ANOMR) chart. The chart plots the Average Range values taken from the table at the beginning of this section.
Although operator A is very close to showing a detectable difference, the point is above the limit, so it can be concluded that there is no detectable difference in Test-Retest Errors between the operators.
Relative utility is calculated as before, and initially showed that there is relatively good utility, as half the parts are outside the measurement error. The formal utility calculations returned an ICC value of 0.9, so the process is a First-Class Monitor. The last step is to see if there is a detectable operator bias; this uses a Main Effects Chart, or ANalysis Of Main Effects (ANOME).
This chart plots the averages of the part values for each operator from the table.
There are software packages available that support Basic EMP studies and produce the charts and analysis. Manual creation of the charts is complex and requires further use of statistical tables, an excerpt of which is attached below for ANOME 0.05.
This chart is constructed by first calculating the grand average. Looking at the table of results, there are 12 averages (four for each of the three operators) from which to compute the grand average; in this case the grand average is 3.1736 mm. Next, the average value for each operator is plotted on the chart: Operator A = 3.1734, Operator B = 3.1735, and Operator C = 3.1739.
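As an illustrative sketch (not part of the manual), the operator averages and the grand average can be reproduced from the twelve readings each operator recorded above:

```python
# Twelve readings per operator (mm), transcribed from the study table.
readings = {
    "A": [3.174, 3.1735, 3.1735, 3.1745, 3.1745, 3.1745,
          3.174, 3.174, 3.174, 3.171, 3.1715, 3.172],
    "B": [3.174, 3.1735, 3.1735, 3.1745, 3.1745, 3.1745,
          3.1745, 3.174, 3.1735, 3.171, 3.1725, 3.1725],
    "C": [3.1735, 3.1745, 3.1745, 3.1745, 3.175, 3.175,
          3.1735, 3.174, 3.1745, 3.172, 3.1735, 3.1725],
}
# Per-operator average, plotted on the ANOME chart.
op_avg = {k: sum(v) / len(v) for k, v in readings.items()}
# Grand average over all 36 readings, the ANOME chart center line.
grand_average = sum(sum(v) for v in readings.values()) / 36
```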
Finally, the Upper and Lower Control Limits (UCL and LCL) are plotted. These require the use of a Scaling Factor, which is derived from the ANOME 0.05 table below using the following values to look up the required value.
Select the sensitivity of the ANOME chart to use. It is recommended to use a 5% chance of misclassification, which is denoted as 0.05.
The ANOME chart shows that operator C has a detectably different average value from operators A and B. The reason for this was not clear at this point, but it should be borne in mind, and the effect that this has on the Intraclass Correlation Coefficient recalculated. Recalculating the ICC to account for operator C: it was previously established that this measurement system has an ICC of 0.90. There is now a nuisance element of variation, in the form of detectable operator bias, to be taken into consideration.
To take the bias for Operator C into consideration, re-compute the ICC. The formula is similar to that used before, except that the Range of the Operator Averages is substituted for the Range of the Part Averages. The range of these averages is 0.0005, which is Ro (the Range of the Operator Averages). Substituting Ro for Rp, the formula for calculating the Operator Variance (VO) is:
VO = (Ro / d2)²
where d2 is the bias correction factor for the number of operators in the study, which is 3 (n = 3).
The formula for the ICC, taking the operator variance into account, is:
ICC = Variance(P) / (Variance(P) + Variance(O) + Variance(E))
= 0.00121² / (0.00121² + 0.000295² + 0.000413²)
= 0.850
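As an illustrative sketch (not part of the manual), the recomputed ICC can be checked numerically; the standard deviations are taken from the worked example, and the variable names are assumptions:

```python
# Standard deviations from the worked example above.
sd_p = 0.00121     # product standard deviation
sd_o = 0.000295    # operator (bias) standard deviation
sd_e = 0.000413    # measurement-error standard deviation

# ICC including the operator variance as an extra nuisance component.
icc = sd_p**2 / (sd_p**2 + sd_o**2 + sd_e**2)
print(round(icc, 3))
```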
It can then be concluded that the detectable operator variance seen on the ANOME chart for Operator C degraded the Intraclass Correlation Coefficient from 0.896 to 0.850. Despite this, the measurement system remains a First-Class Monitor for this product. A First-Class Monitor meets the Precision to Tolerance rules in AS13100 Table 4 for Critical, Major, and Minor, as 10% or less of the measured process signal is consumed by the measurement system. Therefore, if plotted on a process behavior chart, any signals will have come from the process, not the measurement system.
Note that the differences between the averages of the operators amount to little more than one Probable Error, confirming that the practical impact is small.
Guard Bands and the Honest Precision to Tolerance ratio are calculated as described in the previous section.
5.9 Stability
Stability, sometimes referred to as drift, is a measure of the measurement system variation over time. This is
sometimes caused in instruments by deterioration in mechanical or electronic components, or by a change in
the method used by the operator, for example by trying to do the task more quickly.
In this example, a reference sample is measured repeatedly over a period. Some variation will normally be observed due to the measurement error associated with the system, but there should not be any trend in the data or evidence of any special causes of variation. The chart above clearly shows a rising trend as time progresses, with the special cause highlighted. Any sign of instability should be investigated, and where it affects the measurement system, the cause should be identified and, where necessary, removed.
A stability check is often used for testing measurement systems by practical measurement of an artefact: a known item is measured every shift/day/week to confirm that the measurement system continues in a known status between calibrations.
Many aero engine components are protected from environmental attack by a paint coating. One of the critical
features of the paint is the coat thickness and this in turn is affected by the paint viscosity, so prior to (and
during) any paint application the viscosity of the paint should be measured and adjusted accordingly.
Paint viscosity may be checked by taking a sample and testing it in a viscometer such as the one pictured. This
machine uses a rotating spindle to measure the spin resistance when it is immersed in the coating. This can
then be related to the viscosity, which is measured in centipoise (cP) units.
The stability of the measurement process is critical as there are many factors that can influence the readings
such as spindle condition, ambient temperature, and equipment wear, so it is recommended that equipment
such as this is monitored to ensure that its accuracy and precision do not vary with time.
Stability Checks:
The stability of viscometers can be checked by using a certified fluid to gain an accepted reference value. Care needs to be taken to ensure that the fluid is appropriate for the type of spindle, the rotational speed used, and the typical application. Parameters such as solids content and viscosity range need to replicate the typical application that the instrument is used for, and should be near to the middle of the range at which the instrument works; in this case 75.0000 cP.
To measure both precision and accuracy, several samples need to be checked; in this case, five (5) samples a week were measured over 20 weeks. To eliminate any environmental factors, both the test sample and the instrument were kept in a temperature-controlled environment at 20 °C ± 2 °C.
The five samples were then plotted on a mean average (Xbar) and range (R) control chart. Xbar is the
arithmetical center of the sample and R is the distance between the smallest and largest data points in each
sample.
Upper and lower control limits (UCL and LCL) are calculated for the chart so that any special cause of variation
can be identified. The mean of the means (Xbar-bar) and range (Rbar) are also calculated to show the process
centers.
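As an illustrative sketch (not part of the manual), the control limits can be computed with the standard Xbar-R constants for subgroups of n = 5, assuming the study's overall mean of 75.0145 cP and average range of 0.509 cP reported in the results:

```python
# Standard Xbar-R chart constants for subgroups of n = 5
# (from published SPC tables).
A2, D3, D4 = 0.577, 0.0, 2.114

# Process centers from the viscometer study (cP).
xbarbar = 75.0145    # mean of the weekly sample means, Xbar-bar
rbar = 0.509         # mean of the weekly sample ranges, Rbar

ucl_x = xbarbar + A2 * rbar    # upper control limit, Xbar chart
lcl_x = xbarbar - A2 * rbar    # lower control limit, Xbar chart
ucl_r = D4 * rbar              # upper control limit, R chart
lcl_r = D3 * rbar              # lower control limit, R chart (zero for n = 5)
```

The maximum observed range of 1.075 cP sits just inside the computed R-chart limit, consistent with the stable result reported below.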
Results:
Stability is indicated by the lack of any special causes of variation on the Xbar-R chart. The tests for special
causes are:
• One or more points outside a control limit (which would indicate an increase in the variation)
• Eight points in a row on the same side of the center line (which would indicate a shift in the values)
Neither of these rules has been broken, so the instrument can be said to be stable over the time period chosen for the test (20 weeks).
The chart also tells us that the accuracy or bias of the instrument is good since the overall mean is 75.0145,
only 0.0145 cP above the true value of the test fluid. The range of the five samples has a mean average of
0.509 cP and maximum of just 1.075 cP which is within the machine specifications quoted by the manufacturer.
No further action was required.
Measuring a feature with two different measurement systems will often give two different measurement results. The relationship between the two values is the measurement system correlation.
To ensure measurement systems give accurate answers, measurement correlation is often checked against a reference value, which can be considered as a very accurate measurement. This difference is often known as the measurement Bias.
• It must be independent (two-man rule, different inspection device (if possible), measuring strategy, alignment, etc.).
• It must be accurate (the accuracy of the reference gage must be higher than, or at least the same as, that of the production gage).
For measurements done with handheld gages (caliper, micrometer, etc.), it is often easier to find a reference
value by measuring a calibrated item such as a gage block to gain a reference value. To obtain a reference
measurement for a CMM, there are different ways:
• Increase point density and distribution on characteristics (e.g., probe a diameter with 30 points instead of
13 points).
• Measure characteristics in different positions (cylinder in different heights) to establish a real reference
value.
In general, it is reasonable to repeat the reference measurements a few times to check whether there is high variance in the reference measurement, and then use the average of the measurements for correlation.
Where several measurements are taken (best practice is to take 10 or more), the average value of the range of
measurements is used to evaluate the bias between the measurement and the reference value. Bias is normally
quoted as a direct value (in the same units as measured), as a percentage of the overall process variation or
as a percentage of the allowed tolerance. An unacceptable level of correlation or bias would indicate that the
measurement system has an error, the measurement process is flawed or there is a calibration error.
NOTE: Bias is often evaluated at different dimensions in the range of the measurement system; see Linearity
in section 6.7.
This is described as the number of groups within the process data that the measurement system can discern.
A group of measured results will be grouped by measurement scale, result rounding, etc., and the discrimination
is often used as a key quality check of the measurement system.
If the number of categories is low, the measurement system might be poor, or the measurement samples are
clustered compared to a relatively large tolerance zone. If a measurement system's discrimination is inadequate,
it may not be possible to accurately measure process variation or quantify measurements for individual parts.
A low number of categories may indicate that the measurement system resolution is insufficient. However, a
value below five may indicate that the parts measured in the study are too similar or do not represent the entire
range of the process.
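As an illustrative sketch (not part of the manual), one common convention (the AIAG number of distinct categories, ndc) estimates how many categories the system can resolve; the example values are hypothetical:

```python
import math

def distinct_categories(sd_part, sd_gage):
    """Number of distinct data categories the measurement system can
    resolve within the observed part variation, using the AIAG
    convention: ndc = sqrt(2) * part SD / gage R&R SD, truncated."""
    return int(math.sqrt(2) * sd_part / sd_gage)

# Hypothetical standard deviations for illustration only:
ndc = distinct_categories(0.00121, 0.000413)   # a value below five
```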
In cases where high process capability is present, best practice is to check measurement capability against the
“% of process variation” rather than the “% of total tolerance” seen in AS13100 MSA Acceptance Limits. As
process variation is less than the tolerance, applying these limits will ensure the measurement system
adequately controls the process and the manufacturing process is not limited by the measurement system.
• Two-sided tolerance
o e.g., a linear dimension 100 mm ± 0.01 mm. Exceeding the tolerance in either direction leads to a nonconforming product
• One-sided tolerance
For each of these categories, different things are considered when conducting MSAs. In each MSA study, the variation and bias of the measurement system are compared against a tolerance in order to assess whether the uncertainty is acceptable.
Measurement system precision = (Study Variation / Tolerance) · 100%
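As an illustrative sketch (not part of the manual), the ratio as a small helper; the function name and example values are assumptions:

```python
def precision_pct(study_variation, tolerance):
    """Measurement system precision expressed as a percentage of the
    tolerance (or of the process variation, if that is passed instead)."""
    return study_variation / tolerance * 100.0

# Hypothetical values: 0.002 mm of study variation on a 0.02 mm tolerance.
pct = precision_pct(0.002, 0.02)
```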
Two-Sided Tolerances
This is the most common case; the uncertainty of a measurement system is compared against the feature tolerance or process variation.
One-Sided Tolerance
In these cases, there is no limitation on one side of the tolerance. For example, the wall thickness of a part could have a MIN thickness specification; the only limiting factor in making the wall thicker may be weight or physical fit. Depending on the specific situation, there are different approaches for these cases, which may have to be approved by the customer.
• Difference between the average actual value of the characteristic and the lower (or upper) limit as tolerance (average over different parts, using results from SPC).
These are characteristics like position, form of a line/contour, flatness, roundness. All these characteristics have
a lower limit of 0 (where nominal and actual contour/position completely agree).
Additionally, with these characteristics the actual value of a measurement often gives insufficient information regarding the exact deviation from the nominal.
An easy example is the roundness of a cylinder: the result of a roundness measurement is often accompanied by a plot of the measured element, as the actual value alone does not show where the maximum and minimum deviations are.
What does this mean for an MSA? When analyzing repeatability, reproducibility, or bias, it is important to check whether the orientation of the deviation is comparable as well.
In general, the analysis should involve comparison of the deviation orientation. This could even be something as simple as comparing the graphical plots of, for example, a scanned contour.
When calculating acceptance limits for a one-sided tolerance (e.g., position and form), it is not enough to simply evaluate the reported value, as it loses any information regarding the orientation of the deviation.
The figure below shows the position of a hole and the error as defined by the ISO 1101 definition for the calculation of the true position error. The results of the hole measurement from measurement systems 1 and 2 are shown in the figure below. The true position error has the same magnitude, but the vector of the error has changed significantly, showing that the measurement system has a significant repeatability error: the actual position of the hole center differs greatly between the first and second measurement.
One way to deal with this topic is to compare the x- and y-deviation of the measurement from each measurement
system instead of the position error (two characteristics for each hole instead of one).
Example:
The tolerance needs to be cut in half when analyzing x- and y-deviations, as the center deviation is doubled in the calculation of the true position error.
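As an illustrative sketch (not part of the manual), the doubling of the radial deviation in the true position calculation, and the loss of direction information, can be shown as follows; the function name is an assumption:

```python
import math

def true_position_error(dx, dy):
    """Diameter of the true-position tolerance zone consumed by a hole
    whose center is offset (dx, dy) from nominal, per the usual
    convention: twice the radial deviation."""
    return 2 * math.hypot(dx, dy)

# Two measurements with offsets of the same magnitude but opposite
# direction give identical position errors, even though the hole
# centers differ greatly -- which is why comparing dx and dy directly
# is more informative for an MSA.
e1 = true_position_error(0.03, 0.04)     # offset toward +x, +y
e2 = true_position_error(-0.03, -0.04)   # offset toward -x, -y
```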
Attribute Data
Attribute data should only be analyzed where the operator must make an observational judgement to decide whether a component is conforming or non-conforming. Attribute agreement analysis is not to be used where measurement data is provided directly by the measurement system.
Where attributes are used to assess feature acceptability (e.g., is each part the same color yellow, or is there a visible mismatch between two machined surfaces?), the people making these decisions need to be tested to see if they are making the same pass/fail decisions.
NOTE: Sample selection is especially important for attribute data. Performing an attribute study only on parts which are well in tolerance or clearly out of tolerance will lead to misleading assumptions regarding the capability of the inspection process. This is mainly because it is more difficult to evaluate an attribute characteristic which is marginally in or out of tolerance; however, the aim of an MSA for attribute data is to evaluate exactly these borderline characteristics. The sample should consist of equal numbers of acceptable and unacceptable parts and should contain the characteristic errors to test for.
Typically, there are two methods used to assess attribute data: The Attribute Agreement Analysis and the Kappa
Analysis.
Attribute Agreement Analysis (also known as Pass/Fail Study/Agreement between Assessors (AbA))
• Do assessors (people) make the correct decision when evaluating a feature’s acceptability?
This test verifies if the assessors can get the correct answer for a sample with a known value (i.e., either
pass or fail).
• Are they consistent in making that decision? (i.e., if they did the evaluation again, would they give the same
answer?)
This test verifies if the assessors give the same answer for each sample each time.
• If there are multiple assessors, do they all make the same decision?
This test verifies if all the assessors give the same answer.
Results can be expressed as the number of correct/incorrect assessments and as the percentage (%) value.
In Figure 26, the “Within Appraisers” graph is used to assess criterion ii and the “Appraiser versus Standard” graph is used to assess criterion i.
Looking at the results of the “Within Appraisers” graph in Figure 26, Appraisers B, C, and H have consistency scores greater than 90%. This means that, on average across the different samples assessed, these Appraisers (B, C, H) agreed with their own assessment just over 90% of the time.
At the low end of the consistency scale, Appraisers E and G have consistencies of about 65%; they only agree with their own answers/assessments 65% of the time on average.
The 95% Confidence Interval (CI) on each average is given as well. This shows the range within which the consistency is expected to lie 95% of the time in future measurements.
The “Appraiser versus Standard” graph in Figure 26 compares how well the appraisers do at getting the same answer as the standard (or known value). The range of means is from 40% for Appraiser D to 60% for Appraisers B and D.
Against the standard, the best correlation is 60%, which means that 40% of the time the appraisers give the wrong answer.
AS13100 introduces two different acceptance limits for the %Agreement in the Attribute Agreement Analysis. These consider the fact that it is more critical to rate a bad part as good (false pass) than to rate a good part as bad (false fail).
By defining a lower acceptance limit for false fails, the inspection process can be designed so that more false fail decisions are made in favor of fewer false pass decisions. Thus, additional security is built into the inspection process.
Attribute study (pass/fail test): False fail ≥ 75% agreement; False pass ≥ 95% agreement
Kappa analysis: Kappa ≥ 0.8
The following example shows how to determine the individual characteristic values (Appraiser Consistency,
Appraiser to Appraiser, Appraiser to Standard).
Two inspectors (A, B) assess in two trials whether a part is acceptable (response = 1) or unacceptable (response
= 0). The study uses 20 parts (sample size of n = 20). For each sample there is a known “true” result, specified
by an expert.
The first inspection result from inspector A is called A1, the second A2 (similarly for B: B1, B2). Using these abbreviations, the agreement values are calculated with the following formulas:
%A = (Number of parts where A1 = A2) / (Total number of parts); %B = (Number of parts where B1 = B2) / (Total number of parts)
%AS = (Number of parts where A1 = A2 = S) / (Total number of parts); %BS = (Number of parts where B1 = B2 = S) / (Total number of parts)
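As an illustrative sketch (not part of the manual), the within-appraiser and appraiser-to-standard percentages can be computed as follows; the function names and example data are assumptions:

```python
def pct_within(trial1, trial2):
    """% of samples where an appraiser agreed with their own repeat
    (e.g., A1 vs A2)."""
    hits = sum(a == b for a, b in zip(trial1, trial2))
    return 100.0 * hits / len(trial1)

def pct_vs_standard(trial1, trial2, standard):
    """% of samples where both of an appraiser's trials matched the
    known value S."""
    hits = sum(a == b == s for a, b, s in zip(trial1, trial2, standard))
    return 100.0 * hits / len(standard)

# Hypothetical ratings for five parts (1 = acceptable, 0 = unacceptable):
a1 = [1, 1, 0, 0, 1]
a2 = [1, 0, 0, 0, 1]
s = [1, 1, 0, 0, 0]
```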
To compare results of inspectors and to analyze the results it is common practice to calculate 95% confidence
intervals for these percentages. There are different approaches to calculate confidence intervals for binomial
proportions.
It is recommended to use the Clopper-Pearson exact confidence interval which is used by most commercial
software packages.
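As an illustrative sketch (not part of the manual), the Clopper-Pearson interval can be computed without commercial software by bisecting on the binomial tail probabilities; the function names are assumptions:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a
    binomial proportion of k successes in n trials."""
    def solve(too_small):
        lo, hi = 0.0, 1.0
        for _ in range(60):           # bisection on [0, 1]
            mid = (lo + hi) / 2
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: p where the upper tail P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    # Upper bound: p where the lower tail P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

# e.g., 16 agreements out of 20 parts:
lo, hi = clopper_pearson(16, 20)
```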
• %A, %B, %AB, %AS, %BS must be >95% (Note that “Appraiser to Standard” is always the lowest
percentage)
However, these calculations do not distinguish between a false pass and a false fail. As mentioned before, it is now possible to check for false fails and false passes separately. The figure below shows the two errors which can happen in the assessment: “True Value” is the result predetermined by, for example, an expert/quality engineer for a part; “Response” is what the inspector says.
False Pass (>95%): %AS = [Number of parts where (A1 = A2 = S) + Number of S = 1 parts where (A1 = A2 ≠ S)] / Total number of parts (same for %BS)
False Fail (>75%): %AS = [Number of parts where (A1 = A2 = S) + Number of S = 0 parts where (A1 = A2 ≠ S)] / Total number of parts (same for %BS)
Example:
Our sample of 20 parts contained 10 acceptable and 10 unacceptable parts. In total there were four disagreements for inspector A, with 16 parts correct.
• Considering only False Fail: %AS = (16 + 1) / 20 = 85% ≥ 75% acceptance limit
• Considering only False Pass: %AS = (16 + 3) / 20 = 95% ≥ 95% acceptance limit
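As an illustrative check (not part of the manual), the ratings below are synthetic, arranged to reproduce the counts in the example: 16 correct, 3 false fails, and 1 false pass:

```python
# 20 parts: first 10 acceptable (1), last 10 unacceptable (0).
standard = [1] * 10 + [0] * 10
# Inspector A answered identically on both trials; the arrangement below
# is illustrative, matching the example's counts.
a_rating = [1] * 7 + [0] * 3 + [0] * 9 + [1]

correct = sum(a == s for a, s in zip(a_rating, standard))
false_fail = sum(a == 0 and s == 1 for a, s in zip(a_rating, standard))  # good rated bad
false_pass = sum(a == 1 and s == 0 for a, s in zip(a_rating, standard))  # bad rated good

# "Considering only False Fail" forgives false passes, and vice versa.
pct_only_false_fail = 100 * (correct + false_pass) / len(standard)
pct_only_false_pass = 100 * (correct + false_fail) / len(standard)
```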
Kappa Analysis
The Kappa Analysis is based upon the same data as the Attribute Agreement Analysis. What differs is the
calculation of the acceptance limit.
Kappa is a measure for assessing the reliability of agreement between a fixed number of assessors.
Comparable to the Attribute Agreement Analysis Kappa can be calculated for:
• Inspector A (first trial) against inspector A (second trial) (same for inspector B)
The reason to use the Kappa analysis over the Attribute Agreement Analysis is that the Kappa value is generally considered more robust than a simple percent agreement calculation, because Kappa accounts for the possibility of agreement between assessors occurring by chance. This also means that the sample selection (number of good/bad parts) greatly influences the calculations.
NOTE: As Kappa is influenced by the distribution of the sample, it is seldom comparable across different studies. Check beforehand whether Kappa gives more insight into the study results than a percentage agreement analysis. Keep in mind that the calculation of Kappa only considers whether the part is “good” or “bad”; however, as mentioned before, it is more important to select marginally good/bad parts for the study than just to select enough clearly bad parts.
Depending on the number of assessors, either calculate Cohen’s Kappa (2 assessors only) or Fleiss’ Kappa
(>2 assessors). It is most common to calculate Cohen’s Kappa between the most and least trained inspector.
If Kappa = 1, there is perfect agreement. If Kappa = 0, the agreement is the same as would be expected by chance. The stronger the agreement, the higher the value of Kappa. Negative values occur when agreement is weaker than expected by chance, but this rarely happens. Depending on the application, a Kappa value less than 0.7 indicates that the measurement system needs improvement relative to the AS13100 MSA acceptance limit of 0.8. Kappa values greater than 0.9 are considered excellent.
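As an illustrative sketch (not part of the manual), Cohen's Kappa for two raters can be computed as follows; the ratings shown are synthetic, arranged so that 8 of 10 agreements with balanced marginals gives Kappa = 0.6, i.e. "needs improvement" against the 0.8 target:

```python
def cohens_kappa(r1, r2):
    """Cohen's Kappa for two raters giving binary (0/1) ratings."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n     # observed agreement
    p1 = sum(r1) / n                                 # rater 1's rate of "1"
    p2 = sum(r2) / n                                 # rater 2's rate of "1"
    pe = p1 * p2 + (1 - p1) * (1 - p2)               # chance agreement
    return (po - pe) / (1 - pe)

# Synthetic ratings for ten parts (1 = good, 0 = bad):
a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(a, b)
```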
Case Study - MSA for Attribute Data Using the Kappa Analysis
SITUATION: An assembly shop had a high number of escapes due to lockwire issues. No matter how much training was conducted, escapes continued to be a problem. The engineer responsible for the product decided to run an MSA study; since the data was attribute data, a Kappa evaluation was completed to see if the operators installing the lockwire, and the inspectors inspecting the work, were able to recognize the difference between acceptable and unacceptable conditions. Twenty samples were collected: 10 acceptable and 10 unacceptable. The engineer suspected that the problem was not with the obvious conditions but with those on the edge of the acceptance criteria, so within each group they selected samples where one was an obvious example of that group while the remaining nine were just on the edge of the acceptance criteria. The engineer selected two operators who installed the lockwire and two inspectors who final-inspected the parts as evaluators for the study. To maintain an independent evaluation, each evaluator was only allowed to view one part at a time while the balance of the samples was out of view. Each evaluator performed their evaluation out of sight of the other evaluators, and the engineer running the study randomized the order of sample presentation.
The following figure shows the data collected by the engineer.
Base line Actual run for Evaluator A (randomized) Evaluator A Evaluator B Evaluator C
Sample Truth Evaluator Sample Insp. 1 Insp. 2 Sorted Insp. 1 Insp. 2 Sorted Insp. 1 Insp. 2 Sorted Insp 1 Insp. 2
1 Good A 20 Bad Bad 1 Good Good 1 Good Good 1 Good Good
2 Good A 3 Good Good 2 Good Good 2 Bad Good 2 Good Bad
3 Good A 1 Good Good 3 Good Good 3 Good Good 3 Good Good
4 Good A 6 Good Good 4 Good Bad 4 Good Bad 4 Good Good
5 Good A 18 Bad Bad 5 Bad Bad 5 Good Good 5 Bad Bad
6 Good A 9 Good Good 6 Good Good 6 Bad Good 6 Good Good
7 Good A 12 Bad Good 7 Good Good 7 Good Good 7 Good Good
8 Good A 14 Bad Bad 8 Bad Good 8 Good Bad 8 Good Bad
9 Good A 15 Bad Bad 9 Good Good 9 Good Good 9 Good Good
10 Good A 4 Good Bad 10 Good Bad 10 Good Good 10 Bad Good
11 Bad A 11 Bad Bad 11 Bad Bad 11 Bad Bad 11 Bad Bad
12 Bad A 7 Good Good 12 Bad Good 12 Good Bad 12 Bad Bad
13 Bad A 17 Good Bad 13 Good Good 13 Bad Bad 13 Good Good
14 Bad A 13 Good Good 14 Bad Bad 14 Bad Good 14 Bad Bad
15 Bad A 10 Good Bad 15 Bad Bad 15 Good Bad 15 Bad Bad
16 Bad A 16 Bad Bad 16 Bad Bad 16 Bad Bad 16 Bad Bad
17 Bad A 8 Bad Good 17 Good Bad 17 Bad Good 17 Good Good
18 Bad A 5 Bad Bad 18 Bad Bad 18 Bad Bad 18 Bad Bad
19 Bad A 19 Bad Good 19 Bad Good 19 Good Bad 19 Bad Bad
20 Bad A 2 Good Good 20 Bad Bad 20 Bad Bad 20 Bad Bad
After each evaluator had inspected every sample twice, the engineer analyzed the results. First, they analyzed the agreement of each evaluator with themselves. To do this, the data is first structured, and then the Kappa values for the different comparisons are calculated. The basic calculations for Cohen’s Kappa are shown below; for further details (especially regarding Fleiss’ Kappa), refer to the statistical literature.
Evaluator A Insp. 1 to B Insp. 1 Evaluator A Insp. 1 to C Insp. 1 Evaluator B Insp. 1 to C Insp. 1 C Insp. 1 vs Truth
Sorted Insp. 1 Insp. 2 Sorted Insp. 1 Insp. 2 Sorted Insp. 1 Insp. 2 Sorted Insp. 1 Truth
1 Good Good 1 Good Good 1 Good Good 1 Good Good
2 Good Bad 2 Good Good 2 Bad Good 2 Good Good
3 Good Good 3 Good Good 3 Good Good 3 Good Good
4 Good Good 4 Good Good 4 Good Good 4 Good Good
5 Bad Good 5 Bad Bad 5 Good Bad 5 Bad Good
6 Good Bad 6 Good Good 6 Bad Good 6 Good Good
7 Good Good 7 Good Good 7 Good Good 7 Good Good
8 Bad Good 8 Bad Good 8 Good Good 8 Good Good
9 Good Good 9 Good Good 9 Good Good 9 Good Good
As a next step, the contingency tables are created as shown in Figure 31. The first set of contingency tables (left column) is based on each Evaluator (in this case A, B, and C) comparing the results of their first and second inspections. Then the probability of occurrence is computed to get to the Kappa (right column).
As a next step they analyzed the agreement between evaluators using the first inspection from each evaluator.
As expected, the obvious samples (sample 1 and 20) were evaluated correctly every time. The differences were
on the parts marginally acceptable or unacceptable.
The comparison between Evaluators A and C (second box from the top in Figure 32) appears to show a capable system. However, based on Evaluator A's results within themselves (top left box of Figure 31), it is easy to conclude that this level of agreement was by chance, not actual agreement.
Evaluator C was the closest to an acceptable Kappa score when measured within themselves (Figure 31).
However, the engineer noticed that two of the responses indicated the acceptance of bad product as good for
both evaluations.
The engineer decided to compare Evaluator C against the truth. The results showed a poor Kappa score of K = 0.6, shown in the bottom box of Figure 32.
Results:
The results of the Kappa analysis showed that the measurement system needed improvement. The engineer determined that the training provided was not enough to ensure conforming product was built and shipped. They developed a reference board that demonstrated the subtle differences between acceptable and unacceptable examples of various possible conditions. All the operators and inspectors were re-trained using the reference board, which was also kept in full view at the point of inspection.
The engineer re-ran the Kappa analysis allowing the evaluators to use the reference boards. With the retraining
and use of the reference boards all the Kappa scores were acceptable with a few showing perfect agreement
(score 1.0).
Even after the engineer had implemented the improved process, they continued to have escapes until they
provided their customer a reference board and training. It turned out that the customer’s inspectors also required
calibration.
Training is not always enough to drive consistency in subjective measurement systems. Just because an evaluator is consistent does not mean their analysis is correct or that their interpretation will not drift. The evaluator must utilize all the information collected to make the right decisions on how to move forward. Employing reference samples at the point of inspection is key to long-term consistency.
Attribute data is discrete data, meaning that it is limited to pass/fail, go/no-go, conforming/non-conforming, or a count of defects or defectives. Ordinal data falls in the same category. This is data with a natural ordering, where something is rated, for example, on a scale of 1 to 5 or as disagree, neutral, or agree. An example of ordinal data could be a rating of the quality of a cable tie, where the quality is rated on a point scale of 1 to 5.
This kind of study assesses the consistency and correctness of ratings by appraisers to the standard as well as
compared with other appraisers. A study involves several parts of which half are known to be conforming and
half are known to be non-conforming. These form the baseline for the study.
It is important to choose a representative sample of appraisers who are trained and experienced in the
inspection task. Other aspects that could affect the outcome must also be considered when organizing a
study. The environment in which the study is undertaken must be representative of the normal working
environment, however such variables as lighting, cleanliness and noise may have an influence on results.
Samples are rated twice in random order to prevent the results being influenced by the appraiser remembering
how they rated the same sample during the first run.
ICC is the Intraclass Correlation Coefficient, and it describes the reproducibility of the numerical score made by
different appraisers assessing the same parts. It compares several different scenarios and uses the sum of
squares to calculate the coefficient. Minitab® statistical software is an easy way to calculate the ICC using
Attribute Agreement Analysis.
Laboratory technicians use a common test method to estimate the microstructural grain size in a low alloy steel
used for shafts. The test method requires grain size to be estimated using graded images from 1 to 10 displayed
on a wall chart for comparison purposes. The technicians view the grains using an optical microscope, commit
the image to memory, look up from the microscope and compare their mental image with the graded images on
the wall chart.
The Quality Engineer wants to assess the consistency and correctness of the technician’s responses and asks
four appraisers to rate the grain size of 50 samples of the same alloy twice, in random order and the results
loaded to a table of which a sample is shown below:
Within Appraisers
Assessment Agreement

Appraiser   # Inspected   # Matched   Percent   95% CI
Amanda      50            50          100.00    (94.18, 100.00)
Britt       50            48          96.00     (86.29, 99.51)
Eric        50            43          86.00     (73.26, 94.18)
Mike        50            45          90.00     (78.19, 96.67)
Results:
From the study there are many more statistics and correlations that can be calculated. For this summary case study, it can be seen from the graph and statistics above that Amanda has 100% agreement (the spread of classifications is low for each sample tested).
It can also be seen that Eric and Mike have a wider spread of results, showing that they have often given
differing classification for each sample viewed.
Where the measurement/inspection system has failed to reach acceptable minimum levels (as described in
AS13100 section 7.1.5 Table 4), the measurement system may still be judged as fit-for-purpose if the relevant
customer technical authorities agree. Risk to producer and risk to customer are both to be considered prior to
such judgment. Specific factors that must be considered align with typical FMEA “Risk Priority Number”
evaluations:
• It is to be noted that studies of manufacturing process capability are, by their nature, confounded with measurement error. Some processes with observed low capability are in fact excellent manufacturing processes whose output is measured with very large measurement variability. It may be possible to estimate the "true" process capability after studying the measurement errors by the methods described in this document.
In addition to the above, IMPROVEMENT COST/TIME are also considerations that add to the risk evaluation.
If the process must deliver product, and an improved gage has a long lead time, a mitigation strategy may be
put in place until the new gage is delivered and verified to be better than the existing gage. Additional cost may
be calculated by estimating an increased reject rate under risk mitigation, until such time that the improvement
is in place.
To judge if the measurement system is acceptable, the details of the measurement process, the measurement
system analysis study details and the results achieved should be submitted to the customer for approval or in
line with other customer requirements.
NOTES:
1. It is expected that the organization will make every reasonable effort to improve the measurement system
capability before submission to the customer.
2. The organization can use historical measurement capability results to demonstrate that a customer’s
specification cannot be achieved and negotiate an improved tolerance, or increased acceptance limits from
AS13100 MSA acceptance limits. This should be on a feature-by-feature basis.
Looking at the whole MSA study as a balance of multiple factors is encouraged. If, for example, all factors tested in the AS13100 "MSA Acceptance Criteria" table were at their maximum limits, continued use of this measurement system should be strongly discouraged even though the acceptance limits are formally met.
The analyzed parameters (e.g., Bias, Repeatability, Reproducibility) always appear as a combined error in real
production line measurements. There are different methods to deal with this topic, which are presented in the
following two case studies.
In many cases the most significant uncertainties derive from the repeatability and bias of a measurement system. A capability study that analyzes these influences as a whole is called a "Type 1 Gage Study." In this study two parameters, Cg and Cgk, are calculated. These are analogous to the Cp and Cpk metrics of SPC, as they are a comparable metric for just the measurement process.
• Cg → A measure of the repeatability in relation to the characteristic's tolerance. Values greater than 1.33 indicate that the spread of the measurements is narrow enough. There are different possibilities to calculate Cg. The following method is recommended:

  Cg = (0.2 ⋅ T) / (4 ⋅ s), with T = Tolerance, s = Standard deviation
• Cgk → When the measurement does not have any systematic error (bias = 0), Cgk = Cg. Cgk considers the difference between the measurements and a reference value; the bigger the difference, the smaller Cgk becomes. A benchmark value for an accurate and precise measurement system is Cgk > 1.33. There are different possibilities to calculate Cgk. The following method is recommended:

  Cgk = (0.1 ⋅ T - |X̄g - Xm|) / (2 ⋅ s), with T = Tolerance, s = Standard deviation, X̄g = Mean of the measurements, Xm = Reference measurement
In general, a Type 1 study is done prior to a Gage R&R analysis as a way to assess the effects of bias and
repeatability on measurements from one operator and one reference part. It can be seen as a first test for a
measurement system to evaluate whether a gage is generally suitable for the intended task. Some variations
(e.g., different parts and inspectors) are excluded to focus only on the gage not on any other sources of variation.
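A minimal sketch of the Cg and Cgk calculations above; the readings, tolerance, and reference value below are hypothetical:

```python
import statistics

def type1_gage_study(measurements, tolerance, reference):
    """Type 1 gage metrics per the formulas above:
    Cg = 0.2*T / (4*s), Cgk = (0.1*T - |mean - reference|) / (2*s)."""
    s = statistics.stdev(measurements)                    # sample standard deviation
    bias = abs(statistics.mean(measurements) - reference)
    cg = (0.2 * tolerance) / (4 * s)
    cgk = (0.1 * tolerance - bias) / (2 * s)
    return cg, cgk

# 25 or more repeats of one reference part are typical; a short
# hypothetical series is used here just for illustration.
readings = [10.001, 9.999, 10.002, 9.998, 10.000]
cg, cgk = type1_gage_study(readings, tolerance=0.1, reference=10.000)
```

With zero bias the two metrics coincide, as noted above; both exceed the 1.33 benchmark for this data.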
Risk mitigation for poor measurement processes may be accomplished by several different methods. Any
method used should be documented and should be with agreement of the customer. Special attention is to be
paid to the criticality of the part feature being measured to prevent non-conforming material from being
erroneously accepted as conforming.
It is always important to find the root cause of the poor results and then find ways to improve the measurement
system. A culture of continuous improvement in accordance with AS13100 is key with measurement and product
validation. Only use risk mitigation measures as a last resort as these build rework loops into the inspection
process.
Use when:
• Bias, calibration, and correlation studies exhibit a correlation coefficient between measurement methods of R² > 0.65, or a similar minimum threshold value. Lower values of R² may simply be a case of variation error.
• Gage R&R study shows small difference in Reproducibility, and large difference in Repeatability.
• Multiple measurements can be taken over a short period, thus reducing possibility of stability errors.
Methodology: Measurements are taken a number of times and averaged in order to estimate the true value, demonstrated through a simple average or a 95% confidence interval (CI) on the average. This may be done with a simple (sum of measurements)/(number of measurements) formula or, in a more critical application, with confidence intervals.
Five measurement readings were taken on seven different parts ‘SN.’ The process included the complete
removal of part from gage, and a re-set of zero on the digital indicator. The feature is not a critical characteristic,
so a simple mean was agreed upon as the final measured result. All five readings are recorded, and the average
reading is used to accept or reject the part.
In the simple case, it was decided to average five readings on the one gage. The tolerance limits shown
(horizontal dashed lines at 3.158 and 3.162) are superimposed on the plot for clarity. The large gray dot is the
average (mean) value of the five readings. In this case, parts SN1, SN4, and SN7 would be rejected as under
minimum limit.
[Figure: Individual value plot of SN1 through SN7, five readings per part with the mean shown as a large gray dot, tolerance limits superimposed as horizontal dashed lines at 3.158 and 3.162.]
Confidence interval example: In a high risk application, such as a critical or major feature, it is possible to
calculate a Confidence Interval on the mean. In the example below, a 95% Confidence Interval was calculated
for the mean of five readings, taken from the example above. In this case, parts SN1, SN3, SN4, and SN7 would
be rejected. Compared to the example above, this is a more conservative approach.
[Figure: Individual value plot of SN1 through SN7 with a 95% confidence interval for the mean of each part's five readings, tolerance limits at 3.158 and 3.162.]
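The two sentencing approaches above (simple mean versus a 95% CI on the mean) can be sketched as follows. The readings are hypothetical, and the t critical value for 4 degrees of freedom (2.776) is hard-coded because the Python standard library has no t-distribution:

```python
import statistics

TOL_LO, TOL_HI = 3.158, 3.162   # tolerance limits from the example
T_CRIT = 2.776                  # two-sided 95% t value, 4 degrees of freedom

def sentence_by_mean(readings):
    """Accept if the simple mean of the readings is within tolerance."""
    m = statistics.mean(readings)
    return TOL_LO <= m <= TOL_HI

def sentence_by_ci(readings):
    """More conservative: accept only if the whole 95% CI lies within tolerance."""
    m = statistics.mean(readings)
    half = T_CRIT * statistics.stdev(readings) / len(readings) ** 0.5
    return TOL_LO <= m - half and m + half <= TOL_HI

readings = [3.1590, 3.1605, 3.1598, 3.1612, 3.1600]  # hypothetical five readings
```

A part whose mean sits just inside a limit can pass the simple-mean rule yet fail the CI rule, which is why the CI approach rejects more parts in the example above.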
Where the MSA studies show an intermittent problem with variability, it may be useful to consider the range of three or more individual measurements of the same feature. Consider the example below. Two parts exhibited unusually high variability with both operators during the Gage R&R study; there may be some unusual feature on these parts that creates an unstable measurement. This led to the repeatability mitigation review. Five readings were taken on nine different serial numbers (SNs); these readings are shown in Figure 38. The analysis is then shown in Figure 37 and Figure 38. Looking at the results, SN117, SN218, SN266, SN275, and SN389 are rejected, as each has at least one measurement outside the tolerances.
As in the above examples, part and feature criticality should play a role in deciding how to proceed. In this case,
the “typical” parts show a range that does not exceed 50% of the tolerance band.
Two rules are used to evaluate parts based on the average and range of the readings. Both conditions must be TRUE to accept a part.
1. (Maximum reading) minus (minimum reading) must be less than 50% of tolerance
AND
In this case, serial numbers 117, 218, 266, 275, and 389 would be rejected for failing at least one of the rules.
This case study gives an example of two parts exhibiting high variation. It is always important to establish the root cause and eliminate it. Mitigations such as averaging always add extra effort to overcome the initial issue. It is always better to fix the root cause and rerun the study using different parts.
5.14.6.1 Guard banding is a process of applying additional control limits inside the design limits. These tighter limits serve to protect the customer from any risk of receiving undetected nonconformance. Guard bands are typically sized as a function of the measurement error. There are several ways to calculate guard bands. For example, a gage error of 22% could be mitigated by reducing the top and bottom limits of the feature by 22% of the tolerance. If all components are within the new guard band limits, the measurement error is benign.
• Variability (Gage R&R) error is known to be non-normally distributed. For example, a tolerance with a physical limit of zero often has a skewed distribution more appropriately described by a Weibull, exponential, minimum-extreme, or maximum-extreme distribution.
• Correlation/bias is not understood, or not quantified. Caution is to be used if two measurement methods disagree and the standard is not known; either or both could be inaccurate.
Use when:
• Gage R&R error is not skewed and is normally distributed, or is close to fitting a normal distribution.
• The production measurement method cannot be easily repeated for averaging, as shown above.
It is often difficult to implement guard bands safely if there is a large gage error or the manufacturing process
has a lot of variation. A large gage error would affect the capability analysis of the process and would lead to
bigger guard bands.
Improving the measurement system should be the first action. Using guard banding as an alternative solution to get around a poor MSA result adds additional time and cost and introduces a rework loop. Always ensure this is part of a structured problem-solving solution and is regularly reviewed and updated.
Methodology:
Correlation to a known standard (or a bias study), and a variable Gage R&R Study can be used to calculate the
guard band limits.
3. Define the method of measurement validation for parts that fall in the guarded zone. Typically, validation would mean pulling the part out of the production line and sending it for more detailed inspection by a more precise (often slower and more expensive) method.
• It is to be noted that a capable process, targeted to the middle of the tolerance band, will have very few
parts fall into the guarded zone.
• Bias Guard Band size. It must be understood if the production method measures higher than, or lower than,
the standard. Depending on which way the bias is exhibited, install the guard band on either the upper or
lower limit per this table:
BIAS GUARD BAND

FEATURE CRITICALITY   LOWER TOLERANCE LIMIT        UPPER TOLERANCE LIMIT
MINOR                 Guard band with MEAN bias    Use upper tolerance limit
MINOR                 Use lower tolerance limit    Guard band with MEAN bias
Variation Guard Band Size: Variation guard bands are usually applied on both upper and lower tolerance
limits. In this case, the size of the guard band is set by the variability estimated in a Gage R&R or similar study,
shown in Figure 40.
[Table: variation guard band size by feature criticality, applied to the lower and upper tolerance limits; see Figure 40.]
Stability Guard Band Size: If it is known that the gage has problems holding long term stability, a guard band
based on experience with the direction and magnitude of drift (e.g., bias due to datum point wear) or additional
variation (e.g., gradual loosening of mechanical joints) may be calculated and applied.
In the example data above, the tolerance limits are 3.122 to 3.130, and this is a MAJOR feature, so 1.282 standard deviations are used from the guard band size chart above, giving a variable guard band of 0.000833.
Tests for bias showed the difference to the standard to be less than 5% of tolerance, so no guard band is required for bias.
The guard band limits (green zone) are (3.122 + 0.000833) on the lower limit and (3.130 - 0.000833) on the upper limit, rounded to four decimal places in agreement with the gage readout.
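That arithmetic can be sketched as follows; the gage standard deviation of 0.00065 is a hypothetical value back-calculated from the 0.000833 band quoted in the text:

```python
# Variable guard band for a MAJOR feature: band = 1.282 gage standard deviations
LSL, USL = 3.122, 3.130
s_gage = 0.00065            # hypothetical Gage R&R standard deviation
band = 1.282 * s_gage       # approx. 0.000833

# Tighten each tolerance limit inward by the band, rounded to the gage readout
guard_lo = round(LSL + band, 4)
guard_hi = round(USL - band, 4)
```

Parts measuring between guard_lo and guard_hi are accepted directly; readings between a guard band limit and the corresponding tolerance limit fall in the guarded zone and require validation.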
Guard bands and inspection rules must be implemented to correctly control part sentencing. An example is:
RULE 3: If all VALIDATED measurements are within the GREEN or YELLOW zones, the part is ACCEPTED.
RULE 4: If ANY VALIDATED measurement is within the RED zone (out of tolerance), the part is REJECTED.
5.14.7.1 Over-inspection may be recommended when multiple inspectors have similarly acceptable results from a Gage R&R study and one inspector is somewhat less repeatable or has bias in their measurements.
Similar to Guard Banding above, rules should be set on the inspection based on studies performed. Examples
of possible rules from the CASE Study above may be:
• For Inspectors who scored above 30% repeatability on the Gage R&R Study, all parts measuring within the
upper or lower 30% of the tolerance must be over-inspected by a different inspector.
Or
If one of the inspectors has less repeatable results, try to find out why; resolve the issue, or remove them from inspection, retrain them, and rerun the study once they are believed to be competent. This mitigation can help in the short term but adds additional cost and contradicts the culture of continuous improvement outlined in AS13100.
Inspector Qualification
It may be advantageous to use the results of an MSA Test as an inspector qualification (authorization to accept
or reject parts). A particularly technique-sensitive gage will often show high Reproducibility variation and may
show a difference between inspectors with respect to Repeatability.
For example, an inspector may not be qualified to use Gage X1234 if their performance causes the Gage R&R study to fail the 20% of tolerance limit when tested against two other inspectors. As the unqualified inspector practices and learns the proper techniques, he or she may re-test, and the new test data may replace his or her old data in the Gage R&R study. If the overall study passes the 20% mark, the inspector earns qualification on this gage. Also, all three qualified inspectors are required to re-run the Gage R&R study on an annual basis.
Checking a known artifact (often referred to as a master) can help keep track of drift or instability over time. Care must be taken to ensure the master is not abused, is not allowed to accumulate a layer of contaminants such as dirt or oxidation, and is not allowed to wear. A simple graphical "beginning of shift" or "beginning of lot" master plot like the example shown below should have limits that are established taking into account typical (allowable) variation due to Gage R&R errors, thermal errors, etc. This is often used on automated measurement systems to confirm the equipment has not drifted between yearly calibrations.
The corrective action for an Out-of-Limits Master measurement is often either maintenance, zero adjustment or
re-calibration of the equipment.
Although there is no specific rule for control of variation due to thermal shrinkage or growth, beyond the calibration reference temperature of 20 °C (68 °F) cited in various regulations and procedures, thermal effects on dimensions are often a large source of error.
Each environmental condition needs to be assessed and/or measured. For example, a temperature study of the
machine shop over a full year to understand the temperature variation that is seen in winter or summer periods
where measurement is conducted.
To put the potential effects of such errors into perspective, we use the formula:

Thermal growth = (Length) x (Temperature offset from 20 °C / 68 °F) x (Coefficient of thermal expansion)

A rough rule of thumb used by some organizations is to use 5% of tolerance as a level of significance for thermal growth.
The formula will determine that large parts will have greater thermal growth, as will parts or gages made from
materials with larger coefficients of thermal expansion (Aluminum alloys are approximately twice as
dimensionally sensitive to changes as cast iron, for example).
Attention must be paid to dissimilar materials. The gage and part may both have been stabilized at 5 degrees
hotter than ambient, but the thermal growths of the materials they are made from will make a difference in
measured value.
It is not just the set point temperature that is important but the rate of change of temperature in the room. When
conducting measurement studies, the components and measurement system can be affected by the rate of
change of temperature. Most measurement equipment will operate at any temperature, and the bias caused by
that temperature will remain constant, if the temperature is constant. If the room temperature varies when the
door is opened or the air conditioning comes on, then the measurement results will not be predictable until the
room, measurement equipment and part have re-stabilized.
• Tight tolerances on large components - the effects of temperature are magnified and with a small tolerance,
the thermal effects can cause a gage to fail.
• Lightweight parts, with low thermal mass. These parts may absorb or expel heat more quickly than heavy
parts, and may be affected by hand temperature, a cold granite slab, or even the warm breath of an operator.
• Heavy parts with large thermal mass. Heavy or dense parts hold heat for a long time and may still be warmer than the measurement room for many hours. "Soaking" the part in the measurement environment for a full working shift or longer may be required.
• Temperature of measurement equipment. Gages that are stored on top of a warm milling machine cabinet, or in a cold locker next to a shipping well door, will give a biased reading. The extent of the bias due to temperature should be considered.
• Time required to measure the part. Excessively long CMM Programs, for example, may be affected by the
part continuing to cool during a 4-hour measurement cycle.
The temperature of parts must be allowed to stabilize before they are measured, and stabilization time built into
the production process.
A 40-inch diameter part with a diametral tolerance of 0.020 inch is made of a nickel alloy, and a lightweight OD gage was made with a long shaft of PVC tube. If measured at 75 °F, both materials are 7 °F from the standard temperature of 68 °F. A quick table look-up for the nickel alloy gives a coefficient of thermal expansion of 7.2 x 10⁻⁶ in/in/°F, and for this brand of PVC the coefficient is approximately 30 x 10⁻⁶ in/in/°F.
Growth of DIA, nickel = (40 inches) x (7 °F) x (7.2 x 10⁻⁶ in/in/°F) = approximately 0.002 inch
Growth of DIA, PVC = (40 inches) x (7 °F) x (30 x 10⁻⁶ in/in/°F) = approximately 0.0084 inch
The difference in the length of the gage and the diameter of the part is approximately 0.0064 inch on the
diameter. To put this into perspective, divide by the tolerance of 0.020 inch.
0.0064 ÷ 0.020 = 32% of Tolerance. This is far above the 5% rule of thumb, so the Process Engineer should
consider this a significant source of error.
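The worked example above follows directly from the thermal growth formula; a quick sketch:

```python
def thermal_growth(length_in, delta_t_f, cte_per_f):
    """Thermal growth = length x temperature offset x coefficient of expansion."""
    return length_in * delta_t_f * cte_per_f

part = thermal_growth(40, 7, 7.2e-6)   # nickel alloy part: ~0.0020 in
gage = thermal_growth(40, 7, 30e-6)    # PVC gage shaft:   ~0.0084 in

# Difference in growth as a percentage of the 0.020 in tolerance: ~32%
pct_of_tol = abs(gage - part) / 0.020 * 100
```

Because the percentage scales linearly with both the length and the CTE mismatch, even modest temperature offsets become significant on large parts with dissimilar gage materials.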
To combat such errors, different methods may be used, with varying levels of precision and cost:
• Select more compatible materials for gage and fixture design, with coefficients of thermal expansion that
are much closer to each other.
• Install air conditioning to keep temperatures at 68 °F ± 1 °F or soak the gage and part at a fixed temperature
before measurement.
• Carefully measure temperatures of parts and mathematically compensate for direct readings. Simple
devices may be attached to the gage to help estimate thermal corrections.
• Create a standard table for use with the gage and a room thermometer, which can be programmed into a
digital readout.
• Use an artifact which is calibrated to a known size at a known temperature. Use this calibrated size at the working temperature, so the measurement becomes a comparison between the calibrated artifact and the part being manufactured.
Where temperature is an issue, ensure it is included in any MSA study as shown below:
A 600 mm diameter part with a diametral tolerance of ±0.1 mm is made of Inconel 625, and a comparator OD gage was made from polycarbonate. If measured at 26 °C, both materials are 6 °C from the standard temperature of 20 °C. A quick table look-up for Inconel 625 gives a coefficient of thermal expansion of 12.6 x 10⁻⁶ (°C)⁻¹, and the polycarbonate coefficient is approximately 70.2 x 10⁻⁶ (°C)⁻¹.
Therefore, the growth due to the ambient room temperature of 26 °C is calculated by:
Growth of the part = (600 mm) x (6 °C) x (12.6 x 10⁻⁶ (°C)⁻¹) = approximately 0.045 mm
Growth of the gage = (600 mm) x (6 °C) x (70.2 x 10⁻⁶ (°C)⁻¹) = approximately 0.253 mm
The difference between these two expansions (the length of the gage versus the diameter of the part) is approximately 0.208 mm. To put this into perspective, if thermal compensation were not used, this difference in expansion would consume roughly 104% of the 0.2 mm total tolerance.
This is far above the 5% rule of thumb, so the Process Engineer should consider this a significant source of
error.
The environment in which a measurement system is used will impact measurement capability. Examples of
measurement variation due to environmental factors include:
• Thermal expansion of the component and measurement system (see section above for detailed
discussion).
• Expansion of the part due to humidity - especially true for natural materials and potentially true for fibrous
material that could wick or store fluids.
• Seismic and sonic vibrations from nearby operations on large scale measurement.
• Assessment of color or surface finish in varying light conditions (fluorescent lighting, tungsten lighting, daylight, etc.). This may be especially true for attribute evaluation of visual characteristics.
• Dust and dirt or oil contamination of measurement equipment or component surface. Obviously, a dirty part
or probe will add error by the thickness of the debris on the part. Preventive maintenance and daily cleaning
are very important to all measurement equipment.
• Optical devices may be sensitive to particulates in the air (especially dust), and the optics themselves may
be subject to errors if debris or filmy materials are deposited on lenses. This is specifically noticeable with
Laser measurement systems.
• Optical devices may also be sensitive to movement of air itself. Keep in mind that air refracts light and air
at different temperatures may have different densities.
• Vibration may affect sensitive equipment. Mounting highly sensitive equipment on air tables or shock
isolating mounts may help. Vibration may not be detectable by a human standing near the equipment. Trains
behind the factory, elevators near an upstairs laboratory, and forklifts delivering materials have all been
known to adversely influence measurements.
Unless otherwise specified, dimensions shown on a part drawing are intended to be measured in the free state; where part flexibility makes free-state measurement impractical, the part must be fixtured. Constraint of the part is generally not allowed, apart from securing it against movement and supporting it against datum features.
Where fixtures are allowable to secure flexible components for measurement, the fixtures should be part of the
measurement capability study. If there are multiple fixtures, variation in the fixtures may cause measurement
variation and it is expected that this will be evaluated as part of the study.
Where fixtures affect the form or size of the component for measurement, it is expected that the fixtures will be
under calibration control. This will ensure that any wear, damage, or movement of the fixtures is maintained
within acceptable limits.
If the component features are flexible, the measurement result will have variation due to both the measurement process and part movement. It is important to recognize both sources of measurement variation and ensure the component is correctly sentenced. For the purpose of measurement with fixtures, trials can be limited to studying repeatability and then establishing measurement system bias. As an example, measure the component three (3) times with minimum part movement to establish a baseline measurement repeatability figure (the range between the lowest and highest measured results).
Remove the component from the fixture or measurement system and reset the component to mimic a new
component set-up. Re-measure the component and repeat the process a further three times.
• The first group of runs, where the component was not removed from the fixture will estimate the
measurement system repeatability.
• The second group of runs where the part was reset in the fixture will estimate the repeatability through part
movement but will also include the measurement repeatability.
• Studying all the runs together will estimate the reproducibility of the full measurement process.
NOTE: This is not a statistical solution and is only used to indicate the source of measurement variation. Gage
Repeatability and Reproducibility and bias studies are recommended to prove measurement capability
with flexible components and fixtures.
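The comparison of run groups described above can be sketched with hypothetical readings:

```python
# Hypothetical readings (mm) for one feature of a flexible component
static_runs = [12.004, 12.006, 12.005]   # part left in the fixture between runs
reset_runs = [12.004, 12.011, 11.998]    # part removed and re-set each time

def value_range(readings):
    """Range = highest reading minus lowest reading."""
    return max(readings) - min(readings)

gage_repeatability = value_range(static_runs)   # ~0.002: gage alone
with_part_movement = value_range(reset_runs)    # ~0.013: gage plus re-fixturing
```

A much larger range in the re-set runs points to part movement in the fixture, not the gage, as the dominant source of variation; as the NOTE above states, this only indicates the source and does not replace a full Gage R&R and bias study.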
Where fixtures are used, the fixture may induce a measurement bias. Experiments should be conducted to
ensure the component characteristics are not unduly affected. This may include the measurement of the
component using several different measuring systems, both on and off the fixture to establish the range of
measurement obtained.
In all cases, component constraint should follow the customer's requirements or be documented and agreed through the measurement limitations process.
MSA studies are usually conducted on a specific feature, component, and measurement system. In certain
circumstances it may be appropriate to use existing MSA study data and read the results across to another
component in place of conducting a new study. Read-across results should only be used when the measurement
study characteristics are within AS13100 acceptance limits and the measurement system characteristics are
judged to be a suitable equivalent.
The decision to read across capability should be confirmed by assessment of study characteristics from the
donor study to the system under test, considering feature criticality.
Acceptance of read across should be documented and approved by the customer authorities through the
Measurement Capability Acceptance. Examples of study characteristics and acceptance criteria are indicated
in the table below:
5.18 Resolution
The resolution of a gage is an inherent property of that instrument and is usually fixed by its design. It is the
smallest readable unit or usable output from that instrument. Care should be taken not to assume that the
smallest increment on the gage indicates its resolution, as this is often not the case. The gage manufacturer
should specify the least-significant digit (LSD) and it is this that should be used to test the gage’s suitability for
its application.
Resolution may also be affected by ‘noise’ such as electromagnetic interference, vibration, friction, etc., as well
as the physiological limitations of people reading the gage, e.g., eyesight.
The drawing requirement for a compressor case diameter is 10.312 inches ± 0.002 inches. This is a critical
characteristic. What is the required gage resolution for the measurement equipment?
The measuring equipment should be able to discriminate to at least one tenth of the total tolerance being
measured, per AS13100 MSA Acceptance Limits.
In this case, for the total part tolerance range of 0.004 inches, to meet the 10:1 resolution requirement: 0.004/10 =
0.0004. The gage should be able to measure with a resolution of 0.0004 inches or finer.
5.19 Accuracy Ratio
The accuracy ratio is calculated by dividing the total part feature tolerance (from the part drawing or specification)
by the total calibration tolerance spread of results when the equipment was calibrated (or by the calibration
tolerance if easier).
Case Study 1:
If the tolerance for the diameter of a shaft is quoted on its drawing as 15.6 mm ± 0.05 mm then the total tolerance
is 0.1 mm. This is measured by a micrometer which has a calibration tolerance of 0.003 mm. This gives an
accuracy ratio of 33:1 which is acceptable.
Case Study 2:
The drawing requirement for a combustion case diameter is 16.714 inches ± 0.002 inches. This is a critical
characteristic. The required accuracy ratio may be obtained by taking the total tolerance spread of the
characteristic to be measured and dividing by 10, per AS13100 MSA Acceptance Limits. This will provide the
largest acceptable total gage accuracy permitted for that dimension.
In this case 0.004/10 = 0.0004. To maintain the 10:1 accuracy ratio, the total calibration tolerance spread of the
Measurement Equipment should be 0.0004 inches or less.
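The resolution and accuracy-ratio arithmetic from the examples above can be reproduced directly; the helper function names below are illustrative, not from the manual:

```python
def required_resolution(total_tolerance, ratio=10):
    """Smallest usable gage increment: 1/10 of the total tolerance (10:1 rule)."""
    return total_tolerance / ratio

def accuracy_ratio(total_tolerance, calibration_tolerance):
    """Ratio of feature tolerance to the gage calibration tolerance."""
    return total_tolerance / calibration_tolerance

# Resolution example: compressor case diameter 10.312 +/- 0.002 inches
print(required_resolution(0.004))         # 0.0004 inch resolution required

# Case Study 1: shaft 15.6 +/- 0.05 mm, micrometer calibration tolerance 0.003 mm
print(round(accuracy_ratio(0.1, 0.003)))  # roughly 33:1 - acceptable

# Case Study 2: combustion case 16.714 +/- 0.002 inches
print(required_resolution(0.004))         # max total calibration tolerance, inches
```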
5.20 Repeatability
A repeatability test, also known as a Type 1 Gage Study, will identify the variation observed when one operator
performs repeated measurements on one part with the same instrument. The best way to analyze the results
of this type of study is to plot the values on a graph. In the example below a feature has been measured 50
times and the results plotted on a run chart.
The accepted reference value of 42.0 mm has been added as a reference line ('Ref' on chart) to check for bias.
Lines have also been added to the chart to indicate 10% of the tolerance ('Ref ±0.10*Tol' on chart). Ideally all
results would be within the 10% limits, but in this example readings 4, 36, and 41 are not, so the source of this
variation should be established and eliminated before the measurement system can be accepted.
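A check of Type 1 study readings against the ±10% of tolerance limits can be sketched as below; the reference value follows the example above, while the tolerance and readings are hypothetical:

```python
# Type 1 (repeatability) study check: flag readings outside Ref +/- 10% of tolerance.
# Reference value from the example; tolerance and readings are hypothetical.
ref = 42.0        # accepted reference value, mm
tolerance = 0.5   # total feature tolerance, mm (assumed for illustration)
limit = 0.10 * tolerance

readings = [42.01, 42.03, 41.98, 42.07, 42.00, 41.96, 42.02]  # hypothetical

# 1-based indices of readings that fall outside Ref +/- 10% of tolerance
outside = [i + 1 for i, x in enumerate(readings) if abs(x - ref) > limit]
if outside:
    print(f"Readings outside Ref +/- 10% of tolerance: {outside}")
else:
    print("All readings within 10% of tolerance")
```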
As part of the validation for the measurement of control blades, a mini repeatability study was conducted to
validate two of the dimensional features:
A coordinate measuring machine (CMM) equipped with a five-axis scanning head (continuous probe movement
over the part surface to gather many data points) was used. The CMM was programmed in line with the part
definition and according to the customer methodology that defines the method of programming and the datum
points to use. The component is loaded in a fixture to locate the part for inspection. The datum system is made
directly on the blade surface, so several iterative calculations are required to generate a repeatable and accurate
CMM datum system (CMM measured points need to be taken on the datum points specified on the drawing, so
the datum is measured and recalculated several times to ensure the correct surface location is measured). The
fixture is therefore not a factor in the measurement system.
As this study is a periodic confirmation of capability, it used a sample of three parts that represent the process
variation; each part was measured 10 times by the same operator on the same fixture. Note: Changes in
measurement setup are limited to ensure variation from other factors is not introduced between runs. The range
method was used for calculation, based on the variation of the runs for each part and analyzing the worst case
of the three parts to provide a quick approximation of measurement variability.
In this example, only two geometrical dimensions are studied but the same process would be used for the entire
geometrical definition:
% Repeatability = 100 * (Max Range / IT)

where Max Range is the largest range of repeated measurement results observed for a part and IT is the total
tolerance interval of the dimension.
There are several calculation methods for % repeatability, but in this case study the calculation above was used.
This calculation was conducted on all the geometrical dimensions for all the runs and can easily be completed
in a spreadsheet software package.
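The range-method calculation can be reproduced in a few lines; the run data below are hypothetical, and IT (the total tolerance interval) is assumed to be 0.1 mm:

```python
# Range method: % Repeatability = 100 * (max per-part range) / IT,
# where IT is the total tolerance interval for the dimension.
# Run data are hypothetical (measured values in mm); IT assumed 0.1 mm.
IT = 0.1
runs_by_part = {
    "part 1": [0.042, 0.045, 0.044, 0.043],
    "part 2": [0.051, 0.055, 0.052, 0.054],
    "part 3": [0.047, 0.046, 0.048, 0.047],
}

# Range (max - min) of the repeat runs for each part
ranges = {p: max(v) - min(v) for p, v in runs_by_part.items()}
# Worst case across the three parts is used for the acceptance decision
worst_part, worst_range = max(ranges.items(), key=lambda kv: kv[1])
pct_repeatability = 100 * worst_range / IT

print(f"Worst case: {worst_part}, range {worst_range:.3f} mm")
print(f"% Repeatability = {pct_repeatability:.0f}% of tolerance")
```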
Results:
Analysis of Results:
The maximum repeatability observed across all parts is used to assess if the repeatability is acceptable by
comparison with the minimum acceptance criteria set out in AS13100 MSA Acceptance Limits. The result for
this case study can be presented using the table below:
It is observed that the repeatability for dimension N°1 part 2 gives the maximum value of 16% of the total
tolerance. As this is below the minimum guidance established for “Minor” feature category (Repeatability <30%
from AS13100 MSA acceptance Limits), the repeatability of measurement for this dimension is deemed
compliant.
It is observed that the repeatability for dimension N°2 part 3 gives the maximum value of 10% of the total
tolerance. As this is below the minimum guidance established for “Major” feature category (Repeatability <20%
from AS13100 MSA acceptance Limits), the repeatability of measurement for this dimension is deemed
compliant.
The measurement system and measurement program have acceptable levels of repeatability so the study
records should be maintained internally and archived properly.
If several operators or fixtures for positioning are introduced, a further gage repeatability and reproducibility
study will be required to establish if they affect the measurement system. This simple test has not tested the
measurement system accuracy so further tests may be advised.
The Gage Repeatability and Reproducibility (Gage R&R) study is the most common MSA study used and
determines how the variation in the measurement system is split between repeatability and reproducibility. In a
good measurement system, the largest variation obtained is due to part-to-part differences, not variation due to
the measurement system.
Once the study pre-requisites and planning are completed then the study can be run and the data collected and
analyzed, ideally using a suitable statistical analysis software package such as Minitab (contact the customer
for details of approved software packages).
There are two approved methods for analyzing the data: an ANOVA (analysis of variance) method and an Xbar
and R method. The calculations used in Xbar and R method are simpler, but the ANOVA method is preferred.
The objective of an ANOVA Gage R&R study is to split the variation into individual components to represent the
variation in the study due to repeatability, reproducibility, parts, and operators. This helps to identify any specific
sources of variation. The results are easier to interpret if they are expressed graphically as shown below.
The sources of variation are split and expressed as percentages: total Gage R&R (left-hand sub-group), gage
variation split into repeatability and reproducibility (center two), and part-to-part variation, which is the variation
present in the parts being measured.
Each sub-group is expressed in three different ways (represented by the different shaded columns in the
example above): as % contribution, as the individual component divided by the total study variation, and finally
as a percentage of the tolerance defined for the component characteristic being measured.
AS13100 MSA Acceptance Limits in section 7.1.5 has acceptance limits for these percentages.
Another useful way of expressing the data is to use an Xbar and Range chart, which plots the mean average
values of each part measurement together with the range of values for that subgroup. If operator A measures
part 1 twice, the average of the two values is plotted along with the difference between the two measurements,
as shown below.
As the parts chosen should represent the entire range of possible parts the graph should ideally show a lack of
control, i.e., points outside the horizontal dashed control limit lines (UCL and LCL). In the Xbar chart the lack of
control shows that there is more part variation than measurement variation.
This chart also allows a comparison of the different operators and can show how good the discrimination of
the gage is. In the example graph above, operator John is producing different results from the other two and
has roughly three times the measurement variation in his repeat measurements (up to 0.015 mm compared to
0.005 mm).
It can also be seen in the same example that John’s 9th and 10th measurement points are out of the control
limits in the R Chart by Operator. This gives cause for concern and the study may be rejected due to the lack
of control. Without stability in the range of measured values the measurement error is not predictable and could
lead to a false study result.
Instances where a lack of control is detected can be recorded as part of the measurement system study and
reported to the customer. Before doing this, review the process that John is using and his training to ensure
control of the process and repeat the trial.
Several other graphs can be produced to illustrate the data in useful ways. If reproducibility is significant then
the data from each of the operators can be plotted to show the average measurements for each part. Any
patterns in the data can then be identified to help determine the cause of the variation.
The discrimination of the gage can also be seen to be 0.005 mm by observing the steps in the data plots which
are all increments of 0.005 (0.000, 0.005, 0.010, etc.).
In this case study, operator John measures seven out of the 10 parts smaller than the other two inspectors
who were part of the trial.
John also seems to have a problem measuring larger dimensions: his maximum reading was just above
394.10 mm, where the other inspectors measured larger values.
For measurement system accept/reject purposes, the % Tolerance value should be compared against the
AS13100 MSA Acceptance Limits. This is the total percentage of tolerance consumed by measurement variation.
The case study example is from Minitab software, but other software will give similar output. From this data the
following can be seen:
1. If the tolerance (Upper spec - Lower spec) is given to Minitab, the % Tolerance column is calculated by
dividing the “Study Variation” for each “source” by the tolerance. This column may not appear if the
characteristic tolerance is not given to the software.
2. The % Study Variation is the percentage of total study variation for each source. This is calculated by summing
all the "Study Variations" to calculate the total study variation, then working out the contribution each
"Source" adds to this total as a percentage.
3. The “Study Variation” is the “Standard Deviation” calculated in the study multiplied by six which is the
number of standard deviations needed to estimate 99.73% of the measurement population. This is a
requirement of AS13100.
An aero engine organization is manufacturing machined structures. An inspection device is used to determine
a critical feature on one of the parts. To evaluate the measurement system and determine if it is fit for its intended
purpose an MSA is conducted.
The critical feature is an outer diameter with specification limits 838.60 - 838.80 mm (total tolerance = 0.2 mm).
The inspection device is a dial gage comparator together with a master gage.
The manufacturing engineer started planning the MSA study by considering potential sources of variation in the
measurement system and developing the test procedures, making sure the test followed the defined measuring
procedure and evaluating if any environmental factors affect the measurement system.
The gage resolution and the accuracy ratio are determined to meet the acceptance criteria. Since the
measurement system includes several operators, the engineer decides to perform a Gage R&R study. Gage
R&R can be generated using several statistical software packages but can also be calculated manually.
Ten parts were selected that represent the expected range of the process variation. Three operators measured
the 10 parts, three times per part, in a random order without seeing each other’s readings.
Data:
The mean average diameter for each part/operator combination was then calculated together with the range
(maximum - minimum).
Calculating all the variance components from the above data gave the following results, using the simpler
Xbar R method:
Source             Variance Component   % Contribution (of Variance Component)
Total Gage R&R     0.0002274            7.20
  Repeatability    0.0002239            7.08
  Reproducibility  0.0000035            0.11
Part-to-part       0.0029331            92.80
Total variation    0.0031605            100.00
This showed that the total Gage R&R value, when calculated as a percentage of the tolerance, was 45.24%,
more than the permitted maximum of 10% allowed for critical features. 44.89% came from repeatability and
only 5.64% from reproducibility. This would tend to indicate that the equipment, rather than operator skill, is
the cause. The measurement system was rejected as not suitable for determining the diameter of the structure.
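The percentages quoted above follow directly from the variance components in the table; a short check, using the 0.2 mm tolerance from the case study (results agree with the quoted figures to rounding):

```python
import math

tolerance = 0.2          # mm, total tolerance from the case study (838.60 - 838.80)
var_components = {       # variance components from the Xbar-R table above
    "Total Gage R&R": 0.0002274,
    "Repeatability": 0.0002239,
    "Reproducibility": 0.0000035,
    "Part-to-part": 0.0029331,
}
total_var = 0.0031605    # total variation from the table

for source, vc in var_components.items():
    pct_contribution = 100 * vc / total_var       # % of total variance
    study_var = 6 * math.sqrt(vc)                 # 6 standard deviations (99.73%)
    pct_tolerance = 100 * study_var / tolerance   # % of tolerance consumed
    print(f"{source:16s} %Contribution={pct_contribution:6.2f}  %Tolerance={pct_tolerance:6.2f}")
```

The Total Gage R&R row gives about 45.2% of tolerance, far above the 10% limit for critical features, which is why the system was rejected.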
Recalculating the results using the recommended ANOVA method gave the following results:
Source          Degrees of Freedom   Sum of Squares   Mean Squares   F statistic   p-value
Part            9                    0.260893         0.0289881      164.562       0.000
Operator        2                    0.000727         0.0003633      2.063         0.134
Repeatability   78                   0.013740         0.0001762
Total           89                   0.275360
Source             Variance Component   % Contribution (of Variance Component)
Total Gage R&R     0.0001824            5.39
  Repeatability    0.0001762            5.21
  Reproducibility  0.0000062            0.18
    Operator       0.0000062            0.18
Part-to-part       0.0032013            94.61
Total variation    0.0033837            100.00
In this example the Gage R&R % Tolerance figure was 40.52%, slightly lower than the Xbar R method but still
greater than the permitted maximum of 10%.
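The ANOVA variance components above follow from the mean squares in the ANOVA table (10 parts, 3 operators, 3 repeats; a no-interaction model, with operator-by-part interaction pooled into repeatability):

```python
import math

parts, operators, repeats = 10, 3, 3
tolerance = 0.2                      # mm, total tolerance

# Mean squares from the ANOVA table above
ms_part, ms_operator, ms_error = 0.0289881, 0.0003633, 0.0001762

# Variance components (no-interaction ANOVA model; negative estimates set to 0)
var_repeatability = ms_error
var_operator = max((ms_operator - ms_error) / (parts * repeats), 0.0)
var_part = max((ms_part - ms_error) / (operators * repeats), 0.0)
var_grr = var_repeatability + var_operator

# Study variation = 6 standard deviations, expressed against the tolerance
pct_tolerance = 100 * 6 * math.sqrt(var_grr) / tolerance
print(f"Gage R&R variance component: {var_grr:.7f}")
print(f"Gage R&R %Tolerance: {pct_tolerance:.2f}%")
```

This reproduces the quoted 40.52% of tolerance, still well above the 10% limit for critical features.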
NOTE: The calculations used to determine these results are available in the AIAG guide ‘Measurement System
Analysis’ referenced in Section 2.
Co-ordinate measuring systems can come in several configurations and should be considered as a system of
measuring co-ordinate data from a datum point. This might include tactile systems such as coordinate
measuring machine touch trigger probing, or structured light measurement vision systems that gather images
of the component and then establish the coordinate system within software. In all cases it is important to
understand the measurement process used and establish the tests required to ensure the system results are
capable for the measurements being conducted. Typically coordinate measuring systems will use the following
steps to establish a measurement:
1. System qualification - Establish a measurement reference by qualification of styli tips, axes, or vision areas.
2. Establish the part coordinate system by measuring the datum features specified on the drawing.
3. Gather coordinate data points on the features to be measured.
4. Determine a measurement value through the calculation of features based on the coordinate data. This
may be based on the fitting of mathematically perfect features to the measured data, or by comparison of
the data against a known reference such as a solid model.
While coordinate measuring systems are generally thought of as accurate and repeatable, the system,
environment, calibration, programming method, drawing interpretation, system qualification, and fixturing can
all influence measurement capability. In all instances it is important to establish the effect of each of these
factors on the measurements being conducted.
With co-ordinate measurement systems, measurement repeatability is often an optional test, run only once a
reproducibility test has failed.
Design of Experiments (DoE) is a detailed subject, but the basic idea is simple and lies at the heart of the Gage
R&R method. The principle is that the measurement system will be affected by several process inputs to
produce a single measured result as a system output. The purpose of DoE is to understand how the output
varies with different combinations of the inputs.
A Gage R&R study is an experiment designed to understand the variation in the output of the measurement
system when the measurement system is subject to external sources of variation. The key to a good Gage R&R
study on a CMM is sound experimental design.
If a measurement system is used to repeat the same measurement several times whilst nothing is altered, any
variation in the measured data is attributable to variation inherent in the measuring equipment itself. Repeating
the measurement whilst keeping everything constant should, in theory, give an answer which is equal to the
CMM accuracy and pure CMM repeatability. Any observed variation can only be due to the measurement
equipment itself because all other possible influencing factors have been held constant.
Likewise, reproducibility tests should be designed to include variations that will be seen in the measurement
process. For this reason, a reproducibility test should include the component being loaded and unloaded
between each MSA run.
Operator influence tends to dominate handheld gauging and so variation between operators is usually the only
reproducibility factor tested in most conventional gage studies. Conversely, an automated co-ordinate
measuring system executing a part program will not necessarily be influenced by the person who presses the
buttons but may be influenced by other factors.
As a co-ordinate measuring system operates at a much higher level of precision than handheld gages, there
will be many other sources of variation which will have an effect and will therefore have to be accounted for in
the design of the Gage R&R study.
Gage R&R studies should consider every combination of influencing factors that can be anticipated. The start
point is to obtain a set of components that represent around 80% or more of the range of output from the
manufacturing process. The second step is to decide what reproducibility factors need to be tested for.
An Aerospace organization is making stud shafts with a tight diameter limit for bearing tracks. The shafts are
made in a manufacturing cell with multiple coordinate measuring machines. The Manufacturing Engineer has
chosen variation in ambient temperature and machine to machine CMM variation as key elements to study. The
ambient temperature is thought to cause variation, as there appears to be a drift between 'morning' and
'afternoon' shifts, shown as two levels in the in-cycle gauging results.
The machine to machine influence is characterized as ‘machine A’ and ‘machine B.’ The experimental design
is shown below.
Part 1   M/c A, am, runs 1-2   M/c A, pm, runs 1-2   M/c B, am, runs 1-2   M/c B, pm, runs 1-2
Part 2   M/c A, am, runs 1-2   M/c A, pm, runs 1-2   M/c B, am, runs 1-2   M/c B, pm, runs 1-2
Part 3   M/c A, am, runs 1-2   M/c A, pm, runs 1-2   M/c B, am, runs 1-2   M/c B, pm, runs 1-2
The above table shows a full factorial experimental design in which the variables are:
1. Variation between parts (part 1 to part 3)
2. Variation between machines (machine A and machine B)
3. Variation between the time of day - taken as a surrogate for ambient temperature
The trial is run, and the data gathered and analyzed using the ANOVA method.
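The full factorial design can be enumerated programmatically; the factor levels are taken from the case study, and two runs per combination are assumed:

```python
from itertools import product

# Full factorial design for the stud-shaft CMM study:
# 3 parts x 2 machines x 2 times of day x 2 runs = 24 measurement runs.
parts = ["Part 1", "Part 2", "Part 3"]
machines = ["M/c A", "M/c B"]
times = ["am", "pm"]          # surrogate for ambient temperature
runs = ["run 1", "run 2"]

design = list(product(parts, machines, times, runs))
print(f"Total measurement runs: {len(design)}")
for combo in design[:4]:      # first few rows of the run list
    print(", ".join(combo))
```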
It should be noted that each CMM run will inspect multiple features, and therefore the repeatability,
reproducibility and Gage R&R values will be calculated for every feature.
Conclusions:
A successful Gage R&R study on a co-ordinate measurement system requires some initial insight into the likely
sources of variation. An experiment should then be designed to test for the chosen sources of variation. An
understanding of the relative contributions of the sources of variation will provide a good starting point and can
be made generic through control of the process. If a good level of control is implemented, many causes of
variation will be limited and can be excluded from future studies. Where very tight component tolerances are
being measured, more causes of variation will have an influence and should be included in the study.
Coordinate Measuring Systems (CMS) such as coordinate measuring machines, length measuring machines,
structured light systems, articulated measurement arms and laser interferometry systems, are all based on
mechanisms that can measure very accurately. However, these systems are typically controlled through a
computer program or manipulated by a human operator and these factors can degrade system repeatability
and precision. To ensure the measurement system is fit for purpose, it is important to test each program that is
used on the measurement mechanism.
CMS systems require calibration at a frequency dependent on the amount of use and environmental conditions.
The act of calibration will ensure the system accuracy is at an acceptable level through the compensation of
each axis scale together with system setup information such as axis squareness. While calibration is conducted
against calibration master gages (to provide traceability to national standards), it does not provide a complete
measure of system accuracy or repeatability.
As with all measurement systems, the system (co-ordinate axis and scales, programming control and sensor
and probe) needs to be assessed for:
1. Repeatability (does the system give a consistent measurement result if the component is left in the same
position on the measurement system).
2. Reproducibility (does the system give a consistent result when considering all sources of variation that are
present when the part is removed and replaced between program runs).
As the CMS system is inherently repeatable, the reproducibility test will ensure all sources of variation are
included in the study. For this reason, repeatability testing is optional for CMS systems and only used where a
reproducibility study has failed. In these cases, a repeatability study will indicate whether the source of variation
is due to part variation. This often happens when semi-flexible components are measured and the component
flexes between runs.
While the CMS system can be thought of as accurate, there are many potential sources of error, such as:
• Is the part flexible, does it move under its own weight or under the force of measurement?
• If a fixture holds the component during inspection, does it affect the part shape or size?
• Does the CMS take the measurement points in the correct place?
• Is the CMS in good working order throughout its range and calibrated?
• Does the stylus qualification program take enough points in enough places on the calibration sphere?
• Will the stylus shank hit the part rather than the stylus ball?
• Has the CMS program been created in line with the requirements of the drawing?
• Has the correct datum system been used to measure the features?
• Have enough points been taken on the datum features to represent the part shape accurately?
• Have all the measured results been output to the report system using the correct datums?
To ensure the inspection system is fit for purpose, all programs should be tested for reproducibility and bias.
Read across of results from one program to another should not be used as few of the errors listed above will be
tested.
Testing CMS programs is simple and where the required measurement accuracy is within the bounds of the
inspection machine, the study can consist of far fewer repeats than normally associated with measurement
systems analysis.
Stage 1: Test the measurement process reproducibility (dynamic runs), including part loading and unloading.
Consider the CMS as a measurement system, where the part is loaded, inspected, and removed.
Within the trial this needs to be replicated through repeated loading, measuring, and unloading of the
component. Five runs are used.
Review the data from the dynamic runs by finding the largest and smallest measurement result for
each measured feature. If the range of measurement is less than 10% of the feature tolerance (taken
from the table of acceptance limits in AS13100), continue. If any measurement result range is greater
than 10% of the component tolerance, analysis of the program and CMS mechanism is required to
understand the root cause of the reproducibility issue. This is typically done with a repeatability
study - see Stage 3.
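This dynamic-run review is essentially a per-feature range check against 10% of tolerance; the feature names, tolerances, and run results below are hypothetical:

```python
# Dynamic (reproducibility) run review: per-feature range versus 10% of tolerance.
# Feature tolerances and the five run results are hypothetical.
dynamic_runs = {
    # feature: (tolerance, five measured results)
    "bore diameter": (0.05, [120.011, 120.012, 120.010, 120.011, 120.013]),
    "face flatness": (0.02, [0.006, 0.009, 0.005, 0.011, 0.007]),
}

for feature, (tol, results) in dynamic_runs.items():
    rng = max(results) - min(results)          # largest minus smallest result
    pct = 100 * rng / tol                      # range as % of feature tolerance
    status = "continue" if pct < 10 else "investigate root cause (see Stage 3)"
    print(f"{feature}: range {rng:.3f} = {pct:.0f}% of tolerance -> {status}")
```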
Stage 2: Test the measurement system for bias.
To test if the figures from the CMS mechanism are accurate, an alternative method of measurement
should be used to check the measured result achieved. The equipment used to verify the measured
result should be at least as accurate as the measurement system under test. This normally means that
an alternative method of CMS inspection is required.
It is expected that the maximum difference between the measurement on one system, compared to
the measurement on the other system will be less than 10% of the feature tolerance. This comparison
needs to be completed for all the features measured. It is very good practice to use the average of
multiple measurements. This helps to eliminate any variation due to measurement noise.
The bias is calculated as the percentage of feature tolerance and is based on the difference between
the two measurement systems for each feature measured.
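This bias comparison can be sketched as below; the feature names, tolerances, and readings are hypothetical, and averages of multiple runs are used to suppress measurement noise as recommended:

```python
# Bias check: compare the average result per feature on the system under test
# against an independent verification system, as a percentage of tolerance.
# Feature names, tolerances, and readings are hypothetical.
features = {
    # feature: (tolerance, runs on system under test, runs on verification system)
    "diameter": (0.10, [25.012, 25.014, 25.013], [25.010, 25.011, 25.009]),
    "position": (0.20, [0.052, 0.055, 0.054], [0.049, 0.051, 0.050]),
}

def mean(xs):
    return sum(xs) / len(xs)

for name, (tol, test_runs, verify_runs) in features.items():
    bias = mean(test_runs) - mean(verify_runs)   # difference between the two systems
    pct_bias = 100 * abs(bias) / tol             # bias as % of feature tolerance
    verdict = "OK" if pct_bias < 10 else "INVESTIGATE"
    print(f"{name}: bias {bias:+.4f}, {pct_bias:.1f}% of tolerance -> {verdict}")
```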
The root cause with this test is commonly the programmer misunderstanding the design definition or
measuring the feature in a slightly different way (number of points, measurement height, etc.).
Therefore, it is recommended that an alternative programmer or inspector is used to create the second
inspection program. While independence does not fully ensure the production program is correctly
coded, the chances of coding two measurement systems the wrong way twice are limited.
Stage 3: Test the measurement system to determine the variation due to the measurement system (static
repeatability) independent of part movement. This is often an optional test used to understand the
reasons for Stage 1 (reproducibility test) failing.
Leaving the component on the CMS system and, as far as possible, without moving the component,
complete a further three repeat runs of the measurement program (to achieve 10 runs minimum).
Review the data from the static runs by finding the largest and smallest measurement result for each
measured feature. The range of measurement should be far less than that seen in the dynamic runs.
If any feature exceeds 10% of feature tolerance, it is likely that there is a programming, mechanism,
fixture, or datum system problem that needs to be fixed.
If the dynamic study shows poor repeatability (a large range) but the same feature on the static
repeatability run shows good repeatability (a small range), it is likely that the part is moving or
deforming between each inspection run.
Further testing of the program is required, and the customer should be consulted to determine if fixture
or free-state inspection is required.
On completion of the stage 1, 2 and 3 runs, complete the statistical analysis of all the measurement
runs to establish the true expected repeatability of the system. Assuming six standard deviations, the
repeatability seen for all features over all tests should be less than 10% of each feature tolerance.
If outside the 10% feature tolerance, consider the programming methodology, number of
measurement points per feature, and position of the measurement points.
Example measurement capability acceptance form:

Measurement Capability Sheet
Document:
Part Number:
Eng Drawing No:
Associated Docs (MSA Study, FAI Pack, Fixed Process Approval, Control Plan):

No | Sheet No | Grid Ref | Feature Description | Nominal | Max Tol | Min Tol | Critical/Major/Minor/Unclassified | Process Capability (CP, CPK, etc.) | Measurement Method | Observed Existing Capability | Action | Comments
0  | 2 | F3 | 3.1 Radius | 3.1 | 3.2 | 3 | Minor | 3.0 CPK | Shadowgraph | 19.2% | None | Passed MSA
1-10 | (blank rows for additional features)
APPENDIX A: ACKNOWLEDGEMENTS
This reference manual represents the consensus of the members of the AESQ. The Team members who
developed this guidance and whose names appear below, wish to acknowledge the many contributions made
by individuals from their respective organizations.
Organization Representative
Rolls-Royce Simon Gough-Rundle - Team Leader
MTU Anil Oenuer - Deputy Team Leader
Pratt & Whitney Todd Angus
Pratt & Whitney Jule Hegwood
Safran Benoit Gottie
Safran Frederic Vetil
GE Aviation Marnie Ham
Safran Geoffrey Carpentier
Pratt & Whitney Joseph Drescher
P&W Canada Pierre Gaudet
P&W Canada Simon Lamarre
Change History
Email: [email protected]