Statistical Procedures For Measurement Systems Verification and Validation Elsmar
The percentage of the total variation that is due to product variation is $\rho \times 100$.
$$\%_p = \rho \times 100$$
The percentage of the total variation that is due to measurement error is $(1-\rho) \times 100$.
$$\%_e = (1-\rho) \times 100$$
The Discrimination Ratio is the number of distinct categories that the observed process
variation can be divided into given the measurement error and the amount of observed
variation.
$$D_R = \sqrt{\frac{1+\rho}{1-\rho}} = \sqrt{\frac{2\sigma_T^2}{\sigma_e^2} - 1}$$
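As a minimal sketch of how these three quantities relate, the following Python snippet computes them from a measurement-error standard deviation and a total standard deviation. It assumes that ρ = 1 − σ_e²/σ_T² and that the Discrimination Ratio takes the form sqrt((1 + ρ)/(1 − ρ)); the function name and arguments are illustrative only.

```python
import math

def variation_summary(sigma_e: float, sigma_t: float) -> dict:
    """Summarize product vs. measurement-error variation.

    Assumption: rho = 1 - sigma_e^2 / sigma_t^2, where sigma_e is the
    measurement-error standard deviation and sigma_t the total one.
    """
    if not 0 < sigma_e < sigma_t:
        raise ValueError("require 0 < sigma_e < sigma_t")
    rho = 1.0 - sigma_e**2 / sigma_t**2
    return {
        "rho": rho,
        "pct_product": rho * 100.0,          # % of total variation from product
        "pct_error": (1.0 - rho) * 100.0,    # % of total variation from measurement error
        "discrimination_ratio": math.sqrt((1.0 + rho) / (1.0 - rho)),
    }

summary = variation_summary(sigma_e=1.0, sigma_t=2.0)
print(summary)
```

With sigma_e = 1 and sigma_t = 2, ρ is 0.75, so 75% of the variation is attributed to the product and 25% to measurement error.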
Centerline, $\bar{d}$:
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$
Overall Kappa Score:
$$K_{overall} = 1 - \frac{n m^2 - \sum_{i=1}^{n}\sum_{j=1}^{k} x_{ij}^2}{n m (m-1) \sum_{j=1}^{k} p_j q_j}$$
Kappa Score by Category:
$$K_{category} = 1 - \frac{\sum_{i=1}^{n} x_{ij}\,(m - x_{ij})}{n m (m-1)\, p_j q_j}$$
n = number of samples
m = the number of judges, raters or inspectors
k = number of categories
x_ij = the number of ratings that place sample i in category j (each individual rating = 1)
p_j = (# ratings within category j)/(nm)
q_j = 1 - p_j
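The definitions above can be sketched in Python as follows. This is an illustrative implementation that assumes the overall and by-category scores take the usual Fleiss form, and that the input is an n-by-k table whose entry for sample i and category j is the number of raters who chose category j (so every row sums to m). The function name is hypothetical.

```python
def kappa_scores(counts):
    """Return (overall kappa, list of per-category kappas).

    counts[i][j] = number of the m raters who put sample i in category j.
    Assumes every sample was rated by the same number of raters.
    """
    n = len(counts)
    k = len(counts[0])
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("every sample needs the same number (>= 2) of ratings")

    # p_j: share of all n*m ratings that fall in category j; q_j = 1 - p_j
    p = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    q = [1.0 - pj for pj in p]

    sum_pq = sum(pj * qj for pj, qj in zip(p, q))
    if sum_pq == 0:
        raise ValueError("all ratings fall in one category; kappa is undefined")

    sum_sq = sum(x * x for row in counts for x in row)
    overall = 1.0 - (n * m * m - sum_sq) / (n * m * (m - 1) * sum_pq)

    by_category = []
    for j in range(k):
        if p[j] * q[j] == 0:
            by_category.append(float("nan"))  # category never (or always) used
            continue
        disagreement = sum(row[j] * (m - row[j]) for row in counts)
        by_category.append(1.0 - disagreement / (n * m * (m - 1) * p[j] * q[j]))
    return overall, by_category

# Two samples, three raters, two categories
overall, by_category = kappa_scores([[2, 1], [1, 2]])
print(overall, by_category)
```

With only two categories the per-category scores coincide with the overall score, which gives a quick sanity check on the arithmetic.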
McNemar's Test for Differences Between Two Correlated (Dependent) Results
Calculating p values for the statistical significance of the difference (the difference is
between cells b and c)
One Tailed Exact Binomial (in Excel):
p_Binomial = 1-BINOMDIST(max(b,c), n_(b+c), 0.5, TRUE)
max(b,c) is the larger of the two cell counts in b and c.
n_(b+c) is the sum of the cell counts in b and c.
0.5 is the Binomial probability, given that if there were no real difference the count of b
would equal the count of c.
TRUE = cumulative probability
One Tailed Chi-Square Test (in Excel):
$p_{\chi^2}$ = CHIDIST($\chi^2$, 1)
$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$
degrees of freedom (df) = 1
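For readers working outside Excel, the two p-value calculations can be sketched in Python with the standard library only. The function names are hypothetical. Two assumptions are worth flagging: the chi-square statistic is taken to be the continuity-corrected form (|b − c| − 1)²/(b + c), and the first binomial function mirrors 1-BINOMDIST(max(b,c), ...) literally, which gives the probability of a count strictly greater than max(b,c). A second variant that also includes the observed count is shown for comparison.

```python
import math

def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p); plays the role of BINOMDIST(k, n, p, TRUE)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def mcnemar_binomial_as_written(b: int, c: int) -> float:
    """1 - BINOMDIST(max(b,c), b+c, 0.5, TRUE): P(X > max(b, c))."""
    return 1.0 - binom_cdf(max(b, c), b + c)

def mcnemar_binomial_inclusive(b: int, c: int) -> float:
    """Variant that counts the observed value too: P(X >= max(b, c))."""
    return 1.0 - binom_cdf(max(b, c) - 1, b + c)

def mcnemar_chi_square(b: int, c: int):
    """Return (chi-square statistic, right-tail p-value with df = 1)."""
    if b + c == 0:
        raise ValueError("b + c must be positive")
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)); this matches CHIDIST(x, 1).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

print(mcnemar_binomial_as_written(10, 2))
print(mcnemar_binomial_inclusive(10, 2))
print(mcnemar_chi_square(10, 2))
```

The two binomial variants can differ noticeably when b + c is small, so it is worth deciding up front which tail definition a given study requires.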
Confidence Intervals for the Size of the Difference of Two Correlated Results
n = a + b + c + d
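$$\delta = |P_{A1} - P_{A2}|$$
$$\sigma_e = \sqrt{\frac{P_{A1}(1 - P_{A1}) + P_{A2}(1 - P_{A2}) - 2(P_a P_d - P_b P_c)}{n}}$$
Confidence Limits = $\delta \pm z_{\alpha/2}\,\sigma_e$
where
$$P_{A1} = \frac{a + c}{n}, \quad P_{A2} = \frac{a + b}{n}$$
$$P_a = \frac{a}{n}, \quad P_b = \frac{b}{n}, \quad P_c = \frac{c}{n}, \quad P_d = \frac{d}{n}$$

                            1st Measurement
                            Pass    Fail
2nd Measurement     Pass    a       b
                    Fail    c       d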
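A short Python sketch of this confidence interval follows. It assumes that the pass proportion for the first measurement is (a + c)/n and for the second is (a + b)/n, that the standard error is sqrt([P_A1(1 − P_A1) + P_A2(1 − P_A2) − 2(P_a P_d − P_b P_c)]/n), and that the limits use a two-sided normal quantile. The function name and the `confidence` parameter are illustrative.

```python
import math
from statistics import NormalDist

def paired_difference_ci(a: int, b: int, c: int, d: int, confidence: float = 0.95):
    """Return (delta, standard error, (lower, upper)) for two correlated pass rates."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    p_cell_a, p_cell_b, p_cell_c, p_cell_d = a / n, b / n, c / n, d / n
    p_a1 = (a + c) / n   # assumed: pass proportion, 1st measurement
    p_a2 = (a + b) / n   # assumed: pass proportion, 2nd measurement
    delta = abs(p_a1 - p_a2)

    variance = (
        p_a1 * (1 - p_a1)
        + p_a2 * (1 - p_a2)
        - 2 * (p_cell_a * p_cell_d - p_cell_b * p_cell_c)
    ) / n
    std_err = math.sqrt(variance)

    # Two-sided quantile, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return delta, std_err, (delta - z * std_err, delta + z * std_err)

delta, std_err, limits = paired_difference_ci(a=40, b=10, c=2, d=48)
print(delta, std_err, limits)
```

If the lower limit falls below zero, the data are consistent with no difference between the two sets of results at the chosen confidence level.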
SAMPLE SIZE FORMULAS
Continuous Data
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{\delta}\right)^2 \quad \text{or} \quad n = \left[\frac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\delta}\right]^2$$
$\sigma$: the standard deviation of the population you are trying to estimate
$\delta$: the amount of accuracy you want; this is the minimum amount of detectable
difference from the true mean.
$\delta$ is always expressed as a number, not a percent.
Note that $\pm\delta$ = the Confidence Interval.
$\alpha$: alpha is the probability of detecting a difference that doesn't exist, typically 0.05 or 0.01
$\beta$: beta is the probability of not detecting a difference that does exist, typically 0.01
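The continuous-data calculation can be sketched in Python as below. The sketch assumes the two formulas are n = (z_{α/2} σ/δ)² and n = [(z_{α/2} + z_β) σ/δ]², with z_{α/2} a two-sided and z_β a one-sided standard normal quantile. Rounding up to the next whole sample is a choice made here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_confidence_interval(sigma: float, delta: float, alpha: float = 0.05) -> int:
    """n = (z_{alpha/2} * sigma / delta)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_alpha * sigma / delta) ** 2)

def n_for_detection(sigma: float, delta: float, alpha: float = 0.05, beta: float = 0.10) -> int:
    """n = ((z_{alpha/2} + z_beta) * sigma / delta)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# sigma and delta must be in the same units (numbers, not percents)
print(n_for_confidence_interval(sigma=2.0, delta=0.5))
print(n_for_detection(sigma=2.0, delta=0.5, alpha=0.05, beta=0.10))
```

Because n grows with (σ/δ)², halving the detectable difference roughly quadruples the required sample size.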
Categorical Data
Categorical data for the proportion of defective events generally follows a Binomial
Distribution.
$$n = \frac{z_{\alpha/2}^2\, p(1 - p)}{\delta^2}$$
or
$$n = \left[\frac{z_{\alpha/2}\sqrt{p_1(1 - p_1)} + z_{\beta}\sqrt{p_2(1 - p_2)}}{\delta}\right]^2$$
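A comparable Python sketch for proportions is given below. It assumes the formulas n = z_{α/2}² p(1 − p)/δ² and n = [(z_{α/2} sqrt(p1(1 − p1)) + z_β sqrt(p2(1 − p2)))/δ]², where δ is passed in explicitly as the difference to be detected. The function names and the round-up step are illustrative choices.

```python
import math
from statistics import NormalDist

def n_for_proportion_interval(p: float, delta: float, alpha: float = 0.05) -> int:
    """n = z_{alpha/2}^2 * p * (1 - p) / delta^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z_alpha**2 * p * (1 - p) / delta**2)

def n_for_proportion_detection(p1: float, p2: float, delta: float,
                               alpha: float = 0.05, beta: float = 0.10) -> int:
    """n = ((z_{alpha/2}*sqrt(p1*q1) + z_beta*sqrt(p2*q2)) / delta)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    numerator = z_alpha * math.sqrt(p1 * (1 - p1)) + z_beta * math.sqrt(p2 * (1 - p2))
    return math.ceil((numerator / delta) ** 2)

print(n_for_proportion_interval(p=0.10, delta=0.03))
print(n_for_proportion_detection(p1=0.10, p2=0.15, delta=0.05))
```

These normal-approximation results are only a starting point when the defect rate is small.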
Rare events (<5%) will require iteration using confidence intervals for the Exact Binomial.
A starting point for the iteration is to use the Exact Binomial formula:
$$n = \frac{\ln\bigl(1 - P(\text{Detection})\bigr)}{\ln(1 - p)}$$
ln: natural log
P(Detection): the probability of detecting a defect rate that is equal to p, typically 0.95 or 0.99
p: the expected defect rate or occurrence rate of the process; may be the minimum rate
which we want the system to reliably detect.

Confidence / Power    80%      90%      95%      99.0%    99.9%
$\alpha$              0.2      0.1      0.05     0.01     0.001
$z_{\alpha/2}$        1.282    1.645    1.96     2.576    3.291
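The rare-event starting point can be sketched in a few lines of Python, assuming the formula n = ln(1 − P(Detection)) / ln(1 − p). The function name is hypothetical and rounding up to a whole sample is a choice made here.

```python
import math

def rare_event_sample_size(p: float, p_detection: float = 0.95) -> int:
    """Smallest n such that at least one defect is seen with probability
    p_detection when the true defect rate is p."""
    if not (0 < p < 1 and 0 < p_detection < 1):
        raise ValueError("p and p_detection must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - p_detection) / math.log(1 - p))

n = rare_event_sample_size(p=0.01, p_detection=0.95)
print(n)
# Check: probability of seeing at least one defect in n samples
print(1 - (1 - 0.01) ** n)
```

The final line confirms that the chance of observing at least one defect in n samples meets the requested detection probability.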