MCQ Setting the Standard
This MCQ Setting the Standard presentation, from the Medical Education Unit in University College Cork, describes how to carry out standard setting for multiple choice examinations. In a previous presentation we looked at what standard setting means and why it is used in medical education.
Please see the Overview of Standard Setting presentation for more details on the theory behind standard setting. There are a number of different methods of standard setting that have been validated for use in MCQ-type examinations. These include the Modified Angoff [1,2,3,4], Ebel [2,5], Hofstee [2,6] and Cohen [7,8] methods. I will describe each of these methods and give references to related publications. The method that is most used in UCC’s Medical Education Unit is the Modified Angoff, so we will describe this in the greatest detail.
The Angoff Method was first described in 1971, and since then various modifications have been proposed. A group of expert judges make estimates of how a borderline candidate would perform on each item in the test. Ideally a panel of at least 6-8 judges should be involved in the process.
The experts in the standard setting process are required to conceptualise the minimum level of performance required for a pass in the examination. Standard setters should receive training, so that they can provide their judgements in an informed manner. This training should familiarise them with both the standard setting task and the conceptual level required for a pass. It should ideally provide an opportunity for standard setters to calibrate their expectations using past performance data.
The next concept that we need to explore is that of the borderline candidate. At the beginning of every modified Angoff process, the panel of judges should conceptualise the “Borderline Candidate” / Minimally Competent Candidate. This candidate demonstrates the knowledge / skills that are just about at the level which differentiates pass from fail. We use the concept of the borderline candidate routinely in our clinical OSCE exams, so this is generally familiar to clinical examiners, but it may be less familiar to examiners in other fields. We describe the borderline candidate as one whose performance is patchy: they may demonstrate some aspects of the required knowledge or skill, but they also demonstrate multiple omissions and errors.
Each member of the panel of judges estimates the proportion of borderline examinees who will answer an item correctly. This is equivalent to estimating the candidate’s likelihood of answering an item correctly [9]. Estimates are averaged over judges and summed over items to create a standard (cut-off score).
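Written out, the calculation just described could be expressed as below. The symbols are notation introduced here for illustration and are not taken from the slides: N is the number of items, J the number of judges, and p_ij is judge j's estimate for item i expressed as a proportion between 0 and 1.

```latex
\[
\text{cut-off score} \;=\; \sum_{i=1}^{N} \left( \frac{1}{J} \sum_{j=1}^{J} p_{ij} \right)
\]
```

Assuming one mark per item, dividing this sum by N gives the cut-off as a proportion of the total marks available.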
The examination coordinator convenes a panel of experts. These experts might include anyone who teaches on the module, or who teaches on that subject in other modules, as well as tutors and postgraduates. Training should be provided to the panel of standard setters. The training focuses on explaining what has been taught to the students and what level the students are at. The panel should then focus on conceptualising the Borderline Candidate. The next step is for each member of the panel to read the examination and, for each item, to answer the question: “What percentage of borderline candidates would answer this question correctly?” Each member of the panel should fill out a spreadsheet such as the one shown in the next slide.
Each examiner fills in their initials as shown and then records the percentage of borderline candidates that they think would answer each question correctly, based on how difficult they think the question would be for this cohort of students.
If the modified Angoff is being used in a single best answer MCQ, it is important to remember that, just by guessing, a proportion of completely incompetent candidates would statistically be expected to answer each question correctly. So, for example, if the question has 5 possible answers, then 20% of candidates with no prior knowledge would be expected to answer each question correctly. So when answering the question “What percentage of borderline candidates do you think would answer this question correctly?” we need to bear this in mind. For a question with 5 possible answers I would ask the examiners to give a percentage between 20% and 100% to allow for random chance.
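As a small illustration of the guessing floor, the sketch below computes the chance level for a question with a given number of options and checks whether an examiner's estimate falls inside the allowed range. The function names are hypothetical and chosen only for this example.

```python
def chance_floor(num_options: int) -> float:
    """Percentage of candidates expected to answer correctly by pure guessing."""
    if num_options < 1:
        raise ValueError("a question needs at least one option")
    return 100.0 / num_options


def is_valid_estimate(estimate_pct: float, num_options: int) -> bool:
    """True if an Angoff estimate lies between the guessing floor and 100%."""
    return chance_floor(num_options) <= estimate_pct <= 100.0


print(chance_floor(5))           # 20.0
print(is_valid_estimate(15, 5))  # False: below the 20% guessing floor
print(is_valid_estimate(45, 5))  # True
```

A check like this could be applied to each cell of the examiners' spreadsheet before the estimates are compared.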
On this slide we can see a worked example from a previous MCQ used in the School of Medicine on a clinical paper. We can see that 7 examiners have filled in their percentages for each question. Usually we record the examiners’ initials, but as this is actual data from School of Medicine examiners, I have replaced the initials with A, B, C, D and so on to protect anonymity. Each column represents a separate examiner and each row represents a separate exam question. For this exam 7 examiners participated in the standard setting. We usually ask all clinical module coordinators and clinical tutors who are involved in the year to participate. This gives us a good mix of subject expertise and also knowledge of what the students have been taught and the standard expected from them.
The next step is for the panel to compare their scores. Looking at question 1 here, we can see that there is broad agreement between examiners, with estimates ranging from 30% to 50%. However, look at question 14: here we see a big discrepancy, with estimates ranging from 30% to 80%. Now the examination coordinator, or usually the module coordinator, should step in. Perhaps, for example, this might be a difficult concept, but the students may have had explicit teaching on this subject. Some examiners may be aware of this and others may not. So at this stage the questions are reviewed, discrepancies are discussed, and examiners can choose to revise their original estimates. If a broad consensus cannot be reached on any particular question, then the module coordinator should consider removing that question from the paper entirely.
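The comparison step could be supported by a simple check that flags questions where the examiners' estimates are far apart. This is only a sketch: the 30-point threshold is an assumption made for illustration, not a rule from the Medical Education Unit, and the individual estimates are invented. Only the ranges for questions 1 and 14 (30-50% and 30-80%) come from the example above.

```python
def flag_discrepancies(estimates, max_spread=30.0):
    """Return question numbers whose judge estimates differ by more than max_spread points."""
    flagged = []
    for question, values in estimates.items():
        spread = max(values) - min(values)
        if spread > max_spread:
            flagged.append(question)
    return flagged


# Hypothetical estimates (%) from 7 judges; only the min and max match the worked example.
estimates = {
    1: [30, 40, 40, 50, 45, 35, 40],
    14: [30, 80, 50, 60, 40, 70, 55],
}
print(flag_discrepancies(estimates))  # [14]
```

Flagged questions would then go to the panel for discussion; the code does not replace that judgement.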
The next step is to calculate the mean percentage per question and then the overall mean. The overall mean becomes the new pass mark, in this case 52.3%.
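A minimal sketch of this calculation is shown below. It assumes every judge has rated every question and that all questions carry equal marks; the ratings are hypothetical and are not the School of Medicine data.

```python
from statistics import mean


def angoff_pass_mark(ratings):
    """ratings[q][j] is judge j's percentage estimate for question q.

    Returns (mean estimate per question, overall mean = pass mark in percent).
    """
    per_question = [mean(row) for row in ratings]
    return per_question, mean(per_question)


# Hypothetical ratings: 3 questions (rows) x 4 judges (columns)
ratings = [
    [40.0, 50.0, 45.0, 45.0],
    [60.0, 70.0, 65.0, 65.0],
    [30.0, 40.0, 35.0, 35.0],
]
per_question, pass_mark = angoff_pass_mark(ratings)
print(per_question)          # [45.0, 65.0, 35.0]
print(round(pass_mark, 1))   # 48.3
```

Because each question mean is a percentage, the overall mean is directly the pass mark as a percentage of the paper.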
As UCC uses 50% as the pass mark for examinations in the medical degree programmes, the students’ actual marks are amended, taking into account the new pass mark (cut score). This is the formula used: Amended mark = (actual mark × old pass mark) / new pass mark. For example, if a student’s actual score is 60/100 and the new pass mark / cut-off score is 55%, the student’s amended mark is (60 × 50) / 55 = 54.5%.
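The formula translates directly into a short function. The function and parameter names are illustrative; the default of 50 reflects the UCC pass mark mentioned above.

```python
def amended_mark(actual_mark, new_pass_mark, old_pass_mark=50.0):
    """Rescale a mark so that the new pass mark (cut score) maps onto the old pass mark."""
    if new_pass_mark <= 0:
        raise ValueError("new_pass_mark must be positive")
    return actual_mark * old_pass_mark / new_pass_mark


print(round(amended_mark(60, 55), 1))  # 54.5, as in the example above
print(amended_mark(55, 55))            # 50.0: a student exactly on the cut score gets the pass mark
```

Note that the scaling is proportional, so if the cut score were below 50% the highest marks could be scaled above 100. How that case is handled is not covered here.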
Angoff’s method is relatively easy to use, there is a sizeable body of research to support it, and it is frequently applied in licensing and certifying settings. However, it is much easier to use when the panel have done it once or twice in the past. This method produces absolute standards, so it is well suited to tests that seek to establish competence.
Another method that can be used for standard setting is the Ebel method. This has been in use since 1986. In the Ebel method, again, we have a team of judges who review each item in the test. They rate each item on 2 dimensions: difficulty and importance. Each member of a panel of standard-setters completes a 3x3 grid, allocating every question to one of the nine boxes in the grid.
So looking at this sample MCQ question, an examiner might decide to rate this question as Important and of medium difficulty.
Examiners may have differing opinions about how to categorise any given question. In that case the question should be discussed by the panel, including any relevant information about how the topic was covered in teaching. A consensus is then reached by the panel for each question.
Next the experts agree on the definition of a minimally competent examinee. Then another grid is filled out, this time estimating the percentage of questions in each category that a borderline / minimally competent candidate would answer correctly.
So in this table I have transferred the 9 boxes on the last grid into the first 2 columns we see here. Next we go back to the test and count how many items were judged to be in each of the 9 categories. So in this example 7 questions were judged by the expert panel to be Essential and Easy, 8 were judged to be Important and Easy, and so on. For each category, the percentage of questions that the panel believed a borderline candidate would answer correctly is multiplied by the number of questions that category contains, giving a category score. The passing score is set by averaging the category scores over all the questions. So in this case the pass mark is the total score for all the 9 categories divided by the total number of questions, which is 60. So 3600/60 = 60, which now becomes the pass mark (as a percentage) of the test.
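The Ebel calculation can be sketched as a weighted average over the nine categories. In the grid below, every count and percentage is hypothetical except the 7 Essential/Easy and 8 Important/Easy questions quoted above; the third importance label ("Acceptable") is also an assumption, since the notes only name Essential and Important. The counts are chosen to total 60 questions, but the result is not meant to reproduce the 3600/60 figure from the slide.

```python
# (importance, difficulty): (number of questions, % a borderline candidate is expected to get right)
# Hypothetical values, apart from the counts 7 and 8 mentioned in the text.
grid = {
    ("Essential", "Easy"): (7, 90),
    ("Essential", "Medium"): (8, 70),
    ("Essential", "Hard"): (5, 50),
    ("Important", "Easy"): (8, 80),
    ("Important", "Medium"): (10, 60),
    ("Important", "Hard"): (6, 40),
    ("Acceptable", "Easy"): (6, 70),
    ("Acceptable", "Medium"): (6, 50),
    ("Acceptable", "Hard"): (4, 30),
}


def ebel_pass_mark(grid):
    """Sum of (count x expected % correct) over categories, divided by the total question count."""
    total_questions = sum(count for count, _ in grid.values())
    total_score = sum(count * pct for count, pct in grid.values())
    return total_score / total_questions


print(round(ebel_pass_mark(grid), 1))  # 62.7 for these hypothetical values (3760 / 60)
```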
The Hofstee Method is another way of standard setting. It is described as a compromise method, using a combination of relative and absolute standards. Each panel member estimates the minimum and maximum acceptable failure rates and the minimum and maximum acceptable pass marks. Their responses serve as the focus for discussion, with all being free to change their estimates. These minimum and maximum failure rates and percent correct scores are averaged across panellists and projected onto the actual score distribution to derive a passing score, as we see on the next slide.
This is a worked example taken from Kamal et al [10]. The data refer to a Final Med MCQ paper with 80 questions on the paper. As we see on the Y axis, the examiners set the minimum acceptable fail rate at 17% and the maximum acceptable fail rate at 36%. Looking at the X axis, we can see that they set the minimum acceptable pass mark at 36/80 and the maximum acceptable pass mark at 48/80. Finally we see the curve of the students’ actual performance on the test. Look at the 2 horizontal lines made up by the minimum and maximum fail rates. Now look at the vertical lines made up by the minimum and maximum acceptable pass marks. These 4 lines intersect in a rectangle as shown on the graph. A diagonal is drawn across the rectangle. The point where that diagonal line intersects with the curve of the students’ actual performance becomes the pass mark for the exam, so in this case the pass mark is set at 45/80.
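The graphical procedure can be approximated in code. The sketch below assumes the usual Hofstee convention that the diagonal runs from (minimum pass mark, maximum fail rate) to (maximum pass mark, minimum fail rate), and that the students' curve is the percentage of candidates scoring below each possible cut score. The four limits come from the worked example; the score list is invented purely so that the code has something to run on, so the result illustrates the mechanics rather than reproducing the published data.

```python
def fail_rate(scores, cut):
    """Percentage of candidates scoring below the cut score."""
    return 100.0 * sum(s < cut for s in scores) / len(scores)


def hofstee_cut(scores, min_cut, max_cut, min_fail, max_fail):
    """Lowest whole-number cut score at which the observed fail-rate curve meets or
    crosses the diagonal from (min_cut, max_fail) to (max_cut, min_fail)."""
    for cut in range(min_cut, max_cut + 1):
        diagonal = max_fail - (cut - min_cut) * (max_fail - min_fail) / (max_cut - min_cut)
        if fail_rate(scores, cut) >= diagonal:
            return cut
    return max_cut  # curve passes below the rectangle: fall back to the maximum pass mark


# Hypothetical scores out of 80 for 100 candidates (not the data from the worked example)
scores = [30] * 10 + [40] * 10 + [44] * 5 + [50] * 25 + [60] * 50
print(hofstee_cut(scores, min_cut=36, max_cut=48, min_fail=17, max_fail=36))  # 45
```

If the curve already lies above the rectangle at the minimum pass mark, the loop returns that minimum straight away, so both fallback cases default to one of the examiners' limits.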
The advantages of the Hofstee method are that it is easy to implement, and that the questions asked of the examiners are less abstract than in some of the other methods. However, it can happen that the pass mark defined by the process is not within the bounds of the actual scores on the exam, and when this happens the standard becomes the maximum or minimum acceptable pass mark identified by the examiners. For this reason the Hofstee method is less suited to high stakes exams.
The Cohen Method is a simple and fast way of standard setting. The basic method is to set the pass mark at 60% of the highest achiever’s score, or 60% of the mean of the top 3 highest achievers’ scores, or at 60% of the 90th or 95th centile. You need to have at least 100 students in the cohort to be able to use this with any degree of statistical confidence. I use it as part of a post exam evaluation to reality check the pass mark that I have arrived at by doing a modified Angoff.
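The centile variant of the Cohen method could be sketched as follows. The nearest-rank definition of a centile used here is an assumption (other definitions interpolate between scores and give slightly different values), and the cohort is hypothetical.

```python
import math


def cohen_pass_mark(scores, centile=95, fraction=0.60):
    """Pass mark as a fraction of the score at the given centile (nearest-rank method)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(centile * len(ordered) / 100))
    return fraction * ordered[rank - 1]


# Hypothetical cohort of 100 students scoring 1, 2, ..., 100
scores = list(range(1, 101))
print(round(cohen_pass_mark(scores), 1))              # 57.0 (60% of the 95th-centile score, 95)
print(round(cohen_pass_mark(scores, centile=90), 1))  # 54.0
```

A result like this can be compared with the modified Angoff pass mark as a reality check; the function does not enforce the minimum cohort size of 100 mentioned above.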
Whichever standard setting method is used, we must follow these guidelines [3]: