MCQ Setting the Standard

This document describes the Modified Angoff method for standard setting in multiple choice examinations. The Modified Angoff method involves convening a panel of experts who are trained on conceptualizing a "borderline candidate". Each expert then estimates what percentage of borderline candidates would correctly answer each test item. The estimates are averaged across experts and summed to determine a cutoff score. The method requires experts to evaluate each item based on the difficulty for borderline candidates who have the minimum level of knowledge expected to pass.

Uploaded by

Nazia Enayet

This presentation from the Medical Education Unit in University College Cork describes how to carry out standard setting for multiple choice examinations. In a previous presentation we looked at what standard setting means and why it is used in medical education.

1
Please see the Overview of Standard Setting presentation for more details on the theory behind standard setting.

There are a number of different methods of standard setting that have been validated for use in MCQ type examinations.

These include the Modified Angoff [1,2,3,4], Ebel [2,5], Hofstee [2,6] and Cohen [7,8] methods.

I will describe each of these methods and give references to related publications.

The method that is most used in UCC’s Medical Education Unit is the Modified Angoff, so we will describe this in the greatest detail.

2
The Angoff Method was first described in 1971, and since then various modifications have been proposed.

It is widely used in medical education nationally and internationally.

A group of expert judges make estimates of how a borderline candidate would perform on each item in the test.

Ideally a panel of at least 6-8 judges should be involved in the process.

3
The experts in the standard setting process are required to conceptualise the minimum level of performance required for a pass in the examination.

In order to do so, they must:

• be knowledgeable about the candidate population (for example first years or final years);
• understand the standard expected at this level;
• be knowledgeable about the subject matter; and
• understand what has been taught about the subject matter to this cohort of students.

Standard setters should receive training, so that they can provide their judgements in an informed manner. This training should familiarise them with both the standard setting task and the conceptual level required for a pass. It should ideally provide an opportunity for standard setters to calibrate their expectations using past performance data.

4
The next concept that we need to explore is that of the borderline candidate.

At the beginning of every modified Angoff process, the panel of judges should conceptualize the “Borderline Candidate”, also called the Minimally Competent Candidate.

This candidate demonstrates knowledge and skills that are just at the level which separates a pass from a fail.

We use the concept of the borderline candidate routinely in our clinical OSCE exams, so it is generally familiar to clinical examiners, but it may be less familiar to examiners in other fields. We describe the borderline candidate as one whose performance is patchy: they may demonstrate some aspects of the required knowledge or skill, but they also demonstrate multiple omissions and errors.

5
Each member of the panel of judges estimates the proportion of borderline examinees who will answer an item correctly.

This is equivalent to estimating the candidate’s likelihood of answering an item correctly [9].

Estimates are averaged over judges and summed over items to create a standard (cut-off score).
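
In symbols, a sketch of this calculation might read as follows. The notation is an assumption introduced here, not taken from the slides: $p_{ij}$ is judge $j$'s estimate for item $i$ expressed as a proportion between 0 and 1, $J$ is the number of judges and $N$ is the number of items.

```latex
\[
\bar{p}_i = \frac{1}{J}\sum_{j=1}^{J} p_{ij},
\qquad
\text{cut-off score (in items)} = \sum_{i=1}^{N} \bar{p}_i,
\qquad
\text{cut-off score (\%)} = \frac{100}{N}\sum_{i=1}^{N} \bar{p}_i .
\]
```

The percentage form is simply the mean of the per-item means, which is how the pass mark is reported later in this presentation.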

6
The examination coordinator convenes a panel of experts. These experts might include anyone who teaches on the module or who teaches that subject in other modules, as well as tutors and postgraduates.

Training should be provided to the panel of standard setters.

The training focuses on explaining what has been taught to the students and what level the students are at.

The panel should then focus on conceptualising the Borderline Candidate.

The next step is for each member of the panel to read the examination and, for each item, to answer the question:

“What percentage of borderline candidates would answer this question correctly?”

Each member of the panel should fill out a spreadsheet such as the one shown in the next slide.

7
Each examiner fills in their initials as shown and then records the percentage of Borderline candidates that they think would answer each question correctly, based on how difficult they think the question would be for this cohort of students.

8
If the modified Angoff is being used in a single best answer MCQ, it is important to remember that, just by guessing, a proportion of completely incompetent candidates would statistically be expected to answer each question correctly.

So for example if the question has 5 possible answers, then 20% of candidates with no prior knowledge would be expected to answer each question correctly.

So when answering the question “What percentage of borderline candidates do you think would answer this question correctly?” we need to bear this in mind.

For a question with 5 possible answers I would ask the examiners to give a percentage between 20% and 100% to allow for random chance.
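
The guessing floor described above can be sketched in a few lines of Python. This is only an illustration: the function names `chance_floor` and `check_estimate` are hypothetical, and the sketch assumes every option is equally likely to be picked by a candidate who is guessing.

```python
def chance_floor(n_options: int) -> float:
    """Percentage of candidates expected to answer correctly by pure guessing."""
    if n_options < 2:
        raise ValueError("a single best answer item needs at least two options")
    return 100.0 / n_options


def check_estimate(estimate: float, n_options: int) -> float:
    """Return the estimate unchanged if it lies between the chance floor and 100%."""
    floor = chance_floor(n_options)
    if not floor <= estimate <= 100.0:
        raise ValueError(f"estimate {estimate}% is outside {floor:g}%-100%")
    return estimate


# A 5-option item has a floor of 20%, so an estimate of 45% is accepted,
# while an estimate of 10% would raise a ValueError.
print(chance_floor(5))
print(check_estimate(45, 5))
```

A check like this could be applied to each cell of the examiners' spreadsheet before the estimates are compared.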

9
On this slide we can see a worked example from a previous MCQ used in the School of Medicine on a clinical paper.

We can see that 7 examiners have filled in their percentages for each question. Usually we record the examiners’ initials, but as this is actual data from School of Medicine examiners, I have replaced the initials with A, B, C, D and so on to protect anonymity.

Each column represents a separate examiner and each row represents a separate exam question.

For this exam 7 examiners participated in the standard setting. We usually ask all clinical module coordinators and clinical tutors who are involved in the year to participate. This gives us a good mix of subject expertise and also knowledge of what the students have been taught and the standard expected from them.

10
The next step is for the panel to compare their scores.

Looking at question 1 here, we can see that there is broad agreement between examiners, with estimates ranging from 30% to 50%.

However, look at question 14: here we see a big discrepancy, with estimates ranging from 30% to 80%. At this point the examination coordinator, usually the module coordinator, should step in.

Perhaps, for example, this might be a difficult concept, but the students may have had explicit teaching on this subject. Some examiners may be aware of this and others may not.

So at this stage, the questions are reviewed, discrepancies are discussed, and examiners can choose to revise their original estimates. If a broad consensus cannot be reached on any particular question then the module coordinator should consider removing that question from the paper entirely.
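
As a rough sketch, questions with a wide spread of estimates could be picked out for discussion automatically. The function name `flag_discrepant_items` and the 30-point threshold are assumptions made for illustration; the presentation does not define a numerical cut-off for a "big discrepancy". The individual estimates below are also hypothetical, and only their ranges (30% to 50% and 30% to 80%) come from the worked example.

```python
def flag_discrepant_items(estimates, max_range=30.0):
    """Return the question numbers whose estimates span more than max_range points.

    estimates maps a question number to the list of percentages given by the judges.
    max_range is an assumed threshold, not a value taken from the presentation.
    """
    flagged = []
    for question, values in estimates.items():
        if max(values) - min(values) > max_range:
            flagged.append(question)
    return sorted(flagged)


# Hypothetical estimates from 7 examiners for two questions.
panel = {
    1: [30, 40, 50, 40, 35, 45, 50],   # range of 20 points: broad agreement
    14: [30, 80, 50, 60, 40, 70, 55],  # range of 50 points: needs discussion
}
print(flag_discrepant_items(panel))
```

The flagged questions are only a starting point for the panel discussion; the decision to revise an estimate or remove a question still rests with the examiners and the module coordinator.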

11
The next step is to calculate the mean percentage per question and then the overall mean. The overall mean becomes the new pass mark, in this case 52.3%.
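
A minimal sketch of this calculation is shown below. The function name `angoff_pass_mark` and the small table of estimates are hypothetical; they are not the School of Medicine data from the slide, so the result differs from the 52.3% quoted above.

```python
from statistics import mean


def angoff_pass_mark(estimates):
    """Return (per-question means, overall mean) for a table of judge estimates.

    estimates is a list of rows, one row per question, where each row holds
    the percentages given by the judges for that question.
    """
    if not estimates or any(not row for row in estimates):
        raise ValueError("every question needs at least one estimate")
    question_means = [mean(row) for row in estimates]
    return question_means, mean(question_means)


# Hypothetical table: 3 questions rated by 3 judges.
table = [
    [30, 40, 50],
    [60, 70, 80],
    [40, 50, 60],
]
per_question, pass_mark = angoff_pass_mark(table)
print(per_question)
print(round(pass_mark, 1))
```

Averaging the per-question means gives the pass mark as a percentage of the paper, which is the form used in the next step.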

12
As UCC uses 50% as the Pass Mark for examinations in the medical degree programmes, the students’ actual marks are amended taking into account the new pass mark (cut score).

This is the formula used: Amended mark = (actual mark × old pass mark) / new pass mark.

For example, if a student’s actual score is 60/100 and the new pass mark / cut-off score is 55%, the student’s amended mark is (60 × 50) / 55 = 54.5%.
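
The formula can be written as a short function. The name `amended_mark` is hypothetical, and the sketch assumes all marks are percentages. It does not cap the result, since the presentation does not say how amended marks above 100% are handled.

```python
def amended_mark(actual_mark, new_pass_mark, old_pass_mark=50.0):
    """Rescale a mark so that the new pass mark maps onto the old pass mark."""
    if new_pass_mark <= 0:
        raise ValueError("the new pass mark must be greater than zero")
    return actual_mark * old_pass_mark / new_pass_mark


# The worked example from the text: 60% with a new pass mark of 55%.
print(round(amended_mark(60, 55), 1))
# A student scoring exactly the new pass mark lands exactly on 50%.
print(amended_mark(55, 55))
```

Because the scaling is linear, a student who scores exactly the new pass mark always receives exactly the old pass mark.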

13
Angoff’s method is relatively easy to use, there is a sizeable body of research to support it, and it is frequently applied in licensing and certifying settings.

This process can be time consuming when first used.

However, it is much easier to use when the panel have done it once or twice in the past.

This method produces absolute standards, so it is well suited to tests that seek to establish competence.

14
Another method that can be used for standard setting is the Ebel method. This has been in use since 1986. In the Ebel method, again, we have a team of judges who review each item in the test. They rate each item on 2 dimensions: difficulty and importance.

Each member of a panel of standard-setters completes a 3x3 grid, allocating every question to one of the nine boxes in the grid.

15
So looking at this sample MCQ question, an examiner might decide to rate this question as Important and of medium difficulty.

16
Examiners may have differing opinions about how to categorize any given question.

When this happens, the question should be discussed by the panel, including any relevant information about how the topic was covered in teaching.

A consensus is then reached by the panel for each question.

17
Next the experts agree on the definition of a minimally competent examinee. Then another grid is filled out, this time estimating the percentage of questions in each category that a borderline / minimally competent candidate would answer correctly.

18
So in this table I have transferred the 9 boxes from the last grid into the first 2 columns we see here.

Next we go back to the test and count how many items were judged to be in each of the 9 categories.

So in this example 7 questions were judged by the expert panel to be Essential and Easy, 8 were judged to be Important and Easy and so on.

For each category, the percentage of questions in that category that the panel believed a borderline candidate would answer correctly is multiplied by the number of questions that the category contains.

The passing score is then the weighted average of the category scores: the total score for all 9 categories divided by the total number of questions, which is 60. So 3600/60 = 60, which now becomes the pass mark of the test.
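
The Ebel calculation might be sketched as follows. The function name `ebel_pass_mark` is hypothetical. In the example, the question counts 7 and 8 are the two mentioned in the text, but the third category and all of the percentages are invented for illustration, because the full table from the slide is not reproduced here.

```python
def ebel_pass_mark(categories):
    """Return the Ebel pass mark as a percentage.

    categories is a list of (number_of_questions, expected_percent_correct)
    pairs, one pair for each box of the grid that contains questions.
    """
    total_questions = sum(count for count, _ in categories)
    if total_questions == 0:
        raise ValueError("at least one question is required")
    total_score = sum(count * percent for count, percent in categories)
    return total_score / total_questions


# Illustrative grid with three occupied boxes (percentages are hypothetical).
grid = [
    (7, 90),  # Essential and Easy
    (8, 80),  # Important and Easy
    (5, 60),  # Important and Medium
]
print(ebel_pass_mark(grid))
```

With the totals quoted in the text (a total score of 3600 over 60 questions) the same division gives a pass mark of 60.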

19
The Hofstee Method is another way of standard setting. It is described as a compromise method, using a combination of relative and absolute standards.

The examiners estimate 4 values:

• The minimum acceptable failure rate
• The maximum acceptable failure rate
• The minimum pass mark (cut score), even if all examinees failed
• The maximum pass mark (cut score), even if all examinees passed

Their responses serve as the focus for discussion, with all being free to change their estimates.

These minimum and maximum failure rates and percent correct scores are averaged across panelists and projected onto the actual score distribution to derive a passing score, as we see on the next slide.

20
This is a worked example taken from Kamal et al. [10].

The data refers to a Final Med MCQ paper with 80 questions on the paper.

As we see on the y-axis, the examiners set the minimum acceptable fail rate at 17% and the maximum acceptable fail rate at 36%.

Looking at the x-axis we can see that they set the minimum acceptable pass mark at 36/80 and the maximum acceptable pass mark at 48/80.

Finally we see the curve of the students’ actual performance on the test.

Look at the 2 horizontal lines made up by the minimum and maximum fail rates.

Now look at the vertical lines made up by the minimum and maximum acceptable pass marks. These 4 lines intersect in a rectangle as shown on the graph.

A diagonal is drawn across the rectangle. The point where that diagonal line intersects with the curve of the students’ actual performance becomes the pass mark for the exam, so in this case the pass mark is set at 45/80.

21
The advantages of the Hofstee method are that it is easy to implement, and that the questions asked of the examiners are less abstract than in some of the other methods.

However, it can happen that the pass mark defined by the process is not within the bounds of the actual scores on the exam, and when this happens the standard becomes the maximum or minimum acceptable pass mark identified by the examiners.

For this reason the Hofstee method is less suited to high stakes exams.
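
The graphical Hofstee procedure can be approximated in code. The sketch below rests on several assumptions that are not stated in the presentation: cut scores are whole marks, the fail rate at a cut score is the percentage of students scoring below it, the diagonal runs from (minimum pass mark, maximum fail rate) to (maximum pass mark, minimum fail rate), and the intersection is taken as the first whole mark at which the fail-rate curve reaches the diagonal. The function name `hofstee_pass_mark` and the student scores are hypothetical.

```python
def hofstee_pass_mark(scores, min_pass, max_pass, min_fail, max_fail):
    """Approximate the Hofstee pass mark in raw marks.

    min_pass and max_pass are the acceptable pass marks in raw (integer) marks.
    min_fail and max_fail are the acceptable fail rates as percentages.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if min_pass >= max_pass or min_fail > max_fail:
        raise ValueError("minimum values must lie below maximum values")

    def fail_rate(cut):
        # Percentage of students who would fail with this cut score.
        return 100.0 * sum(score < cut for score in scores) / len(scores)

    def diagonal(cut):
        # Straight line from (min_pass, max_fail) down to (max_pass, min_fail).
        position = (cut - min_pass) / (max_pass - min_pass)
        return max_fail + position * (min_fail - max_fail)

    for cut in range(min_pass, max_pass + 1):
        if fail_rate(cut) >= diagonal(cut):
            # Also covers a curve lying above the rectangle: returns min_pass.
            return cut
    # The curve lies below the rectangle: fall back to the maximum pass mark.
    return max_pass


# Panel values from the worked example, applied to hypothetical scores
# (one student on each mark from 21 to 80 out of 80).
students = list(range(21, 81))
print(hofstee_pass_mark(students, 36, 48, 17, 36))
```

Because the student scores here are invented, the result is not the 45/80 from the worked example. The two fallback cases correspond to the situation described above, where the standard becomes the minimum or maximum acceptable pass mark.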

22
The Cohen Method is a simple and fast way of standard setting.

Various modifications have been suggested in the literature.

The basic method is to set the pass mark at 60% of the highest achiever’s score, or 60% of the mean of the top 3 highest achievers’ scores, or at 60% of the 90th or 95th centile.
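
One of these variants, 60% of a chosen centile, could be sketched like this. The function name `cohen_pass_mark` is hypothetical, and the nearest-rank definition of a centile is an assumption, since the presentation does not say how the centile is computed.

```python
import math


def cohen_pass_mark(scores, percentile=95, fraction=0.6):
    """Return fraction times the score at the given centile (nearest-rank)."""
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    # Nearest-rank centile: the smallest score with at least
    # percentile% of the cohort at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    reference_score = ordered[rank - 1]
    return fraction * reference_score


# Hypothetical cohort of 100 students scoring 1 to 100.
cohort = list(range(1, 101))
print(round(cohen_pass_mark(cohort), 1))
# Setting percentile=100 uses the highest achiever's score instead.
print(round(cohen_pass_mark(cohort, percentile=100), 1))
```

Passing `percentile=90` gives the 90th centile variant in the same way.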

You need to have at least 100 students in the cohort to be able to use this with any degree of statistical confidence.

I use it as part of a post exam evaluation to reality check the pass mark that I have arrived at by doing a modified Angoff.

23
Whichever standard setting method is used, we must follow these guidelines [3]:

The method must:

• Produce standards consistent with the purpose of the test
• Rely on informed expert judgement, taking into account careful analysis and judgement of acceptable performance
• Take into account test difficulty and student criteria
• Demonstrate due diligence
• Be easy to explain and implement
• Be supported by a body of research

24