consequences [9]. In formative evaluation, by contrast, where the target is student learning, item analysis is of little importance in giving test composers feedback about item construction.
In the literature, many reasons have been reported for conducting item analysis, including examining whether the item functions as intended: Did it assess the required concepts (content)? Did it discriminate between those who mastered the content material and those who did not? Was it within the acceptable level of difficulty? Are the distractors functioning? [10, 11].
Many factors can affect item analysis and hence its interpretation [8]. Difficulty and discrimination indices change with every administration and are influenced by the ability and number of the examinees, the number of items, and the quality of instruction [8, 12].
Whatever the exam or test blueprinting (item selection) method, exam items remain a sample of the needed content material. The number of items (item sampling) is of great importance because one cannot ask about all the content; with too few items, the results may not be sufficient to reflect true student ability [8]. Technical item flaws are divided into two major types: test-wiseness and irrelevant difficulty. Test-wiseness flaws can make items easier, while faults related to irrelevant difficulty can make items more challenging in ways unrelated to the content under assessment. It was reported that item analysis of an exam with 200 examinees is stable, and that results (item difficulty or item discrimination indices) from fewer than 100 examinees should be interpreted with caution. Downing and Yudkowsky, however, described that even with a small number of examinees (e.g., 30), item analysis can still provide helpful information for improving items [13, 14].
Cronbach's alpha (KR20) is a widely accepted and used estimate of test reliability (internal consistency) and has been reported to be superior to the split-half estimate [15, 16]. Although validity and reliability are closely associated, the reliability of an assessment does not depend on its validity [16, 17]. Coefficient alpha is known to be equal to KR-20 when each item has a single correct answer, such as type A MCQs or other binary (dichotomously scored) items [18–21]. Coefficient alpha reflects the degree to which item response scores correlate with total test scores [15]. It also describes the degree to which items in the exam measure the same concept or construct [22]. Therefore, it is connected to the inter-relatedness and dimensionality of the items within the exam [16, 20]. Cronbach's alpha is affected by exam time; the number and inter-relation of the items (dimensionality); easy, hard, poorly written, or confusing items; variations in examinee responses; curriculum content not reflected in the test; testing conditions; and errors in recording or scoring [22–24]. The value of alpha decreases in an exam with fewer items and increases if the items assess the same concept (unidimensionality of the exam) [16]. Other factors reported to affect the alpha value include item difficulty, the number of examinees, and student performance at the time of the exam. It was argued that very high alpha values could indicate lengthy exams, parallel items, or narrow coverage of the content material [22].
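As a concrete illustration (not part of the original chapter), the short Python sketch below computes KR-20 for a hypothetical matrix of dichotomous item scores; the data, variable names, and use of NumPy are assumptions made only for this example.

```python
import numpy as np

# Hypothetical 0/1 score matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
])

def kr20(scores: np.ndarray) -> float:
    """KR-20: k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k = scores.shape[1]                         # number of items
    p = scores.mean(axis=0)                     # proportion correct per item
    q = 1 - p
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of examinee total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

print(f"KR-20 = {kr20(scores):.2f}")
```

With real data, the same function applies to any number of examinees and items, provided every item is scored 0 or 1.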
The alpha value of an exam can be increased by increasing the number of items with a high p-value (difficulty index). It was reported that items of moderate difficulty maximize the alpha value, while those with a difficulty of zero or 100% minimize it [15]. In the same way, deletion of faulty items can increase the alpha value. It should also be considered that repeating items in the same exam, or using items that assess the same concept, can increase the alpha value.
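To see how removing a faulty item changes the estimate, alpha can be recomputed with each item left out in turn; the minimal sketch below reuses the hypothetical scores matrix and kr20 function from the previous example.

```python
# KR-20 with each item deleted in turn; a higher value flags an item whose
# removal would raise the overall reliability estimate.
for item in range(scores.shape[1]):
    reduced = np.delete(scores, item, axis=1)
    print(f"Item {item + 1}: KR-20 if deleted = {kr20(reduced):.2f}")
```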
The interpretation of reliability is the correlation of the test with itself. When the estimate of reliability increases, the portion of a test score related to error decreases. Wise interpretation of alpha requires an understanding of the inter-relatedness of the items and of whether they measure a single latent trait or construct. Exams covering different content materials, such as those of integrated courses, need particular care: the musculoskeletal system course, for example, although dominated by anatomy, contains other basic medical and clinical science subjects with different content. Interpretation of such a course exam therefore needs a deeper look beyond the alpha figure. It was reported that a KR20 of 0.7 is acceptable for a short test (fewer than 50 items) and a KR20 of 0.8 for an extended test (more than 50 items) [25]. Moreover, it was documented that a multidimensional exam does not have a lower alpha value than a unidimensional one [30] (Table 1).
A low alpha value can be due to a small number of items, reduced inter-relatedness between items, or heterogeneous constructs [22]. A high alpha value can suggest exam reliability, but it may also mean that some items are non-functional because they test the same content in a different guise or are outright repetitions [16, 22]. A high value likewise indicates highly inter-related items, which can reflect limited coverage of the content material [22].

0.60–0.69      Moderate
<0.60          Minimal
Cicchetti [27]:
<0.70          Unacceptable
0.70–0.80      Fair
0.80–0.90      Good
0.50–0.60      Suggests need for revision of the test (unless it is quite short, ten or fewer items)
0.5–0.6        Poor
<0.5           Unacceptable
Table 1.
Reference values and interpretation of Cronbach's alpha (KR20).

8. Distractor analysis

Type A MCQs are commonly formed of a stem, with or without a leading question, and four or five alternatives. Among an item's alternatives, only one is the key answer; the others are called distractors [4]. Distractors should convey a misconception related to the key answer and appear plausible. The distractors should appear similar to the key answer in terms of wording, grammatical form, style, and length [19]. Distractor efficiency (DE) is the ability of the incorrect answers to distract the students [12].
A functional distractor (FD) is a distractor selected by 5% or more of the examinees [4, 33], whereas those chosen by less than 5% of the examinees are considered non-functional (NFD) [4]. Other authors have reported 1% of the examinees as the demarcation for functional distractors [34, 35]. Items are commonly categorized based on the number of NFDs they contain (Table 2) [12, 26, 36, 37].
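The 5% rule and the Table 2 categories translate directly into code; the following Python sketch uses an invented set of response counts, and the option letters, counts, and key are assumptions for illustration only.

```python
# Hypothetical counts of examinees choosing each option of one item; 'C' is the key.
choices = {"A": 1, "B": 9, "C": 25, "D": 0, "E": 5}
key = "C"
n_examinees = sum(choices.values())

# A distractor is functional if chosen by at least 5% of examinees [4, 33].
distractors = {opt: n for opt, n in choices.items() if opt != key}
functional = [opt for opt, n in distractors.items() if n / n_examinees >= 0.05]
nfd = len(distractors) - len(functional)

# Distractor efficiency: percentage of distractors that are functional (cf. Table 2).
de = 100 * len(functional) / len(distractors)
print(f"Functional: {functional}, NFD = {nfd}, DE = {de:.1f}%")
```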
The occurrence of NFDs makes an item easier and reduces its discrimination power, while functional distractors make it more difficult [36, 38]. It was reported that non-functional distractors correlate negatively with reliability [38]. The presence of non-functional distractors can be related to two main causes: first, the training and construction ability of the item writer or composer; second, the mismatch between the target content and the number of plausible distractors that can be created. Thus, training and more effort in item writing and construction can decrease NFDs [36]. Other causes have been related to NFDs, including the low
NFDs    Distractor efficiency (%)    Category
3       0                            Poor
2       33.3                         Moderate
1       66.6                         Good
0       100                          Excellent
Table 2.
Classification of items according to the number of non-functional distractors (NFDs).
Number of options    Ideal (optimal) difficulty
3                    0.77    0.67
4                    0.74    0.63
5                    0.70    0.60
Table 3.
The ideal (optimal) difficulty level (for tests with 100 items).
A low value indicates a hard (difficult) item. The ideal (optimal) difficulty level for type A MCQs varies according to the number of options (Table 3) [49, 50]. The range of item difficulty can be categorized into difficult, moderate, and easy. Easy and difficult items were reported to have very little discrimination power [48]. Item difficulty relates both to the item itself and to the examinees who took the test at the given time [24]. Thus, reuse of an item on the basis of its difficulty index should be controlled. Some authors found that the difficulty indices of items assessing high cognitive levels in Bloom's taxonomy, such as evaluation, explanation, analysis, and synthesis, are lower than those of items assessing remembering, understanding, and applying [51, 52].
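Since the difficulty index is simply the percentage of examinees answering an item correctly, it can be computed in a few lines; the sketch below uses an invented score matrix, and the 30–80% banding follows just one of the published cut-offs listed later in Table 4.

```python
import numpy as np

# Hypothetical 0/1 score matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
])

# Difficulty index (p-value) per item: percentage of correct answers.
p_values = 100 * scores.mean(axis=0)

def difficulty_band(p: float) -> str:
    # Illustrative cut-offs; Table 4 lists several published alternatives.
    if p < 30:
        return "difficult"
    return "moderate" if p <= 80 else "easy"

for i, p in enumerate(p_values, start=1):
    print(f"Item {i}: p = {p:.1f}% ({difficulty_band(p)})")
```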
During item or exam construction, the constructor should aim for an acceptable level of difficulty [6]. Sugianto reported that items within an exam could be distributed according to difficulty as follows: moderate (40%), easy and difficult (20% each), and very easy and very difficult (10% each) [6]. Other authors reported that most items should be of moderate difficulty, with about 5% in the difficult range [50, 53]. Regarding the general arrangement of a test or examination, easy items are placed first and are followed by difficult ones; in the case of diagnostic assessment, however, the sequence of the learning material is more important [6, 7].
Both easy and difficult items discriminate poorly between students. Some reports described a negative correlation between exam reliability and difficult and easy items [38]. Oermann et al. reported that educationalists must be careful in deleting items with poor DIF because the number of items has a greater effect on test validity [54]. It is recommended that difficult items be reviewed for possible technical and content causes [50]. Possible causes of a low difficulty index include content material that was not covered (taught), genuinely challenging items, and a missed key or no correct answer among the item options [55]. Easy items (high p-value) can be due to technical causes, or to the concerned learning objective(s) having been achieved or revisited in more superficial coverage [55].
30–80%    Moderate
<30%      Difficult
40–80%    Moderate
<39%      Difficult
50%       Moderate
10%       Difficult
50–60%    Excellent/ideal
30–70%    Good/acceptable/average
30–70%    Good
<30%      Difficult
Table 4.
Reference values and interpretation of the difficulty index (p-value).
item flaws or inefficient distractors, missed keys, ambiguous wording, gray areas of opinion, and areas of controversy [12, 62]. Nevid and McClelland [52] reported that items assessing the evaluation and explanation domains could discriminate between high and low performers, while Kim et al. [51] commented that items assessing the remembering and understanding levels have low discrimination power [52, 54].
It was reported that discrimination indices are positively associated with the difficulty index and distractor efficiency [39, 63]. The discrimination power of an item is reduced by an increased number of non-functional distractors [36].
A test with poor discriminating power will not provide a reliable interpretation of the examinee's actual ability [6, 64]. In addition, discrimination power by itself does not indicate item validity, and deletion of items with poor discrimination power negatively impacts validity because it decreases the number of items [65].
A discrimination power above 0.15 was reported as evidence of item validity [50, 53], while any item with an index below 0.15, or a negative one, should be reviewed [50] (Table 5).
When interpreting the discrimination power of an item in order to decide its fate, special consideration should be given to its difficulty. Items with a high difficulty index (most examinees answer them correctly) and those with a low difficulty index (most examinees answer them incorrectly) commonly have low discrimination power [35, 63]. In both cases, such items will not discriminate between examinees because the majority fall on one side. Thus, items with a moderate difficulty index are more likely to have good discrimination power.
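The two discrimination statistics reported in the worked examples later in this chapter can be sketched as follows; the 27% upper/lower grouping and all of the numbers are assumptions chosen only to illustrate the calculation.

```python
import numpy as np

# Hypothetical data: total scores of 21 examinees and their 0/1 result on one item.
totals = np.array([36, 34, 33, 31, 30, 29, 28, 27, 26, 25, 24,
                   23, 22, 21, 20, 19, 18, 16, 15, 13, 11])
item = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1,
                 0, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# Discrimination index: proportion correct in the top 27% minus the bottom 27%.
order = np.argsort(totals)[::-1]             # examinees sorted from high to low total
n_group = max(1, round(0.27 * len(totals)))  # size of each 27% group
disc_index = item[order[:n_group]].mean() - item[order[-n_group:]].mean()

# Point-biserial: Pearson correlation between the item score and the total score.
pt_biserial = np.corrcoef(item, totals)[0, 1]

print(f"Discrimination index = {disc_index:.2f}, point biserial = {pt_biserial:.2f}")
```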
The common causes of poor discrimination power of an item include technical or writing flaws, untaught or poorly covered content material, ambiguous wording, gray areas of opinion and controversy, and wrong keys [12, 50, 62, 66].
In general, the statistical data obtained from item analysis can help item constructors and exam composers detect defective items. The decision to revise an item or its distractors must be based on the difficulty index, the discrimination index, and distractor efficiency. Revision of items can lead to modification of the teaching method or the content material [68].
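As a first-pass filter, that decision logic can be scripted; the thresholds below are illustrative choices drawn from the ranges in Tables 2, 4 and 5 (other published cut-offs exist), and the per-item statistics are hypothetical, loosely echoing the patterns in Figure 1.

```python
from dataclasses import dataclass

@dataclass
class ItemStats:
    p_value: float      # difficulty index, percentage correct
    disc_index: float   # discrimination index (upper minus lower group)
    nfd: int            # number of non-functional distractors

def review_flags(stats: ItemStats) -> list[str]:
    """Return reasons an item should be reviewed (empty list = no flags)."""
    issues = []
    if stats.p_value < 30 or stats.p_value > 80:
        issues.append("difficulty outside the moderate range")
    if stats.disc_index < 0.15:
        issues.append("low or negative discrimination")
    if stats.nfd > 0:
        issues.append(f"{stats.nfd} non-functional distractor(s)")
    return issues

items = {1: ItemStats(85.7, 0.60, 2), 2: ItemStats(100.0, 0.00, 3), 7: ItemStats(28.6, 0.67, 0)}
for number, stats in items.items():
    print(f"Item {number}: {review_flags(stats) or 'no flags'}")
```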
0.21–0.24    Acceptable
≤0.20        Poor
Obon and Rey [12]:
≥0.50        Very good item (definitely retain)
0.2–0.35     Good
0.00         Defective
0.25–0.34    Good
0.21–0.24    Acceptable
Table 5.
Reference values and interpretation of the discrimination index (power).
Example 1 (Figure 1):
The number of examinees was 21.
The number of test items (total possible score) is 40.
Figure 1.
Standard item analysis of a mid-course examination. The total number of items is 40, and the total number of examinees is 21. The KR20 is 0.82. Pt. Biserial: point-biserial correlation; Disc. Index: discrimination index; Correct: number and percentage of correct answers (difficulty index); Pct. Incorrect: percentage of incorrect answers.
• Item 1: the difficulty index is 85.7% (easy). Although it has high discrimination power (discrimination index = 0.6, point biserial = 0.58), two distractors are non-functional (B and C).
• Item 2: the difficulty index is 100% (easy). It has no discrimination power (discrimination index = 0.00, point biserial = 0.00), and all distractors are non-functional.
Comment: the item needs major revision or rewriting. This item is extremely easy, answered correctly by every examinee, and has no discriminating value. Such items should be removed from the question bank, and removal from the exam is considered valid.
Comment: The item has acceptable indices. Such items can be saved in the question bank for further use. The distractors need to be updated to improve their efficiency.
• Item 7: the difficulty index is 28.6% (difficult). It has high discrimination power (discrimination index = 0.67, point biserial = 0.39), and all the distractors are functional.
Comment: The item has acceptable indices. Such items can be saved in the question bank for further use. The distractors need to be updated to improve their efficiency.
• Item 8: the difficulty index is 76.2% (moderate). This item has a negative discrimination index (−0.33) and a poor point biserial (0.04). Only one distractor is functional (C). The negative discrimination index arises because more students in the lower group (bottom 27%) than in the upper group (top 27%) answered the item correctly.
Comment: although the item has a moderate difficulty index, it discriminates poorly. Such an item needs major revision.
Figure 2.
Standard item analysis of a mid-course examination. The total number of items is 40, and the total number of examinees is 25. The KR20 is 0.74. Pt. Biserial: point-biserial correlation; Disc. Index: discrimination index; Correct: number and percentage of correct answers (difficulty index); Pct. Incorrect: percentage of incorrect answers.
Example 2 (Figure 2):
The number of examinees was 25.
The number of test items is 40.
The highest and lowest scores were 33 and 13, respectively.
The class average (mean, 24.6) is less than the class median (25), so the distribution of examinee scores is skewed to the left. Despite this, the examinee scores may still show an approximately normal, bell-shaped distribution.
The KR20 (Cronbach's alpha) is 0.74, which is an acceptable value for most authors. Such a value of internal consistency is suitable for class tests.
Comment: the correct answer is (A), while most of the examinees chose (B). According to the distractor analysis, this item is mis-keyed rather than suffering from implausible distractors.
• Item 9: the difficulty index is 20% (difficult). It has low discrimination power (discrimination index = 0.17, point biserial = 0.09), and all distractors are functional.
Comment: distractor analysis shows that options (A) and (B) were selected more often by examinees. This can be due to implausible distractors, and the presence of implausible distractors can affect the item difficulty index. The distractors in this item should be revised or replaced with plausible ones.
• Item 11: the difficulty index is 44.0% (moderate). It has low discrimination power (discrimination index = 0.0, point biserial = 0.01), and only one of the distractors is non-functional.
Comment: The item has an acceptable difficulty index. Distractor (D) is selected more by upper-group examinees, as a key answer would be. Such a situation can point to a missed key or implausible distractors. The distractors need to be updated to improve their efficiency.
© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms
of the Creative Commons Attribution License (http://creativecommons.org/licenses/
by/3.0), which permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly cited.
References
[1] Stassen, M.L., K. Doherty, and M. Poe, Program-based review and assessment: Tools and techniques for program improvement. 2004: Office of Academic Planning & Assessment, University of Massachusetts Amherst.

[2] Tavakol, M. and R. Dennick, Post-examination analysis of objective tests. Med Teach, 2011. 33(6): p. 447-458.

[3] Benson, J., A comparison of three types of item analysis in test development using classical and latent trait methods, in Graduate Council of the University of Florida. 1978, University of Florida: Florida, USA. p. 134.

[4] Sharma, L.R., Analysis of difficulty index, discrimination index and distractor efficiency of multiple choice questions of speech sounds of English. International Research Journal of MMC, 2021. 2(1): p. 15-28.

[5] Thompson, B. and J.E. Levitov, Using microcomputers to score and evaluate items. Collegiate Microcomputer Archive, 1985. 3.

[6] Sugianto, A., Item analysis of English summative test: EFL teacher-made test. Indonesian EFL Research, 2020. 1(1): p. 35-54.

[7] Kumar, H. and S.K. Rout, Major tools and techniques in educational evaluation, in Measurement and evaluation in education. 2016, Vikas Publishing House Pvt. Ltd.: India. p. 256.

[8] Case, S.M. and D.B. Swanson, Constructing written test questions for the basic and clinical sciences. 3rd ed. 1998, Philadelphia, PA: National Board of Medical Examiners. 129.

[9] Kruyen, P.M., Using short tests and questionnaires for making decisions about individuals: When is short too short? 2012, Ridderkerk, Netherlands: Ridderprint BV.

[10] Akinboboye, J.T. and M.A. Ayanwale, Bloom taxonomy usage and psychometric analysis of classroom teacher made test. African Multidisciplinary Journal of Development, 2021. 10(1): p. 10-21.

[11] Brookhart, S.M. and A.J. Nitko, Educational assessment of students. 8th ed. 2018, New Jersey: Pearson.

[12] Obon, A.M. and K.A.M. Rey, Analysis of multiple-choice questions (MCQs): Item and test statistics from the 2nd year nursing qualifying exam in a university in Cavite, Philippines. Abstract Proceedings International Scholars Conference, 2019. 7(1): p. 499-511.

[13] Downing, S. and R. Yudkowsky, Assessment in health professions education. 2009, New York and London: Routledge, Taylor & Francis.

[14] Silao, C.V.O. and R.G. Luciano, Development of an automated test item analysis system with optical mark recognition (OMR). International Journal of Electrical Engineering and Technology (IJEET), 2021. 12(1): p. 67-79.

[15] Reinhardt, B.M., Factors affecting coefficient alpha: A mini Monte Carlo study, in The Annual Meeting of the Southwest Educational Research Association (January 26, 1991). 1991, University of Texas: San Antonio, Texas, USA. p. 31.

[16] Tavakol, M. and R. Dennick, Making sense of Cronbach's alpha. International Journal of Medical Education, 2011. 2: p. 53-55.

[17] Graham, J.M., Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 2006. 66(6): p. 930-944.

[18] Rezigalla, A.A., A.M.E. Eleragi, and M. Ishag, Comparison between students' perception toward an examination and item analysis, reliability and validity of the examination. Sudan Journal of Medical Sciences, 2020. 15(2): p. 114-123.

[19] Considine, J., M. Botti, and S. Thomas, Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian, 2005. 12(1): p. 19-24.

[20] Cortina, J.M., What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 1993. 78(1): p. 98.

[21] McNeish, D., Thanks coefficient alpha, we'll take it from here. Psychol Methods, 2018. 23(3): p. 412-433.

[22] Panayides, P., Coefficient alpha: Interpret with caution. Europe's Journal of Psychology, 2013. 9(4): p. 687-696.

[23] Al-Osail, A.M., et al., Is Cronbach's alpha sufficient for assessing the reliability of the OSCE for an internal medicine course? BMC Research Notes, 2015. 8(1): p. 1-6.

[24] McCowan, R.J. and S.C. McCowan, Item analysis for criterion-referenced tests. 1999, New York: Center for Development of Human Services.

[25] Salkind, N.J., Encyclopedia of research design. Vol. 1. 2010: Sage.

[26] Robinson, J.P., P.R. Shaver, and L.S. Wrightsman, Scale selection and evaluation, in Measures of political attitudes, J.P. Robinson, P.R. Shaver, and L.S. Wrightsman, Editors. 1999, Academic Press: USA. p. 509.

[27] Cicchetti, D.V., Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 1994. 6(4): p. 284.

[28] Axelson, R.D. and C.D. Kreiter, Reliability, in Assessment in health professions education, R. Yudkowsky, Y.S. Park, and S.M. Downing, Editors. 2019, Routledge: London.

[29] Hassan, S. and R. Hod, Use of item analysis to improve the quality of single best answer multiple choice question in summative assessment of undergraduate medical students in Malaysia. Education in Medicine Journal, 2017. 9(3): p. 33-43.

[30] Green, S.B. and M.S. Thompson, Structural equation modeling in clinical psychology research, in Handbook of research methods in clinical psychology, M.C. Roberts and S.S. Ilardi, Editors. 2008, Wiley-Blackwell. p. 138-175.

[31] Mahjabeen, W., et al., Difficulty index, discrimination index and distractor efficiency in multiple choice questions. Annals of PIMS-Shaheed Zulfiqar Ali Bhutto Medical University, 2017. 13(4): p. 310-315.

[32] Mehta, G. and V. Mokhasi, Item analysis of multiple choice questions-an assessment of the assessment tool. Int J Health Sci Res, 2014. 4(7): p. 197-202.

[33] Tarrant, M., J. Ware, and A.M. Mohammed, An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Medical Education, 2009. 9(1): p. 40.

[34] Puthiaparampil, T. and M. Rahman, How important is distractor efficiency for grading best answer questions? BMC Medical Education, 2021. 21(1): p. 1-6.

[35] Gajjar, S., et al., Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat.

[40] Sajjad, M., S. Iltaf, and R.A. Khan, Nonfunctional distractor analysis: An indicator for quality of multiple choice questions. Pak J Med Sci, 2020. 36(5): p. 982-986.

[41] Haladyna, T.M., S.M. Downing, and M.C. Rodriguez, A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 2002. 15(3): p. 309-333.

[42] Swanson, D.B., K.Z. Holtzman, and K. Allbee, Measurement characteristics of content-parallel single-best-answer and extended-matching questions in relation to number and source of options. Academic Medicine, 2008. 83(10): p. S21-S24.

[43] Frary, R.B., More multiple-choice item writing do's and don'ts. Practical Assessment, Research, Evaluation, 1994. 4(1): p. 11.

[48] Hingorjo, M.R. and F. Jaleel, Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. J Pak Med Assoc, 2012. 62(2): p. 142-147.

[49] Lord, F.M., The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika, 1952. 17(2): p. 181-194.

[50] Uddin, I., et al., Item analysis of multiple choice questions in pharmacology. J Saidu Med Coll Swat, 2020. 10(2): p. 128-131.

[51] Kim, M.-K., et al., Incorporation of Bloom's taxonomy into multiple-choice examination questions for a pharmacotherapeutics course. American Journal of Pharmaceutical Education, 2012. 76(6).

[52] Nevid, J.S. and N. McClelland, Using action verbs as learning outcomes: Applying Bloom's taxonomy in measuring instructional objectives in introductory psychology. Journal of Education and Training Studies, 2013. 1(2): p. 19-24.

[53] Elfaki, O., K. Bahamdan, and S. Al-Humayed, Evaluating the quality of multiple-choice questions used for final exams at the Department of Internal Medicine, College of Medicine, King Khalid University. Sudan Med Monit, 2015. 10: p. 123-127.

[54] Oermann, M.H. and K.B. Gaberson, Evaluation and testing in nursing education. 6th ed. 2019, New York: Springer Publishing Company.

[55] Bukvova, H., K. Figl, and G. Neumann, Improving the quality of multiple-choice exams by providing feedback from item analysis.

[56] Kaur, M., S. Singla, and R. Mahajan, Item analysis of in use multiple choice questions in pharmacology. International Journal of Applied and Basic Medical Research, 2016. 6(3): p. 170-173.

[57] Bhat, S.K. and K.H.L. Prasad, Item analysis and optimizing multiple-choice questions for a viable question bank in ophthalmology: A cross-sectional study. Indian J Ophthalmol, 2021. 69(2): p. 343-346.

[58] Rogausch, A., R. Hofer, and R. Krebs, Rarely selected distractors in high stakes medical multiple-choice examinations and their recognition by item authors: A simulation and survey. BMC Medical Education, 2010. 10(1): p. 1-9.

[59] Wood, D.A. and D.C. Adkins, Test construction: Development and interpretation of achievement tests. 1960: C.E. Merrill Books.

[61] Matlock-Hetzel, S., Basic concepts in item and test analysis. 1997.

[62] Sim, S.-M. and R.I. Rasiah, Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Annals-Academy of Medicine Singapore, 2006. 35(2): p. 67.

[63] Ramzan, M., et al., Item analysis of multiple-choice questions at the Department of Community Medicine, Wah Medical College, Pakistan. Life and Science, 2020. 1(2): p. 4.

[64] Setiyana, R., Analysis of summative tests for English. English Education Journal, 2016. 7(4): p. 433-447.

[65] Oermann, M.H. and K.B. Gaberson, Evaluation and testing in nursing education. 2016: Springer Publishing Company.

[66] Aljehani, D.K., et al., Relationship of text length of multiple-choice questions on item psychometric properties – a retrospective study. Saudi Journal for Health Sciences, 2020. 9(2): p. 84.

[67] Henrysson, S., Gathering, analyzing, and using data on test items, in Educational measurement, R.L. Thorndike, Editor. 1971, American Council on Education: Washington DC. p. 141.

[68] Maulina, N. and R. Novirianthy, Item analysis and peer-review evaluation of specific health problems and applied research block examination. Jurnal Pendidikan Kedokteran Indonesia: The Indonesian Journal of Medical Education, 2020. 9(2): p. 131-137.