Example Test Item Analysis
Based on Table 3, the highest score obtained is 28, achieved by two students, Uzayr Rayyan
Ramadhan and Aiysar Nur Izdiyad. Nine students obtained intermediate scores ranging from 13 to
19 points, and the pupil with the lowest score was Affiq Aisy Iskandar, who scored only four
points out of 30.
[Figure: Total Score. x-axis: Marks (0 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30); y-axis: Number of Students]
Figure 1 shows that the Year 4 students of Sekolah Kebangsaan Kuala Kubu Bharu are a
mixed-ability group, as their levels of proficiency differ. Based on Table 3 and Figure 1, a
frequency histogram was built, and the mean, median, mode and standard deviation of the
students' total scores were calculated. Figure 2 shows the frequency histogram with a class
interval of 5, while the mean, median, mode and standard deviation are shown in Table 4 below.
MEAN                  19.6
MODE                  26
MEDIAN                21.5
STANDARD DEVIATION    7.3
Table 4: The Values for Mean, Mode, Median and Standard Deviation of the Students' Total Scores
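The summary statistics in Table 4 and the class intervals of the histogram can be reproduced with Python's standard `statistics` module. The sketch below is illustrative only. The `scores` list is hypothetical and is not the class data from Table 3, and the helper names `describe` and `histogram` are my own. The report does not say whether the population or the sample standard deviation was used, so the sketch uses `pstdev` and notes the alternative.

```python
from statistics import mean, median, multimode, pstdev

# Hypothetical total scores out of 30, for illustration only (not the Table 3 data).
scores = [28, 28, 4, 13, 15, 17, 19, 22, 26, 26, 26]

# Class intervals used on the Marks axis: 0 to 5, 6 to 10, ..., 26 to 30.
LABELS = [f"{0 if i == 0 else 5 * i + 1} to {5 * (i + 1)}" for i in range(6)]


def describe(values):
    """Return the summary statistics reported in Table 4."""
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "mode": multimode(values),       # a list, because a data set can have several modes
        "sd": round(pstdev(values), 1),  # population SD; use statistics.stdev for the sample SD
    }


def histogram(values):
    """Count how many scores (0 to 30) fall into each class interval."""
    counts = dict.fromkeys(LABELS, 0)
    for v in values:
        index = max(0, (v - 1) // 5)  # 0-5 -> 0, 6-10 -> 1, ..., 26-30 -> 5
        counts[LABELS[index]] += 1
    return counts


print(describe(scores))
print(histogram(scores))
```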
[Figure: Score of each student. x-axis: Student's Name (Uzayr Rayyan Ramadhan, Aiysar Nur Izdiyad, Qamarul Isyraq, Aimy Nur Husna Humairah, Nur Sumayyah Maisarah, Mohamad Najib, Said Lutfil Daiyan, Farah Syahzarina, Nur Iffah Musfirah, Zayyan, Dhanalakshmii, Muhammad Alif Asyraff, Affiq Aisy Iskandar); y-axis: Score]
will reveal it. In item analysis, the two most common statistics used to determine the quality
of an item are item difficulty and item discrimination. The difficulty index measures the
proportion of examinees who answered an item correctly, whereas the discrimination index
measures how well the item discriminates between examinees who are knowledgeable in the
content area and those who are not.
6.1 Difficulty Index
According to Understanding Item Analyses (n.d.), the difficulty index is simply the proportion
of students who answered an item correctly. Because each item is scored as either correct (1) or
incorrect (0), the difficulty index is also the item mean. It ranges between 0.0 and 1.0; the
higher the value, the easier the question. An item with a difficulty index of about 0.5 is
considered the best. The difficulty index is also called the p-value.
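Because each item is scored 0 or 1, the difficulty index of every item is the column mean of a student-by-item response matrix. The following minimal sketch assumes such a matrix. The four-student, three-item data and the function name `difficulty_indices` are hypothetical.

```python
# Hypothetical 0/1 response matrix: rows are students, columns are items
# (1 = correct, 0 = incorrect). Illustration only.
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]


def difficulty_indices(matrix):
    """Return the difficulty index (p-value) of each item.

    For 0/1 scoring, the proportion of correct answers equals the item mean.
    """
    n_students = len(matrix)
    return [sum(column) / n_students for column in zip(*matrix)]


for item, p in enumerate(difficulty_indices(responses), start=1):
    print(f"Item {item}: p = {p:.2f}")
```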
[Figure 9: Difficulty index of each item, Q1 to Q30]
On the other hand, Question 14 (0.09) is categorised as “very difficult” since its difficulty
value is below 0.20, and it should be discarded. Meanwhile, Question 4 (0.88), Question 6 (0.81),
Question 8 (0.88), Question 9 (0.94), Question 27 (0.84), Question 29 (0.84) and Question 30
(0.81) are categorised as “very easy” since their difficulty values are above 0.80, and they
should be discarded or carefully reviewed. Therefore, I can conclude that most pupils in both
the upper and lower groups were unable to answer Question 14, while most pupils in both groups
were able to answer the very easy questions.
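The cut-offs used above can be written as a small rule: a value below 0.20 is very difficult and a value above 0.80 is very easy. The sketch below applies that rule to the difficulty values reported in this section. The label "within range" for items between the two cut-offs is my own placeholder, because the finer categories for that middle band are not reproduced here.

```python
# Difficulty values reported in this section (Q14 and the very easy items).
reported = {"Q14": 0.09, "Q4": 0.88, "Q6": 0.81, "Q8": 0.88,
            "Q9": 0.94, "Q27": 0.84, "Q29": 0.84, "Q30": 0.81}


def difficulty_category(p):
    """Apply the cut-offs used in this section to a difficulty index p."""
    if p < 0.20:
        return "very difficult"   # recommendation: discard
    if p > 0.80:
        return "very easy"        # recommendation: discard or review carefully
    return "within range"         # placeholder label for the middle band (assumption)


for item, p in reported.items():
    print(item, p, difficulty_category(p))
```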
[Figure 10: Discrimination index of each item, Q1 to Q30]
Discrimination Value    Quality      Recommendation
> 0.40                  excellent    retain
0.30 - 0.39             good         can be improved
0.20 - 0.29             fair         needs review
< 0.20                  poor         discard/modify
< 0                     very poor    discard/modify
Table 10
Based on Figure 10 and Table 10, most of the questions have an excellent discrimination index,
with values above 0.4. These questions should be retained, with slight modifications to some of
them. Questions 14 and 19, with a value of 0.33 each, are considered ‘good’ questions, as their
values fall in the range between 0.30 and 0.39. These questions should be revised; in
particular, although Question 14 has a good discrimination value, its difficulty index is too
low. However, Questions 9 and 28 are considered ‘poor’ questions, with values of 0.11 and 0.0
respectively. These questions should also be reviewed to decide whether to discard them.
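A common way to obtain a discrimination index is the upper-lower group method: D = (correct answers in the upper group - correct answers in the lower group) / group size. The report does not state exactly how the values in Figure 10 were computed. The sketch below therefore assumes this method with equal-sized groups, and the function names are hypothetical. The quality bands follow Table 10. Table 10 leaves a gap between 0.39 and 0.40, so treating a value of exactly 0.40 as excellent is an assumption.

```python
def discrimination_index(correct_upper, correct_lower, group_size):
    """Upper-lower group discrimination index, assuming equal-sized groups."""
    return (correct_upper - correct_lower) / group_size


def discrimination_quality(d):
    """Return (quality, recommendation) using the bands in Table 10."""
    if d < 0:
        return "very poor", "discard/modify"
    if d < 0.20:
        return "poor", "discard/modify"
    if d < 0.30:
        return "fair", "needs review"
    if d < 0.40:
        return "good", "can be improved"
    return "excellent", "retain"  # 0.40 and above (assumption at the boundary)


# Values discussed above for Questions 14, 19, 9 and 28.
for item, d in {"Q14": 0.33, "Q19": 0.33, "Q9": 0.11, "Q28": 0.0}.items():
    print(item, d, discrimination_quality(d))
```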
Table 11 shows the degree of reliability of a test. A test is said to be reliable when its
reliability value is above 0.5. Reliability is the consistency of a test's measurement: if the
test were administered again under the same conditions, a student would obtain a similar score.
For this test, the reliability value (KR-20) was calculated and is shown in Table 12 below, along
with the variance, the PQ value, the average difficulty index and the average discrimination
index. The reliability value is 0.93, which, according to Table 11, means the test is almost
perfect.
AVERAGE DIFFICULTY INDEX        0.65
AVERAGE DISCRIMINATION INDEX    0.57
PQ                              5.54
VARIANCE                        53.02
RELIABILITY (KR-20)             0.93
SEM                             1.98
Table 12
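The reliability and SEM in Table 12 are consistent with two standard formulas. The first is the Kuder-Richardson Formula 20, KR-20 = (k / (k - 1)) × (1 - PQ / variance). The second is SEM = SD × √(1 - KR-20). These assume k = 30 items (Q1 to Q30) and read PQ as the sum of p × q over all items. That reading is my assumption, but the sketch below reproduces both reported values. The SEM of 1.98 is obtained only with the unrounded coefficient (about 0.926); using the rounded 0.93 gives about 1.93. The square root of the variance, about 7.28, also agrees with the standard deviation of 7.3 in Table 4.

```python
import math


def kr20(n_items, sum_pq, variance):
    """Kuder-Richardson Formula 20 reliability for 0/1-scored items."""
    return (n_items / (n_items - 1)) * (1 - sum_pq / variance)


def standard_error_of_measurement(variance, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return math.sqrt(variance) * math.sqrt(1 - reliability)


# Values from Table 12; the test has 30 items (Q1 to Q30).
reliability = kr20(30, 5.54, 53.02)
print(round(reliability, 2))                                        # 0.93
print(round(standard_error_of_measurement(53.02, reliability), 2))  # 1.98
print(round(standard_error_of_measurement(53.02, 0.93), 2))         # 1.93 (rounded KR-20)
```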
Distractors are the incorrect options in a multiple-choice question. For example, if the correct
answer is A, the remaining choices, B, C and D, are the distractors (Boon, Lee, & Aeria, 2017). A
distractor can be good, bad or non-functional, depending on the number of responses it attracts
from the upper and lower groups of students.
Item 15 *A B C D
Total Students 11 6 3 12
High Proficiency 6 0 0 5
Low Proficiency 1 5 2 3
Others 4 1 1 4
Table 13: Distractor Analysis for Item 15
For example, for Item 15, the correct answer is A. Distractors B and C are good distractors
because more low-proficiency students than high-proficiency students chose them. However,
distractor D is not a good distractor, as more high-proficiency students (5) chose it than
low-proficiency students (3).
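The comparison above can be written as a simple rule: a distractor is good when more low-proficiency students than high-proficiency students choose it. The sketch below applies that rule to the counts in Table 13. Two parts of it are my own simplifying assumptions: a distractor that nobody chose is treated as non-functional, and one chosen equally often by both groups is treated as bad. The function names are hypothetical.

```python
# Counts from Table 13 for Item 15; the correct answer is A.
high = {"A": 6, "B": 0, "C": 0, "D": 5}
low = {"A": 1, "B": 5, "C": 2, "D": 3}
others = {"A": 4, "B": 1, "C": 1, "D": 4}


def classify_distractor(n_high, n_low, n_total):
    """Label one distractor from the counts of students who chose it."""
    if n_total == 0:
        return "non-functional"  # assumption: chosen by nobody
    if n_low > n_high:
        return "good"            # attracts more low- than high-proficiency students
    return "bad"                 # attracts as many or more high-proficiency students


def analyse_distractors(correct, high, low, others):
    """Classify every option except the correct answer."""
    return {
        option: classify_distractor(high[option], low[option],
                                    high[option] + low[option] + others[option])
        for option in high
        if option != correct
    }


print(analyse_distractors("A", high, low, others))
# {'B': 'good', 'C': 'good', 'D': 'bad'}
```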
7.0 REFLECTION
From my findings, there are a few factors that contribute to a good assessment and need to be
taken into consideration before constructing the test paper and administering the test, to
ensure the validity and reliability of the data collected. First and foremost, I should check the
students' level of proficiency before constructing the test so that the scores would be closer to
a normal distribution. Secondly, while administering the test, I should set a fixed time for
students to answer it so that their responses would be authentic. A few students made several
attempts to perfect their responses, although I only took their first attempt into account.