Example Test Item Analysis

The document analyzes test results from a group of students. It provides the total scores of each student and identifies the highest and lowest scores. Charts are included to visualize the distribution of scores and key statistics like the mean, median, mode and standard deviation are calculated. Results are also analyzed for a specific section of the test, including difficulty and discrimination indexes for each question.

5.0 ANALYSIS OF TEST RESULTS

5.1 Total Score


No. Name Score No. Name Score
1 Uzayr Rayyan Ramadhan 28 / 30 17 Syafiatul Asmaq 21 / 30
2 Aiysar Nur Izdiyad 28 / 30 18 Nur Fairuz 19 / 30
3 Qamarul Isyraq 27 / 30 19 Muhammad Alif Asyraff 18 / 30
4 Aimy Nur Husna Humairah 27 / 30 20 Muhammad Aqil 18 / 30
5 Nur Sumayyah Maisarah 27 / 30 21 Nur Khaireen Eishal 18 / 30
6 Mohamad Najib 27 / 30 22 Muhammad Aiman Haiqal 17 / 30
7 Said Lutfil Daiyan 26 / 30 23 Nurul Iman Hannani 16 / 30
8 Farah Syahzarina 26 / 30 24 Muhammad Hasif 16 / 30
9 Nur Iffah Musfirah 26 / 30 25 Sham Shatul Bharizon 15 / 30
10 Zayyan 26 / 30 26 Nur Khairuna Irdyna Syifa 13 / 30
11 Dhanalakshmii 26 / 30 27 Luqman Hakim 9 / 30
12 Saidatul Amani Haninah 25 / 30 28 Syafia Qaleesya 9 / 30
13 Mohamad Akiff Zaqwan 24 / 30 29 Muhammad Zulfikri 8 / 30
14 Ainur Syifa Syuhada 23 / 30 30 Siti Balqus 8 / 30
15 Fasihah Nor Rania 22 / 30 31 Mohamad Aidil Naufal 7 / 30
16 Muhammad Adam Firdaus 22 / 30 32 Affiq Aisy Iskandar 4 / 30
Table 3: Total Score

Based on Table 3, the highest score obtained is 28, achieved by two students, Uzayr
Rayyan Ramadhan and Aiysar Nur Izdiyad. Nine students obtained intermediate scores in
the range of 13 to 19 points. The lowest score belongs to Affiq Aisy Iskandar, who scored
only four points out of 30.

[Bar chart: Total Score. x-axis: Marks (0 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30); y-axis: Number of Students]

Figure 1: Total Scores

Figure 1 shows that the Year 4 students of Sekolah Kebangsaan Kuala Kubu Bharu are of
mixed ability, as their levels of proficiency differ. Based on Table 3 and Figure 1, a
frequency histogram was built and the mean, median, mode and standard deviation of the
students’ total scores were calculated. Figure 2 shows the frequency histogram with a
class interval of 5, while the values for the mean, median, mode and standard deviation
are shown in Table 4 below.

MEAN 19.6
MODE 26
MEDIAN 21.5
STANDARD DEVIATION 7.3
Table 4: The Values for Mean, Mode, Median and Standard Deviation of the Students’ Total Scores
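
The values in Table 4 can be reproduced from the scores listed in Table 3. The following is a minimal Python sketch, not necessarily the method used for this report. It assumes the sample standard deviation (dividing by n - 1), which is the variant that rounds to 7.3 for these scores, and it takes the class limits from Figure 1.

```python
import statistics

# Total scores out of 30, copied from Table 3 (32 students).
scores = [28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
          21, 19, 18, 18, 18, 17, 16, 16, 15, 13, 9, 9, 8, 8, 7, 4]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
# Assumption: sample standard deviation (n - 1 in the denominator).
std_dev = statistics.stdev(scores)

# Number of students per class, using the class limits shown in Figure 1.
classes = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
frequencies = {
    f"{low} to {high}": sum(low <= s <= high for s in scores)
    for low, high in classes
}

print(f"mean={mean:.1f} median={median} mode={mode} sd={std_dev:.1f}")
print(frequencies)
```

The frequency counts give the bar heights for a chart such as Figure 1, so the same list of scores drives both the table and the chart.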

[Histogram of the students' total scores]
Figure 2: Histogram of frequency

Based on this histogram of frequency, it can be said that it shows a random distribution: there
are many peaks, which indicates that the students are at many different levels of proficiency.
Some performed very well and some performed very poorly.

5.2 Analysis of Section A Results

[Bar chart: Section A's Scores. x-axis: Student's Name; y-axis: Score]
Figure 3: Section A’s Scores

will reveal it. In item analysis, the two most common statistics used to determine the quality
of an item are item difficulty and item discrimination. The difficulty index measures the
proportion of examinees who responded to an item correctly, whereas the discrimination index
measures how well the item discriminates between examinees who are knowledgeable in the
content area and those who are not.
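
As an illustration of the two definitions above, here is a hedged Python sketch. It assumes items scored 0 or 1 and upper and lower groups that each contain 27% of the examinees, which is a common convention; the group size actually used for this analysis is not given here. The function names and the ten-student data set are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, group_fraction=0.27):
    """(correct answers in upper group - correct answers in lower group) / group size.

    Assumption: the upper and lower groups are the top and bottom
    `group_fraction` of examinees ranked by total test score.
    """
    n = len(total_scores)
    group_size = max(1, round(n * group_fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(item_scores[i] for i in upper)
    lower_correct = sum(item_scores[i] for i in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: total scores of ten students and their 0/1 result on one item.
totals = [28, 27, 26, 22, 21, 18, 16, 9, 8, 4]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

p = difficulty_index(item)
d = discrimination_index(item, totals)
print(f"difficulty = {p:.2f}, discrimination = {d:.2f}")
```

In this made-up example, all three students in the upper group and one of the three in the lower group answer correctly, so the item discriminates in the expected direction.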

No. of Question Difficulty Index Justification Discrimination Index Justification Final Justification
SECTION A
1 0.75 best 0.78 excellent retain
2 0.41 good 0.44 excellent retain
3 0.78 best 0.67 excellent retain
4 0.88 too easy 0.44 excellent modify
5 0.78 best 0.67 excellent retain
6 0.81 too easy 0.56 excellent modify
7 0.75 best 0.44 excellent retain
8 0.88 too easy 0.44 excellent modify
9 0.94 too easy 0.11 poor discard
10 0.75 best 0.44 excellent retain
SECTION B
11 0.70 best 0.44 excellent retain
12 0.50 best 0.78 excellent retain
13 0.50 best 0.78 excellent retain
14 0.09 too difficult 0.33 good discard
15 0.34 good 0.44 excellent retain
16 0.78 best 0.44 excellent retain
17 0.34 good 0.67 excellent retain
18 0.70 best 0.78 excellent retain
19 0.30 good 0.33 good retain
20 0.59 best 0.89 excellent retain
SECTION C
21 0.70 best 0.89 excellent retain
22 0.80 best 0.78 excellent retain
23 0.63 best 1.00 excellent retain
24 0.47 good 0.56 excellent retain
25 0.69 best 0.56 excellent retain
26 0.40 good 0.78 excellent retain
27 0.84 too easy 0.56 excellent modify
28 0.80 best 0.00 poor discard
29 0.84 too easy 0.56 excellent modify
30 0.81 too easy 0.44 excellent modify
Table 8: Difficulty Index and Discrimination Index

6.1 Difficulty Index

According to Understanding Item Analyses (n.d.), the difficulty index is simply the
proportion of students who answered an item correctly. In this case, it is also the item
mean. It ranges between 0.0 and 1.0; the higher the value, the easier the question. The
best items have a difficulty index of about 0.5. The difficulty index is also called the
p-value.

[Bar chart: Difficulty Index for Question 1 - 30. x-axis: Q1 to Q30; y-axis: difficulty index (0 to 1)]
Figure 9

Difficulty Value Quality Recommendation
< 0.2 too difficult discard/modify
0.2 - 0.5 good retain
0.5 - 0.8 best retain
> 0.8 too easy discard/modify
Table 9
Based on the data in Figure 9 and Table 9, Question 1 (0.75), Question 3 (0.78),
Question 5 (0.78), Question 7 (0.75), Question 10 (0.75), Question 11 (0.7), Question 12
(0.5), Question 13 (0.5), Question 16 (0.78), Question 18 (0.7), Question 20 (0.59),
Question 21 (0.7), Question 22 (0.8), Question 23 (0.63), Question 25 (0.69) and
Question 28 (0.8) are categorised as “best” quality, since their difficulty index lies
between 0.50 and 0.80, and should definitely be retained. Question 2 (0.41),
Question 15 (0.34), Question 17 (0.34), Question 19 (0.3), Question 24 (0.47) and
Question 26 (0.4) are categorised as “good” quality, since their difficulty index lies
between 0.20 and 0.50, and should also be retained. Hence, I can conclude that most of
the upper and lower pupils were able to answer these questions.

On the other hand, Question 14 (0.09) is categorised as “too difficult” since its
difficulty index is below 0.20, and it should be discarded. Question 4 (0.88),
Question 6 (0.81), Question 8 (0.88), Question 9 (0.94), Question 27 (0.84), Question 29
(0.84) and Question 30 (0.81) are categorised as “too easy” since their difficulty index
is above 0.80, and they should be discarded or carefully reviewed. Therefore, I can
conclude that most of the upper and lower pupils were unable to answer Question 14,
whereas almost all of them were able to answer the “too easy” questions.
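
The classification applied in this section can be expressed as a small function. This is a sketch under stated assumptions: Table 9 lists 0.5 and 0.8 in two ranges each, so the boundaries here follow Table 8, where 0.50 and 0.80 are both rated “best”; treating exactly 0.20 as “good” is an assumption, and the function name is hypothetical.

```python
def classify_difficulty(p):
    """Return (quality, recommendation) for a difficulty index p, following Table 9.

    Assumed boundary handling: 0.20 counts as "good"; 0.50 and 0.80 count as "best".
    """
    if p < 0.2:
        return "too difficult", "discard/modify"
    if p < 0.5:
        return "good", "retain"
    if p <= 0.8:
        return "best", "retain"
    return "too easy", "discard/modify"


# A few difficulty indexes taken from Table 8.
for question, p in [(1, 0.75), (2, 0.41), (12, 0.50), (14, 0.09), (22, 0.80), (9, 0.94)]:
    quality, recommendation = classify_difficulty(p)
    print(f"Question {question}: {p:.2f} -> {quality} ({recommendation})")
```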

6.2 Discrimination Index

According to Understanding Item Analyses (n.d.), the discrimination index refers to
the ability of an item to differentiate among students on the basis of how well they know
the material being tested. The discrimination index ranges from -1.0 to 1.0;
however, a discrimination value below 0.0 suggests a problem. When
an item discriminates negatively, the most knowledgeable examinees are generally
getting the item wrong and the least knowledgeable examinees are getting the item right.
A negative discrimination index may indicate that the item is measuring something other
than what the rest of the test is measuring.

[Bar chart: Discrimination Index for Question 1 - 30. x-axis: Q1 to Q30; y-axis: discrimination index (0 to 1.2); bars labelled with the values listed in Table 8]

Figure 10

Discrimination Value Quality Recommendation
>0.4 excellent retain
0.30 - 0.39 good can be improved
0.20 - 0.29 fair needs review
<0.20 poor discard/modify
<0 very poor discard/modify
Table 10

Based on the data from Figure 10 and Table 10, it can be seen that most of the questions
have an excellent discrimination index, with values above 0.4. These questions
should be retained, with slight modification for some of them. Questions 14 and
19, with a value of 0.33 each, are considered ‘good’ quality questions since their
values are in the range between 0.30 and 0.39. These questions should be revised;
Question 14 in particular has a good discrimination value, but its difficulty index is
too low. However, Questions 9 and 28 are considered ‘poor’ quality questions, with values
of 0.11 and 0.0 respectively. These questions should also be reviewed to decide whether
to discard them.
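
The bands in Table 10 can be applied with a short function, sketched below. Table 10 leaves a value of exactly 0.40 unassigned, so counting it as “excellent” is an assumption here, as is the function name.

```python
def classify_discrimination(d):
    """Return (quality, recommendation) for a discrimination index d, following Table 10.

    Assumption: a value of exactly 0.40 is treated as "excellent".
    """
    if d < 0:
        return "very poor", "discard/modify"
    if d < 0.20:
        return "poor", "discard/modify"
    if d < 0.30:
        return "fair", "needs review"
    if d < 0.40:
        return "good", "can be improved"
    return "excellent", "retain"


# Discrimination indexes of the questions discussed above (values from Table 8).
for question, d in [(9, 0.11), (14, 0.33), (19, 0.33), (23, 1.00), (28, 0.00)]:
    quality, recommendation = classify_discrimination(d)
    print(f"Question {question}: {d:.2f} -> {quality} ({recommendation})")
```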

6.3 Reliability Coefficient

Level of Agreement Reliability Value


Perfect 1.00
Almost perfect 0.81 – 1.00
Substantial 0.61 – 0.80
Moderate 0.41 – 0.60
Fair 0.21 – 0.40
Slight 0.00 – 0.20
Poor <0.00
Table 11

Table 11 shows the degree of reliability of a test. A test is said to be reliable when the
value is above 0.5. Reliability is the consistency of a test's measurement: if the test were
administered again under the same conditions, a student would obtain a score similar to that
on the first test. For this test, the reliability value was measured and is shown in Table 12
below, along with the variance, PQ value, average difficulty index and average
discrimination index. The reliability value is 0.93 and, according to Table 11, the
reliability of the test is almost perfect.

AVERAGE DIFF. 0.65
AVERAGE DISC. 0.57
PQ 5.54
VARIANCE 53.02
RELIABILITY KR20 0.93
SEM 1.98

Table 12
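
The reliability and SEM figures in Table 12 are consistent with the standard Kuder-Richardson 20 formula, KR-20 = (k / (k - 1)) * (1 - sum of pq / variance), and with SEM = standard deviation * sqrt(1 - reliability). The sketch below recomputes both from the table under two assumptions: k = 30 items (Questions 1 to 30 in Table 8), and "PQ" read as the sum of p * q over all items.

```python
import math

k = 30            # number of items (assumed from Questions 1 to 30 in Table 8)
sum_pq = 5.54     # "PQ" in Table 12, read here as the sum of p * q over all items
variance = 53.02  # variance of the total scores (Table 12)

# Kuder-Richardson 20 reliability coefficient.
kr20 = (k / (k - 1)) * (1 - sum_pq / variance)

# Standard error of measurement, using the unrounded reliability.
sem = math.sqrt(variance) * math.sqrt(1 - kr20)

print(f"KR-20 = {kr20:.2f}")  # 0.93
print(f"SEM   = {sem:.2f}")   # 1.98
```

Both results round to the values reported in Table 12. The SEM matches only when the unrounded reliability is used rather than 0.93.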

6.4 Distractor Analysis

A distractor is a wrong response option in a multiple-choice question. For example, if the
correct answer is A, the remaining choices B, C and D are the distractors of the
question (Boon, Lee, & Aeria, 2017). A distractor can be good, bad or non-functional,
based on the number of responses it draws from the upper and lower groups of students.

Item 15 *A B C D
Total Students 11 6 3 12
High Proficiency 6 0 0 5
Low Proficiency 1 5 2 3
Others 4 1 1 4
Table 13: Distractor Analysis for Item 15
For example, for item 15, the correct answer is A. Distractors B and C
are good distractors because more low proficiency students than high proficiency students
chose them. However, distractor D is not a good distractor, as more high proficiency
students than low proficiency students chose it.
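
The comparison described for item 15 can be sketched in Python using the counts from Table 13. The rule for “good” and “bad” follows the paragraph above; the rule for “non-functional” (no student in either group chose the option) is an assumption, since the text does not define it, and the function name is hypothetical.

```python
def evaluate_distractors(key, high, low):
    """Label each wrong option by comparing low-group and high-group response counts."""
    verdicts = {}
    for option in high:
        if option == key:
            continue  # the keyed answer is not a distractor
        if high[option] == 0 and low[option] == 0:
            # Assumption: an option nobody in either group chose is non-functional.
            verdicts[option] = "non-functional"
        elif low[option] > high[option]:
            verdicts[option] = "good"
        else:
            verdicts[option] = "bad"
    return verdicts


# Responses to item 15 from Table 13; option A is the correct answer.
high_group = {"A": 6, "B": 0, "C": 0, "D": 5}
low_group = {"A": 1, "B": 5, "C": 2, "D": 3}

result = evaluate_distractors("A", high_group, low_group)
print(result)
```

For item 15 this labels B and C as good and D as bad, which agrees with the reading of Table 13 given above.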

7.0 REFLECTION

From my findings, I have found that a few factors contribute to a good
assessment and need to be taken into consideration before constructing the test paper and
administering the test, to ensure the validity and reliability of the data collected. First and
foremost, I should check the students’ level of proficiency before constructing the test so that
the scores would be approximately normally distributed. Secondly, while administering the test,
I should give the students a fixed time to answer so that their responses are authentic;
a few students made several attempts to perfect their responses, but I took only the
first attempt.
