Visual Attention Based Evaluation For Multiple-Choice Tests in E-Learning Applications
Abstract—The multiple-choice (MC) question is an important form of test for assessing students' academic achievement, especially in e-learning applications. However, the classical evaluation metrics for MC questions (such as the correctness ratio) consider only the correctness of the final selection and ignore the testee's solving process. In the existing literature, eye-tracking based visual attention has been studied to infer the testee's cognitive process on a specific MC question. However, there is little work on the visual attention based evaluation of one complete MC test. In this paper, we measure the eye movement data of a group of students in an online test consisting of more than forty MC questions. We divide the screen area into five AOIs (areas of interest), one for the question and four for the candidate options. The fixation duration and the gaze sequence on these AOIs are recorded and studied. In a case study on the most difficult question, we observe large differences among the eye movements of testees at different academic levels. A new metric, the Visual-Attention-assisted Score (VAS), is proposed to assess a student's performance with the bias of his fixations on the correct options. Experimental results show that this metric reflects the differences in gaze movement among testees, and thus helps teachers infer the real level of the students' academic achievement.
I. INTRODUCTION
With the rapid development of information technology and the Internet, the online test has become a major form of assessment in e-learning applications, in which multiple-choice (MC) questions are preferred. By comparing the testee's answer with the correct answer, the system can provide an academic assessment on the specified topics. However, it is hard to treat this kind of evaluation result as a full description of the students' achievement on the topic. Since it provides only the final result and lacks details of the solving process, the obtained score cannot precisely represent the real level of the students' academic achievement. For example, it cannot tell whether the selected answer is the result of careful consideration or just a random choice. Although this problem can be partially solved by designing MC questions with better discrimination, there is still a need to infer and understand the testee's real performance.
Eye tracking technology was originally used by psychologists in the study of basic cognitive processes, reading, and other information processing. The "eye-mind" hypothesis proposed by Just and Carpenter [1] suggests that eye movements provide a dynamic trace of where attention is being directed. That is, the current "visual attention" position somehow indicates that the testee is interested in the information at that position. Although some studies show inconsistent results, it is widely agreed that during a complex information processing task such as reading, eye movements and attention are linked [2]. Supported by eye-tracking technologies and neurocognitive studies, there have been more applications in information processing such as reading comprehension [3] and visual searching [4]. Recently, some researchers have adopted this technique to explore learning processes in complex learning contexts such as multimedia learning and science problem solving strategies [5]–[9].
In mathematics problem solving, Hegarty et al. [7] found that key information for solving problems, such as numbers and variable names, was fixated longer and was critical to the construction of the solution. Tsai et al. [8] began to study students' visual attention in solving a science MC question. They found that successful problem solvers focused more on relevant factors, while unsuccessful problem solvers experienced difficulties in decoding the problem. Based on Tsai's work, Nahumi [9] studied the relationships between answers and the testee's performance with regard to the time spent watching the available options and gaze wavering between the most plausible choices. It is notable that all of these existing works took one carefully designed MC question as the study objective. Their focus is to find the characteristics or similarities among students during their problem solving process in answering a single MC question. However, there is little work on the visual attention based evaluation of a whole online test.
In this paper, we focus on this issue and conduct a measurement-based study. We adopt the eye tracking based method to investigate the performance of one student class (consisting of 45 students) in a series of multiple-choice tests (6 tests and 41 questions in total) of a short-term course. We divide the screen area into five AOIs (areas of interest), one for the question and four for the candidate options. The fixation duration and the gaze sequence on these AOIs are recorded and studied. A new metric, the Visual-Attention-assisted Score (VAS), is proposed to assess a student's performance with the bias of his fixations on the correct options. Experimental results show that this metric reflects the differences in gaze movement among testees.
Authorized licensed use limited to: ULAKBIM UASL - KARADENIZ TECHNICAL UNIVERSITY. Downloaded on November 07,2024 at 13:24:42 UTC from IEEE Xplore. Restrictions apply.
series of $\{a_j, t_j, d_j\}$, where $a_j$ represents the id of the focused AoI, $t_j$ represents the start time when the gaze moves to this AoI, $d_j$ stands for the total fixation duration on this AoI, and $j$ is the sequence index after merging the data.
It is notable that many fixations are completed almost instantaneously. When the fixation duration $d_j$ is smaller than a set length $\tau$ (typically 100 to 500 ms [12]), it is considered a meaningless or invalid fixation on that AoI. We therefore filter out the AoI visits whose fixation duration is less than $\tau$. Finally, we obtain the valid fixation sequence of AoIs, $\{a_k, t_k, d_k\}$ $(k = 1, 2, \ldots, K)$, where $K$ stands for the number of valid gaze records finally obtained. The following analysis is all performed on this fixation sequence of AoIs.
D. Calculation of AoI-fixation metrics
1) Fixation duration on the specific AoI: It stands for the total fixation duration on one specific AoI. The fixation duration of AoI $a$ is denoted as $D_a$, which is defined by:

$$D_a = \sum_{k=1}^{K} d_k \,\Big|_{a_k \equiv a} \tag{1}$$

where $a \in \{0, 1, 2, 3, 4\}$ stands for the different AoIs.
2) Fixation count on the specific AoI: It stands for the total number of valid fixations located in one AoI. The fixation count of AoI $a$ is denoted as $C_a$, which is defined by:

$$C_a = \operatorname{count}(a_k) \,\Big|_{a_k \equiv a} \tag{2}$$

where $a \in \{0, 1, 2, 3, 4\}$ stands for the different AoIs.
3) Proportion of fixation duration and count: The absolute values of fixation duration and fixation count differ significantly from person to person, and there is no standard for the two metrics. For the convenience of comparison, we introduce relative metrics to describe the duration and count of valid AoI-fixations. For one AoI $a$, its proportion of fixation duration is

$$p_a = D_a \Big/ \sum_{k=0}^{4} D_k \tag{3}$$

In the online test, the system shows the MC questions to the testees one by one. The testees can click the option part just once to choose their answers. There is no time limit for answering the questions. As soon as the testees submit their selections, their choices are stored in the database, and they have no chance to modify them. The online test system records and stores the time when a testee starts to answer a question and the time when he submits the selection. These data are analyzed together with the eye movement data collected by Tobii EyeX.
B. Results of all the students
The correctness ratio of a question is the ratio of the number of students who answered it correctly to the total number of students participating in the test. The correctness ratio reflects the difficulty of a test question: the higher the correctness ratio, the simpler the question. For MC questions, there are only two states of a student's answer, right and wrong, represented by 1 and 0 respectively. Since the result depends only on the final selection of the testee, we name this kind of criterion Final Selection Scoring (FSS) in this paper.
We assume every question is worth 1 point, so that the average score $s$ of every student in this test can be calculated. Similarly, we calculate the average correctness ratio of each question. The corresponding results are plotted in Fig. 2 and Fig. 3. From the correctness ratio, we can identify the difficulty of the questions and the knowledge mastery of the students.
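Before turning to the per-student results, the fixation filtering and the AoI-fixation metrics of Eqs. (1)-(3) can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names `filter_valid` and `aoi_metrics`, the tuple layout `(a, t, d)`, the default threshold of 100 ms (the lower end of the range quoted from [12]), and the sample data are all hypothetical.

```python
# Minimal sketch of the tau-filter and of Eqs. (1)-(3).
# Assumption: each merged gaze record is a tuple (a, t, d) with
# a = AoI id in {0, 1, 2, 3, 4}, t = start time in ms, d = duration in ms.

def filter_valid(fixations, tau_ms=100):
    """Drop AoI visits whose fixation duration is shorter than tau_ms."""
    return [(a, t, d) for (a, t, d) in fixations if d >= tau_ms]


def aoi_metrics(valid, aoi_ids=(0, 1, 2, 3, 4)):
    """Return (D, C, p): total duration, fixation count and duration share per AoI."""
    duration = {a: 0 for a in aoi_ids}  # D_a, Eq. (1)
    count = {a: 0 for a in aoi_ids}     # C_a, Eq. (2)
    for a, _t, d in valid:
        duration[a] += d
        count[a] += 1
    total = sum(duration.values())
    # p_a, Eq. (3); defined as 0 when no valid fixation exists at all
    proportion = {a: (duration[a] / total if total else 0.0) for a in aoi_ids}
    return duration, count, proportion


if __name__ == "__main__":
    # Hypothetical gaze records for one testee on one question
    sample = [(0, 0, 800), (1, 800, 60), (2, 860, 400), (0, 1260, 300), (2, 1560, 500)]
    D, C, p = aoi_metrics(filter_valid(sample))
    print(D, C, p)
```

Working with the proportion $p_a$ rather than the raw duration $D_a$ follows the argument in the text: absolute fixation times vary strongly between testees, so only the relative share is comparable across students.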
In Fig. 2, student No.17 has the lowest correctness ratio, which is far below that of the other students, so we exclude him from the following analysis. The correctness ratios of the remaining 44 students range from 64.58% to 91.25% (with a mean of 76.91% and a standard deviation of 0.071). As shown in Fig. 3, the correctness ratios of the 41 questions range from 22.70% to 100.00%. In terms of the correctness ratio, question No.3 is the most difficult question, while questions No.7 and No.13 are the easiest questions in this test.
C. Observation on individual students
Teachers are always interested in individual performance on the difficult questions. In this subsection, we investigate the score results of the most difficult question, No.3. For the convenience of analysis, we divide the testees into three groups according to their correctness ratio: Group H (higher than average), Group V (average), and Group L (lower than average). The students in Group H have the IDs {23, 12, 33, 27, 5, 20}, while those in Group L are {18, 45, 26, 42, 44, 40, 15, 24, 39}.
In Group H, 50% of the members (IDs {20, 27, 12}) failed question No.3, while in Group L, 77.78% of the members (IDs {18, 26, 42, 44, 40, 24, 39}) failed. It is interesting that the student with the highest score in this test (student No.20) also failed question No.3, but the student with the lowest score (student No.45) provided the correct answer for question No.3.
We are interested in the visual attention observations of these two students: did the best student (No.20) really not know the answer? Did the worst student (No.45) win by luck? This motivated us to perform a detailed analysis in the next section.
IV. VISUAL ATTENTION ANALYSIS: A CASE STUDY
The order in which the testee visits the AOIs reflects his thinking process. For example, when he reads the question content, he probably wants to understand the question; when he reads the option areas, he aims at finding the most appropriate answer. In this way, we can infer the cognitive process through the sequence of AOIs.
One of the special phenomena of eye movement in MC questions is the wavering behavior [9], in which the testee moves his gaze focus between two similar options. When the student hesitates between two options, he reads back and forth between the two selections. So in the collected fixation sequence, two successive AoIs appear repeatedly, such as "ABABAB...". Motivated by this observation, we explore the statistics of two successive AoI fixations rather than study the whole sequence.
For a sequence $\{a_1, a_2, \cdots, a_k, a_{k+1}, \cdots, a_K\}$, we can divide the original fixation sequence into successive AoI pairs $\{a_k, a_{k+1}\}$, such as {AB, AC, ...}. We then count the number of appearances of each pair in the fixation sequence, and further calculate the corresponding proportion over the total. The resulting statistics are provided in Table I, in which the detailed fixation sequence statistics of four testees on question No.3 are listed.
In order to compare the different gaze behaviors, we list the results of two more students in addition to the previously mentioned No.20 (best) and No.45 (worst). We select student No.33 in Group H (who provided the correct answer to question No.3) and student No.44 in Group L (who provided a wrong answer to question No.3) as the comparison candidates.
We focus on Group H first. As shown in Table I, for student No.20, the total share of thinking modes involving the correct answer "B" is 73.1%. It is much higher than that of the other modes, which means that he paid more attention to the right answer. The "BD" mode of thinking accounts for 42.3%, indicating that student No.20 most likely hesitated between the "B" and "D" selections. Therefore, we infer that student No.20 has a good knowledge background on question No.3. As for student No.33, the total share of thinking modes involving the correct answer "B" is 60.6%. It is higher than that of the modes unrelated to "B". We can infer that he has a good grasp of the knowledge of question No.3, which enables him to show confidence in his eye movement.
Then we discuss the two cases in Group L. As for student No.44, who is also wrong on this question, the total share of thinking modes involving the correct answer "B" is 33.3%. It is much lower than that of the modes unrelated to "B". We infer that he is not interested in the right answer. As for student No.45, we find that he is quite confident about option "B", with as much as 74.4% of his fixations related to "B".
Through the analysis of the 4 students' AoI sequences, we can observe the different gaze movement modes of different testees. The disadvantages of the FSS scoring criterion (i.e., 1 for correct and 0 for wrong) are also illustrated. For example, student No.20 really knows a lot about the correct option, even more than a student who achieves the correct answer (i.e., No.33). These observations motivate us to develop a new scoring criterion that takes the visual attention data into account.
V. VISUAL ATTENTION ASSISTED SCORING
A. Motivation
Motivated by the observations in the preceding section, we aim to propose a scoring criterion for MC questions. The basic idea is to take the measured visual attention as a correction component to the existing FSS criterion. The new criterion should provide better discrimination of the testees' knowledge awareness.
B. Measurement of visual attention
First of all, we need to develop a quantitative metric to represent the degree of visual attention on a specific AoI. As reported by the existing literature, watching more time or
TABLE I
STATISTICS OF CONSECUTIVE FIXATION SEQUENCES OF FOUR STUDENTS ON QUESTION NO.3

Stu. ID  Group  Average score  Score in Ques.3  Related to B (%)  Unrelated to B (%)  A-B (%)  B-C (%)  B-D (%)  C-D (%)  A-D (%)  A-C (%)
20       H      0.92           0                73.1              26.9                30.8     0.0      42.3     7.7      3.8      15.4
33       H      0.88           1                60.6              39.4                21.2     15.2     24.2     6.1      6.1      27.3
44       L      0.64           0                33.3              66.7                5.6      11.1     16.7     33.3     11.1     22.2
45       L      0.62           1                74.4              25.6                4.7      14.0     55.8     7.0      4.7      14.0
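The pair statistics reported in Table I can be reproduced in spirit with the following sketch. It is an illustration only: the function names `pair_proportions` and `related_to` are hypothetical, and it assumes that the AoI sequence is given as a string of labels with "Q" for the question area, that only pairs between two different option AoIs are counted, and that a pair is unordered (so "AB" and "BA" both count as A-B), which is what the six pair columns of Table I suggest.

```python
from collections import Counter


def pair_proportions(sequence, options="ABCD"):
    """Percentage of each unordered pair of successive option AoIs.

    Assumptions: pairs touching a non-option AoI (e.g. the question
    area "Q") and pairs of identical AoIs are skipped.
    """
    pairs = Counter()
    for left, right in zip(sequence, sequence[1:]):
        if left in options and right in options and left != right:
            pairs["-".join(sorted((left, right)))] += 1
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in pairs.items()}


def related_to(proportions, option):
    """Total percentage of pairs that involve the given option."""
    return sum(v for pair, v in proportions.items() if option in pair.split("-"))


if __name__ == "__main__":
    # Hypothetical valid AoI sequence of one testee on one question
    shares = pair_proportions("QABDBQCDB")
    print(shares)
    print("related to B:", related_to(shares, "B"))
```

Summing the shares of all pairs that contain the correct option gives the "Related to B" column, and the remainder gives "Unrelated to B".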
Fig. 5. Students' average performance in this test: FSS vs. VAS
We observe that the two curves show a certain correlation, which implies that VAS does not change the basic trend of FSS. Because VAS applies a negative correction to the absolute score of 1, the average VAS curve is lower than that of FSS. However, the VAS curve reveals some interesting results. For example, some students (such as No.28) have a much lower VAS than normal, which indicates that their mastery of the knowledge is not as good as the previous criterion suggests. On the other hand, some students (such as No.13) have a relatively high VAS compared with their neighbors, which means that they have great potential to improve. In order to demonstrate the different view offered by VAS, we count the number of students at each VAS score and compare it with that under the final selection score in Table III.

TABLE III
THE DISTRIBUTION OF STUDENTS' SCORES

Score value  VAS: # stu.  VAS: proportion (%)  FSS: # stu.  FSS: proportion (%)
1            5            11.11                10           22.22
0.8          2            4.44                 0            0
0.6          3            6.67                 0            0
0.4          4            8.89                 0            0
0.2          10           22.22                0            0
0            21           46.67                35           77.78

Based on the FSS, 77.78% of the students score 0 while the others score 1. However, based on our VAS criterion, these students can be sorted into different levels of knowledge. 4 students are found to provide a wrong answer but still know something relevant to the correct option. That is different from the score 0, which means they know nothing about the question. In summary, the VAS criterion can help teachers identify the students who really have difficulty in solving the problem.
VI. CONCLUSION
The multiple-choice (MC) question is an important form of test for assessing students' academic achievement, especially in e-learning applications. However, the classical evaluation metrics for MC questions (such as the correctness ratio) consider only the correctness of the final selection and ignore the testee's solving process. In the existing literature, eye-tracking based visual attention has been studied to infer the testee's cognitive process on a specific MC question. However, there is little work on the visual attention based evaluation of one complete MC test.
In this paper, we measure the eye movement data of a group of students in an online test consisting of more than forty MC questions. We divide the screen area into five AOIs (areas of interest), one for the question and four for the candidate options. The fixation duration and the gaze sequence on these AOIs are recorded and studied. In a case study on the most difficult question, we observe large differences among the eye movements of testees at different academic levels.
A new metric, the Visual-Attention-assisted Score (VAS), is proposed to assess a student's performance with the bias of his fixations on the correct options. Experimental results show that this metric reflects the differences in gaze movement among testees, and thus helps teachers infer the real level of the students' academic achievement.
ACKNOWLEDGMENT
This work has been supported in part by the National Key Technology R&D Program of China under Grant 2015BAH33F04-05.
REFERENCES
[1] M. A. Just and P. A. Carpenter, "A theory of reading: From eye fixations to comprehension," Psychological Review, vol. 87, pp. 329–355, 1980.
[2] K. Rayner, "Eye movements in reading and information processing: 20 years of research," Psychological Bulletin, vol. 124, pp. 372–422, 1998.
[3] K. Rayner, K. H. Chace, T. J. Slattery, and J. Ashby, "Eye movements as reflections of comprehension processes in reading," Scientific Studies of Reading, vol. 10, no. 3, pp. 241–255, 2006.
[4] R. Radach and A. Kennedy, "Theoretical perspectives on eye movements in reading: Past controversies, current issues, and an agenda for future research," European Journal of Cognitive Psychology, vol. 16, no. 1-2, pp. 3–26, 2004.
[5] J. Bartolotti and V. Marian, "Language learning and control in monolinguals and bilinguals," Cognitive Science, vol. 36, pp. 1129–1147, 2012.
[6] T. V. Gog and K. Scheiter, "Eye tracking as a tool to study and enhance multimedia learning," Learning and Instruction, vol. 20, no. 2, pp. 95–99, 2010.
[7] M. Hegarty, R. E. Mayer, and C. E. Green, "Comprehension of arithmetic word problems: Evidence from students' eye fixations," Journal of Educational Psychology, vol. 84, no. 1, pp. 76–84, 1992.
[8] M.-J. Tsai, H.-T. Hou, M.-L. Lai, W.-Y. Liu, et al., "Visual attention for solving multiple-choice science problem: An eye-tracking analysis," Computers and Education, vol. 58, pp. 375–385, 2012.
[9] M. P. N. Nugrahaningsih and S. Ricotti, "Gaze behavior analysis in multiple-answer tests: An eye tracking investigation," 2013 12th International Conference on Information Technology Based Higher Education and Training (ITHET), pp. 1–6, 2013.
[10] J. Hyönä, "The use of eye movements in the study of multimedia learning," Learning and Instruction, vol. 20, no. 2, pp. 172–176, 2010.
[11] J. K. Kaakinen, J. Hyönä, and J. M. Keenan, "Perspective effects on on-line text processing," Discourse Processes, vol. 33, pp. 159–173, 2002.
[12] G. S. T. Ujbanyi, J. Katona and A. Kovari, "Eye-tracking analysis of computer networks exam question besides different skilled groups," 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), pp. 277–282, 2016.