Introduction to Eye Tracking: A Hands-On Tutorial for Students and Practitioners

A Preprint

Abstract
Eye-tracking technology is widely used in various application areas such as psychology, neuroscience,
marketing, and human-computer interaction, as it is a valuable tool for understanding how people
process information and interact with their environment. This tutorial provides a comprehensive
introduction to eye tracking, from the basics of eye anatomy and physiology to the principles and
applications of different eye-tracking systems. The guide is designed to provide a hands-on learning
experience for everyone interested in working with eye-tracking technology. Therefore, we include
practical case studies to teach students and professionals how to effectively set up and operate an
eye-tracking system. The tutorial covers a variety of eye-tracking systems, calibration techniques, data
collection, and analysis methods, including fixations, saccades, pupil diameter, and visual scan path
analysis. In addition, we emphasize the importance of considering ethical aspects when conducting
eye-tracking research and experiments, especially informed consent and participant privacy. We aim
to give the reader a solid understanding of basic eye-tracking principles and the practical skills needed
to conduct their experiments. Python-based code snippets and illustrative examples are included in the
tutorials and can be downloaded at: https://gitlab.lrz.de/hctl/Eye-Tracking-Tutorial.
Keywords Eye tracking · Eye movements · Scanpaths · Data processing · User studies
Contents

2 Calibration
  2.1 Screen-based Calibration
  2.2 Wearable Calibration
  2.3 Slippage Compensation
  2.4 Practical Advice
3 Data Collection
  3.1 Setup
  3.2 Experiment Design
  3.3 Pilot Study
  3.4 Experimental Procedure
  3.5 Troubleshooting
8 Ethical Considerations
A basic comprehension of eye anatomy and physiology provides the foundation for the understanding and effective use
of eye-tracking technology. Hence, in the following, we introduce eye-tracking technology by focusing first on the
anatomy and physiology of the eye, and provide then an overview of basic eye-tracking techniques and current types of
eye trackers.
The anatomy and physiology of the human eye form the basis of our visual perception. The eye's main task is to capture incoming light and convert it into electrical signals. These signals are then transmitted to the brain, where they are interpreted to form a visual image. Figure 1 illustrates the key components of the human eye,
including the cornea, iris, pupil, lens, retina, and optic nerve. In the following, we offer a concise overview of the
basic anatomy and physiology of the eye, with a specific focus on features relevant to eye tracking [Atchison, 2023,
Holmqvist et al., 2011, Kolb, 1995].
The cornea, a transparent structure, is the outermost layer of the eye and covers both the iris and the pupil. One of its
most important functions is to refract incoming light and direct it onto the retina, playing a crucial role in vision. In
addition, the cornea acts as a protective barrier against external agents such as dust or debris.
The iris, the colored part of the eye, regulates the amount of light entering the eye and controls the size of the pupil. The
pupil itself can dilate or constrict in response to light intensity. Positioned behind the iris, the lens fine-tunes the focus
of incoming light. The curvature of the lens can be altered by the ciliary muscles, which adjust its shape to allow for
accommodation, the process of changing focus between distant and near objects [Singh and Singh, 2012].
The retina, positioned at the rear of the eye, is a light-sensitive tissue composed of multiple layers of neurons and
photoreceptors. These photoreceptors, consisting of rods and cones, convert incoming light into neural signals. Rods are responsible for vision under low-light conditions, while cones enable color perception. Within the retina, the macula, a small central area, contains a dense concentration of cones. This area is responsible for high visual acuity, color vision, and detail perception [Koretz and Handelman, 1988].
The optic nerve, originating from the back of the retina, transmits visual information to the brain for processing and
interpretation. The optic nerve comprises millions of axons, bundled together to form a compact structure that exits the
eye and carries information to several brain regions, including the visual cortex, responsible for creating the conscious
perception of visual information [Atchison, 2023].
Eye movements are controlled by six extraocular muscles that position the eye, together with the levator palpebrae superioris muscle, which elevates the eyelid. These muscles are innervated by several cranial nerves and coordinate to allow precise eye movements, such as smooth pursuit or saccades, which are crucial for exploring and analyzing the visual environment [Singh and Singh, 2012].
The movement of our eyes is regulated by several brain regions, including the brainstem and the cerebellum, which work
together to ensure coordinated eye movements and stable images on the retina. These regions integrate information
from the vestibular system, responsible for detecting head movements and maintaining balance, and the visual system,
allowing for a precise and rapid response to changes in the visual environment.
As described above, the eye is a complex network of structures and processes responsible for vision, eye movements, and visual perception [Atchison, 2023]. Understanding the basic principles of eye anatomy and physiology is essential for understanding the potential and applications of eye-tracking technologies in research and clinical settings.
This section offers an overview of the different types of eye-tracking systems, including remote and head-mounted eye
trackers, along with their respective strengths and limitations.
Eye tracking is a technique used to objectively measure and record the direction of an individual's gaze and eye movements [Wade and Tatler, 2005]. This is accomplished by measuring the relative position of the eye in relation to the head or the orientation of the gaze itself [Duchowski, 2007]. Subsequently, the collected data can be analyzed and evaluated in various ways, such as identifying eye movement patterns or frequently viewed areas of interest [Holmqvist et al., 2011].
There are various methods for recording eye movements (see Table 1), including electro-oculography, which uses elec-
trodes placed around the eyes to detect electrical signals generated by eye muscles; scleral contact lenses, placed directly
on the cornea; or video-based eye tracking, which uses a camera directed at the eye to track its movements [Duchowski,
2007]. Among these methods, video-based eye tracking has gained popularity for most research applications due to
its non-invasive nature and minimal influence on eye gaze behavior [Duchowski, 2007, Fuhl et al., 2017a]. Most of
the remote and head-mounted eye trackers currently available on the market use video-based eye-tracking techniques,
which involve a camera and an infrared illumination source. Head-mounted eye trackers, in particular, often include an
additional scene camera to capture the surrounding environment [Holmqvist et al., 2011].
The pupil and cornea play a crucial role in video-based eye tracking. Algorithms measure the pupil's center and the positions of four corneal reflections relative to it in order to map gaze points onto a screen or scene recording, depending on the specific type of eye tracker employed [Duchowski, 2007, Fuhl et al., 2015, 2016, 2017b]. Using the
resulting gaze points, the respective eye movements can be derived. Corneal reflections serve as additional references
for reliable pupil detection and help compensate for minor head movements [Holmqvist et al., 2011, Nitschke et al.,
2013].
Figure 3: A wearable eye tracker on the left (Tobii Pro Glasses 3), and remote eye trackers at the top (Tobii Fusion Pro)
and bottom (Tobii Pro Nano).
2 Calibration
This section describes the process of calibrating different eye-tracking systems, an essential step for ensuring the
accuracy and reliability of the collected eye-tracking data.
Eye-tracking calibration is the process of estimating the geometric characteristics of a participant’s eyes as the basis for
a fully customized and accurate gaze point calculation [Ramanauskas, 2006]. Given the uniqueness of each participant’s
eyes in shape, size, and movement, eye-tracking devices require calibration to effectively accommodate these individual
differences. In addition, environmental factors such as lighting conditions, the position of the person’s head, and distance
from the screen to the eyes can affect the accuracy of eye-tracking data. Therefore, calibration is indispensable before
collecting eye-tracking data to accurately determine the gaze point on a screen or within the environment [Nyström
et al., 2013].
A calibration procedure aims to establish an accurate mapping between measured eye movements and the corresponding points of gaze on a screen or within an environment. In particular, eye-tracking systems capture the user's eye
movements as they sequentially fixate on predefined calibration points displayed on a screen or within an environment.
By analyzing the relationship between these recorded eye movements and the known positions of the calibration
points, the system estimates the transformation needed to accurately map eye movements to gaze positions. One
commonly employed approach involves standard calibration, which utilizes linear or second-order models to estimate
the relationship between eye movements and gaze positions based on the recorded data from fixations on calibration
points [Liu et al., 2018, Morimoto et al., 1999]. Another prevalent technique is 2D mapping with interpolation [Sheela
and Vijaya, 2011], where users fixate on a grid of calibration points distributed across the screen or environment, and
interpolation algorithms are used to estimate gaze positions between these points. Through these calibration processes,
eye-tracking devices establish accurate mappings between eye movements and gaze positions.
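To make the mapping step concrete, the following is a minimal sketch, with NumPy, of fitting a second-order polynomial calibration; it is not the method of any specific vendor, and the nine-point layout and stand-in eye features are illustrative assumptions.

import numpy as np

def design_matrix(eye_xy):
    # Constant, linear, and quadratic terms of a second-order model.
    x, y = eye_xy[:, 0], eye_xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_calibration(eye_xy, target_xy):
    # Least-squares fit mapping eye features to screen coordinates.
    coeffs, *_ = np.linalg.lstsq(design_matrix(eye_xy), target_xy, rcond=None)
    return coeffs

def gaze_from_features(eye_xy, coeffs):
    return design_matrix(eye_xy) @ coeffs

# Illustrative 9-point grid in normalized screen coordinates.
targets = np.array([[px, py] for py in (0.1, 0.5, 0.9) for px in (0.1, 0.5, 0.9)])
# Stand-in eye features recorded while the participant fixated each target.
features = targets + np.random.normal(0.0, 0.01, targets.shape)
coeffs = fit_calibration(features, targets)
estimated_gaze = gaze_from_features(features, coeffs)

In practice, the features would be pupil-glint vectors recorded during the calibration procedure, and the fitted model would then be applied to all subsequent samples.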
In the actual calibration procedure with eye-tracking devices, participants are typically instructed to direct their gaze
towards a series of points that cover the area where relevant stimuli are presented, either on a screen or within the
environment. These points are commonly arranged in a grid or pattern that comprehensively covers the viewing area.
Throughout this process, the eye-tracking device continuously records the position of the user’s eyes as they focus
on each calibration point in succession. The participant may be prompted to blink or briefly look away between each
calibration point to ensure precise tracking by the device. During the calibration process, the eye-tracking device
collects data about the participant’s gaze position and compares it to the known locations of the calibration points. This
data is used to generate the mapping or calibration profile that allows the device to accurately track the participant’s
gaze during the data collection. Subsequently, we provide a more detailed description of two distinct types of calibration
employed in eye tracking.
2.1 Screen-based Calibration

The screen-based calibration method projects calibration points onto the 2D surface of the display monitor in a random
sequence, a common process for calibrating remote eye-tracking devices. The participant is then instructed to fixate on
each calibration point sequentially. The number of calibration points used in this process commonly ranges from 2 to 16
[Holmqvist et al., 2011, Balasubramanyam et al., 2018], with 9-point calibration being a widely adopted standard [Tobii,
2024]. One of the primary motivations behind screen-based calibration is the inherent variability in digital displays,
including differences in size, resolution, and aspect ratio. These variations can significantly impact the accuracy of
eye-tracking data if not properly calibrated. The eye-tracking system can account for these display discrepancies
through the calibration process, ensuring precise and reliable gaze tracking across different display configurations.
An illustration of a screen-based 5-point calibration and 4-point validation is depicted in Figure 5 (a). The 4-point validation, which follows the calibration process, involves presenting four points at new locations to verify the eye tracker's accuracy in tracking the participant's gaze direction. Successful calibration is confirmed if the gaze data aligns with the predetermined target positions; otherwise, recalibration is required [Tobii, 2023a].

Figure 5: (a) Five-point screen-based eye-tracking calibration and four-point validation in Tobii Pro Lab. (b) One-point wearable eye-tracking calibration in the Glasses 3 Controller Software.
2.2 Wearable Calibration

Wearable calibration is primarily employed in scenarios where a participant's eye movements must be tracked in
real-world environments rather than on traditional computer monitors or digital displays. This calibration method is
commonly associated with wearable eye-tracking devices, such as eye-tracking glasses and head-mounted displays.
These wearable eye-tracking devices may use various calibration procedures tailored to the device specifications and the
specific application requirements [Holmqvist et al., 2011, Santini et al., 2017a]. For instance, when using eye-tracking
glasses, participants are instructed to direct their gaze toward calibration targets positioned in the real environment.
These targets may be presented on a specially designed calibration card displaying the points. Two commonly employed
calibration procedures in wearable eye tracking are the single-point and multi-point calibration methods [Santini et al.,
2017a]. In single-point calibration, participants focus their gaze on a single calibration target, while in multi-point
calibration, they fixate on multiple calibration targets distributed across the environment. Figure 5 (b) depicts an
example of a calibration card used for wearable eye-tracking glasses.
2.3 Slippage Compensation

Slippage refers to the unintended movement of the eye-tracking device on the participant's head during a recording session. This issue is frequently encountered during long recordings or when participants are free to move [Santini et al., 2018, Niehorster et al., 2020]. In the recording sessions, participants may push the device up on their nose,
remove it momentarily to adjust their hair or rub their eyes, or even move it through normal facial movements like
speaking or smiling. To address this issue, manufacturers have included stability features, such as tighter headbands
or additional securing mechanisms, to prevent the device from moving out of place. Despite these efforts, slippage
remains a common challenge, often requiring researchers to take additional steps to ensure data accuracy.
Several techniques have been proposed in the literature to reduce or compensate for slippage. These include the classical pupil-glint vector technique [Kolakowski and Pelz, 2006] and determining camera translation, which may involve using eyelid templates [Karmali and Shelhamer, 2004], eye corner tracking [Pires et al., 2013], and monitoring differences in gain values [Kolakowski and Pelz, 2006]. Additionally, saliency maps are useful for detecting and correcting shifts [Sugano and Bulling, 2015], thereby adjusting for unintended movements. Techniques such as fast and unsupervised calibration [Santini et al., 2017b], recalibration [Lander et al., 2016], and auto-calibration schemes [Huang et al., 2016] can also mitigate the effects of slippage.
In addition to these methods, using an eye tracker and eye tracker accessories designed to manage slippage could be
a more efficient and feasible option. Eye trackers often come equipped with accessories like head straps, cloth clips,
helmets, or custom facial plastic molds, all of which help to secure the device firmly on the user during recordings.
Head straps are elastic or adjustable bands that wrap around the head to effectively stabilize the eye tracker. Commonly
used in virtual reality headsets and other head-mounted devices, head straps provide a reliable attachment method.
Cloth clips, which can be attached to hats, glasses, or other headgear, are less intrusive than head straps and offer a quick
way to attach and detach the eye tracker. Helmets could be particularly useful in applications requiring robust mounting
of the eye-tracking device. Additionally, custom facial plastic molds, designed for an individual’s face, provide the
most secure and precise fit possible. However, this method is less common due to the higher cost and complexity of
creating personalized molds.
2.4 Practical Advice

In the calibration process, one of the most important factors is lighting. To achieve optimal accuracy and precision
with wearable glasses, both direct sunlight and excessively dark settings should be avoided; moderate lighting is most
suitable. Users need to ensure that the glasses are fitted comfortably during calibration, as adjustments made afterward
can negatively affect the accuracy. Additionally, lighting must align with the conditions of the actual experimental
environment.
Remote eye trackers, which often employ infrared lights, perform better in darker rooms. If users wear glasses,
calibration accuracy can be significantly diminished. In such cases, a minor adjustment in the positioning of their
glasses might enhance the fit and improve accuracy.
In the calibration process, a neutral and non-distracting background is better for both types of eye trackers. Intricate
patterns or moving objects can interfere with eye-tracking accuracy during calibration. It is crucial to clearly explain the
calibration process and subsequent steps to prevent users from moving their heads to ask questions after the calibration.
In the end, calibration accuracy should be verified numerically (through accuracy and precision metrics) and visually, by ensuring that the recorded gaze points cluster closely around each calibration target with minimal spread. If the calibration results are inadequate, the process should be repeated to ensure the experiment's validity.
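As a hedged illustration of such a numerical check, the sketch below computes accuracy (mean angular offset from the target) and RMS precision from validation samples; the pixel size and viewing distance are assumed values that must be replaced by the actual setup geometry.

import numpy as np

PX_SIZE_CM = 0.0277   # assumed physical size of one pixel
DISTANCE_CM = 65.0    # assumed eye-to-screen viewing distance

def angle_deg(a_px, b_px):
    # Angular distance between two on-screen points, seen from the eye.
    offset_cm = np.linalg.norm((a_px - b_px) * PX_SIZE_CM, axis=1)
    return np.degrees(np.arctan2(offset_cm, DISTANCE_CM))

def validation_quality(gaze_px, target_px):
    errors = angle_deg(gaze_px, target_px)
    accuracy = errors.mean()                       # mean offset from the target
    jitter = angle_deg(gaze_px[1:], gaze_px[:-1])  # sample-to-sample movement
    precision = np.sqrt(np.mean(jitter**2))        # RMS precision
    return accuracy, precision

# Example: 100 gaze samples recorded while fixating one validation point.
target = np.array([[960.0, 540.0]])
gaze = target + np.random.normal(0.0, 5.0, (100, 2))
acc, prec = validation_quality(gaze, target)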
3 Data Collection
The data collection procedure in eye-tracking studies is a systematic process that establishes the experimental setting,
followed by experiment design and planning. This procedure concludes with the recruitment of participants and the
collection of eye movement data. This chapter will delve into every step of the eye-tracking experiment, including setup, experiment design, piloting, recording, and troubleshooting, providing a comprehensive and hands-on guide for each aspect.
3.1 Setup
The setup of the experimental environment plays a decisive role in the successful implementation of user studies.
Figure 6 illustrates an exemplary setup for data collection using a remote eye tracker and a computer monitor to display
the visual stimuli. Eye trackers with a high sampling frequency can yield high-resolution data. Similarly, using monitors
with higher refresh rates can positively impact data quality. In this regard, it is essential to consider the experimental
objectives; for instance, features such as saccade latencies may require the use of 240 Hz monitors. Consistency in
the experimental setup is also crucial in eye-tracking research. To ensure the comparability of eye-tracking data
across different participants, the eye-tracking devices and monitors used in the experiments should remain identical.
Additionally, the configuration of the desktop or laptop used in the experiment can affect the data collection process and
should be carefully considered. In the experimental design, connecting eye trackers with additional extension cables or
hubs may affect the stability of their connection. Therefore, it is advisable to avoid using additional extension cables for
the eye trackers and instead connect them directly to the computer. If additional cables are used or if the devices are
connected to ports on the monitor, thorough testing is recommended. Moreover, utilizing ports with higher speeds on
the computer can enhance stability.
Positioning the eye trackers and configuring the monitors are crucial steps in the setup process. After positioning the
eye tracker, configuration typically involves using dedicated software for eye trackers, which may require specific
information about the monitor, such as its width, height, and resolution. Attaching the eye trackers directly to the screen
simplifies the configuration process, requiring less detailed information about the eye tracker’s position. However,
if the eye tracker is positioned elsewhere, precise measurements of the angles and distances between the eye tracker
and monitor are essential for accurate configuration. Eye tracker models equipped with their own screens require less
detailed information for configuration due to their built-in screens. Once the eye tracker has been configured, it is
important to refrain from moving or changing any setup components, as even minor adjustments can adversely affect
the data collection process. In case of any changes, reconfiguration is necessary before proceeding with the experiment.
This configuration process is essential, as the accuracy of the collected data hinges on providing precise and accurate
information during setup.
Proper screen resolution adjustment is also essential, as eye-tracking data logs typically record gaze positions in
centimeters or pixels. It is advisable to adhere to the resolutions recommended by the eye-tracking software being
used.

Figure 6: Eye-tracking data collection setup using a remote eye tracker. (a) Data collection setup; (b) data collection. The illustrated session proceeds in three steps (Step 1: compare two classes; Step 2: view for 3 seconds; Step 3: give the class number) using a display, an eye tracker, a chin rest, and a keyboard.

Figure 7: Example of a chin rest used to stabilize the participant's head during experimental procedures.

In cases where specific recommendations are unavailable, opting for higher resolutions may be beneficial, but
it is important to ensure that the computer’s processing power can adequately support them. Conversely, excessive
computational load during the experiment can adversely affect the results, so lower resolutions can be used if necessary.
The decision between higher and lower resolutions should be carefully weighed considering various factors.
Careful design of the experimental environment is essential to mitigate external stimuli such as sound and light.
Consistency in external factors, such as lighting conditions, is crucial if experiments are conducted at different times. If
the eye-tracking device can function in low-light conditions [Tobii, 2022], covering windows and turning off lights
could provide an optimal experimental setup, as such external light sources can fluctuate throughout the day. When
collecting data from multiple participants simultaneously, it is important to ensure that each participant is not affected
by others’ presence or actions. During the experiment, the experimental room’s door can be closed, and a warning sign
indicating that the experiment is in progress can be placed outside.
Furthermore, experimenters may opt to use a chin rest as depicted in Figure 7, or other head-stabilizing methods during
the experiments to improve eye-tracking data quality. However, such methods may cause discomfort in participants
and create an artificial experimental setting. Thus, applying these methods necessitates carefully evaluating their advantages and disadvantages to balance experimental control with the preservation of ecological validity [McGrath, 1995].
Ecological validity refers to how accurately research findings represent real-world phenomena and their generalizability
to naturalistic settings. A chin rest can effectively minimize head movements and maintain consistent positioning of
participants throughout the experiment. This implementation enhances the accuracy and reliability of eye-tracking data
while standardizing the experimental setup across all participants. However, restricting natural head movements can be
problematic for experiments requiring participants to move their heads freely. Therefore, it is essential to carefully
weigh the benefits of employing a chin rest against its potential limitations, particularly regarding data quality and
ecological validity implications.
3.2 Experiment Design

In the previous Section 3.1, we discussed the process of configuring hardware and establishing the experimental
environment for data collection. In this section, we provide an overview of key considerations essential for the design
of an eye-tracking experiment.
Before designing the eye-tracking experiment, careful consideration should be given to how stimuli are presented and
what data needs to be recorded. Some eye-tracking devices may require the use of specific software for experiment
preparation. When using videos or images as stimuli in the experiment, it is important to adjust their resolution carefully
to ensure participant comfort and facilitate post-processing. It may be advisable to avoid displaying stimuli in full-screen
mode and instead opt for a smaller area surrounded by a black frame. This approach helps to avoid the less accurate recording of gaze data near the monitor's edges and minimizes participant distraction.
Participants’ gaze behavior is typically recorded using eye-tracking software. Still, in certain situations, recording
keyboard and mouse movements or voice inputs may be necessary to capture participant responses or answers to
specific tasks during the experiment. If the software does not provide these recordings, additional scripts should be
integrated to capture those data. In particular, it is crucial to ensure that timestamps are included in the recordings;
however, synchronizing these timestamps between the recording and the eye tracker logs can be challenging. This
challenge may be further complicated by using different time options, such as computer time and eye tracker recording
time. Consequently, checking these timestamps and analyzing any constant or varying bias is essential, as they can
later pose problems during the data post-processing. If complete synchronization of the timestamps is not feasible,
one potential solution is to include synchronization logs in the scripts or redesign the experiment by incorporating
press-button instructions following each stimulus.
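As a minimal sketch of such a timestamp check, one can align each logged event with the temporally nearest eye-tracker sample and inspect the residual offset: a constant residual suggests a fixed clock offset, while a growing residual suggests drift. The column names and sampling rate below are hypothetical, and both logs are assumed to be exported with timestamps in seconds.

import pandas as pd

# Hypothetical logs: key presses and eye-tracker samples (250 Hz).
events = pd.DataFrame({"t_event": [1.02, 3.48, 6.91], "key": ["1", "2", "1"]})
gaze = pd.DataFrame({"t_gaze": [i * 0.004 for i in range(2000)]})

# Match each event to the temporally nearest gaze sample.
aligned = pd.merge_asof(events.sort_values("t_event"),
                        gaze.sort_values("t_gaze"),
                        left_on="t_event", right_on="t_gaze",
                        direction="nearest")
aligned["offset"] = aligned["t_event"] - aligned["t_gaze"]
print(aligned["offset"].describe())  # constant vs. varying bias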
In addition to details on the experimental environment and equipment, designing the experiment itself is another
essential step for obtaining valid findings. In eye-tracking studies, two common experimental designs are typically
employed: the between-subjects design and the within-subjects design. These designs differ primarily in how the
experimental conditions are assigned to the participants. In a between-subjects design, each participant is exposed
to only one experimental condition, with participants randomly assigned to these conditions to ensure each group
experiences a different condition. Conversely, a within-subjects design involves the same participants being exposed
to all experimental conditions. One important factor that can affect the reliability of experimental results is the order
effect, which refers to the influence of the stimulus presentation sequence on a participant’s responses. The order
in which stimuli are presented can significantly impact the participant’s response and perception, potentially leading
to biased results. To mitigate this effect, researchers often randomize the presentation order of stimuli, minimizing
potential confounding factors and ensuring the objectivity and reliability of the experimental results. Additionally,
in a within-subjects design, it is important to consider the potential impact of the training effect, which refers to the
improvement in performance that occurs as a result of repeating the same or similar tasks over time. In addition
to potential performance increases in certain tasks, familiarity with the tasks or conditions may lead to decreased
participant engagement, potentially leading to a loss of interest or focus.
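A simple way to implement such randomization, sketched below with illustrative stimulus names, is to shuffle the stimulus list with a per-participant seed so that each participant receives a different but reproducible presentation order.

import random

STIMULI = [f"stim_{i:02d}.png" for i in range(12)]  # hypothetical stimuli

def presentation_order(participant_id: int) -> list[str]:
    # Seeding with the participant ID keeps each order reproducible.
    order = STIMULI.copy()
    random.Random(participant_id).shuffle(order)
    return order

print(presentation_order(1))
print(presentation_order(2))  # a different, but reproducible, order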
Careful consideration of each stimulus’s duration and presentation timing is essential in optimizing the experimental
design and mitigating potential confounding factors such as participant fatigue. Participant fatigue can significantly affect eye-tracking data quality and validity over time, leading to reduced attention and slower ocular responses. Therefore, avoiding excessively prolonged sessions and providing adequate rest periods to prevent fatigue is imperative. Throughout
the experiment, changes in the participant’s posture or disruptions in the eye-tracking signal may necessitate the
recalibration of eye trackers. Particularly in long experiments, incorporating breaks that include recalibration may help
maintain consistency in the collected data and ensure that the results accurately reflect the participant’s gaze behavior.
During the data collection process, it is important to acknowledge that participants may not be familiar with the
equipment and experiment. Providing instructions alone may not be sufficient for their full understanding.
Integrating an acclimation phase before the actual experiment is highly recommended to enhance understanding. This
phase should consist of a series of specially designed stimuli, different from but related to those used in the main
experiment, to familiarize participants with the tasks. Encouraging participants to ask questions during this period is
important for clarifying uncertainties and improving their comfort level for the experiment. Such an acclimation phase may increase the quality and reliability of the collected data by minimizing disruptions during the experiment.
Moreover, it is essential to establish the budget during the experimental design phase, considering the required number
of participants. Compensation should be provided based on the duration of the experiment to encourage participant
involvement and comply with legal requirements. Note that the compensation rate should be determined in accordance with the minimum hourly rate applicable in the region where the experiment is conducted.
3.3 Pilot Study

Piloting is pivotal in eye-tracking experiments, serving as a critical preparatory phase before the main study. By
conducting a smaller-scale version of the experiment with a limited number of participants, researchers can identify and
address potential issues or inefficiencies in the design. The piloting process allows for the refinement of the experimental
design, visual stimuli, and experimental procedures to ensure they are suitable for achieving the research objectives.
Pilot studies also enable researchers to pinpoint potential technical issues with the eye-tracking equipment, software, or
other components of the experiment, therefore preventing data loss or errors during the main study. Moreover, they
provide an opportunity to assess participants’ comprehension of instructions and the suitability of tasks, facilitating
adjustments to task difficulty or duration as needed. Additionally, pilot studies aid in estimating the time required for
data collection per participant more accurately, enabling efficient planning and scheduling for the main experiment.
This, in turn, facilitates a more precise estimation of the budget required for the main experiment. Furthermore, pilot
studies allow for assessing the quality of the eye-tracking data collected, including accuracy, precision, and sampling
rate. This assessment informs researchers of any necessary adjustments to improve data quality and ensures that data
analysis procedures are robust and effective. If issues with data processing or missing information arise, researchers can
refine the experimental procedure to obtain additional measurements, thus enhancing the overall reliability and validity
of the study.
Notably, controlling the experimental environment is often challenging for studies involving wearable eye trackers as
they are conducted in real-world settings. This lack of control introduces variables such as lighting conditions, which
can lead to distractions and significantly impact the quality of visual recordings. Therefore, during the pilot study, it is
also essential to experiment with different lighting conditions and analyze the resulting data to verify its quality.
3.4 Experimental Procedure

As depicted in Figure 8, the initial step in conducting eye-tracking experiments entails recruiting participants, which
requires efforts to draw a representative sample of participants from the target population. A comprehensive advertise-
ment is initially constructed, outlining the experiment’s nature, time commitments, and any associated compensation.
These advertisements should provide only general information about the study, avoiding extensive details about the
experiment’s purpose and methodology to maintain experimental integrity while offering sufficient information to
potential participants. The advertisement can be disseminated across various platforms, including institutional bulletin
boards, online forums, and social media, to reach a large and diverse pool of potential participants. Additionally, student
groups on platforms such as WhatsApp, Slack, Telegram, and university email groups can serve as useful alternatives for
participant recruitment. Alongside the advertisement, brief questionnaires can be utilized to gather basic demographic
details such as age and language proficiency in predetermining participant eligibility based on broad inclusion and
exclusion criteria.
Following the recruitment phase, screening involves a more detailed assessment of selected participants to ensure they
meet the specific criteria for the eye-tracking experiment. Questionnaires can be used to gather information based
on the specific experiment’s criteria, including inquiries related to visual acuity requirements due to the nature of the
eye-tracking technology. Participants who pass the screening process can participate in the experiment. Before taking
part in the experiment, participants are provided with an informed consent form. This document outlines the study’s
nature, procedures, potential risks and benefits, and measures taken to ensure participant confidentiality. Upon approval
and signing of the consent form, participants can proceed to the experiment, be added to the participant pool, and
participate in the experiment.
Preparation of the eye-tracking experiment setup should be completed before the participant arrives. The experimenter
should ensure that all necessary components are properly connected and operational and adjust the eye tracker
configuration as needed. Mistakes and delays can be minimized by completing these preparations in advance, thus
ensuring a legitimate data collection process. Using a checklist (see Table 3) for experiment preparation helps streamline setup procedures and ensures that all necessary steps are executed.
Table 3: Checklist for preparing and conducting an eye-tracking experiment.

Initial Setup
☐ Use eye trackers with a high sampling frequency for accurate data.
☐ Choose monitors that have high refresh rates.
☐ Ensure eye trackers are connected directly to the computer for reliability.
☐ Position and configure the eye tracker accurately.
☐ Adjust screen resolution appropriately.
☐ Control external stimuli such as sound and light to prevent distractions.
☐ Utilize head stabilizers (e.g., a chin rest) for data quality.

Experiment Design
☐ Select an appropriate experimental design (between-subjects or within-subjects).
☐ Determine the types of stimuli (e.g., images, videos, text) to be presented.
☐ Ensure the resolution of the stimuli is appropriate for your setup.
☐ Determine additional interactions (keyboard, mouse) with proper timestamps.
☐ Manage session duration to avoid fatigue.
☐ Recalibrate during breaks for long sessions to maintain data quality.

Pilot Study
☐ Conduct to refine the experimental design and procedures.
☐ Assess participant instruction comprehension.
☐ Estimate the time required for data collection.
☐ Verify eye-tracking data quality.
☐ Validate and test your analysis method on the collected data.

Experimental Procedure
☐ Recruit and screen participants.
☐ Provide informed consent forms.
☐ Prepare and test the eye-tracking setup in advance.
☐ Explain the experiment procedure to participants.
☐ Ensure participants understand the experiment procedure.
☐ Ensure that participants are comfortably seated.
☐ Accurately calibrate the eye-tracking system.
☐ Maintain consistent participant position.
☐ Provide compensation and ensure a positive experience.
☐ Save and store data securely.

Troubleshooting
☐ Restart software or computer if needed.
☐ Adjust the eye tracker and screen position or angle for better capture.
☐ Ensure accurate calibration, repeat if necessary.
☐ Offer short breaks before recalibration attempts.
Upon the participants’ arrival, the experimenters should welcome them warmly and respectfully, creating an atmosphere
of comfort and open communication. Participants should be provided with a simple and non-technical explanation
of how the eye tracker works. It is essential to ensure that the participant is comfortably seated in front of the eye
tracker, with adjustments to accommodate their posture in case the experiment utilizes a remote eye tracker. Maintaining
a consistent position throughout the experiment is essential to preserving the quality of the eye-tracking data. If
supplementary data from devices such as a keyboard or mouse are being collected, the experimenters should ensure
that these devices are readily accessible to the participants to prevent any posture and position changes during the
experiment.
Experimenters should provide clear explanations of the experimental procedure. Once the experimenters are confident
that the participant understands the experimental procedure, they can proceed to the experiment. The participant’s
position should be appropriately adjusted, and the eye-tracking system should be successfully calibrated. This calibration
process may need to be repeated until small gaze errors indicating accurate tracking are achieved. Throughout the
experiment, the experimenter should be available to ensure that the participant is comfortable and that the eye-tracking
data is being properly recorded. Additionally, the experimenter must ensure the calibration process is performed with
minimal gaze errors in case of a break in the experiment.
After completing the eye-tracking experiment, participants can ask any questions they may have about the experiment.
Experimenters should respond to these inquiries openly, clearly, and respectfully. If compensation is applicable,
participants should receive it at the end of the session as agreed upon. Lastly, the experimenter should offer a respectful and friendly farewell, ensuring that participants leave with a positive perception of their participation. After the participant leaves, all logs and experimental data must be saved and securely stored for future processing and analysis.
3.5 Troubleshooting
Occasionally, issues may arise with the eye-tracking software or computers during an experiment. Restarting the
software might offer a quick resolution when faced with such challenges. If this proves ineffective, rebooting the
computer and reconnecting the eye-tracking device could also resolve the issue. In situations where the eye trackers
cannot capture one or both eyes, external interference from nearby objects may be the cause. Adjusting the eye tracker's
position or angle could help solve the problem. If the problem persists, attempting to restart the eye tracker, reconfiguring
it, and verifying its functionality may be necessary.
One of the most common problems encountered in eye-tracking experiments is the inability to achieve accurate and
precise calibration, even when the eyes are properly captured. The experimenter needs to ensure that the participant
understands the calibration instructions and gazes directly at the calibration points as they appear. This issue is
especially prevalent among participants wearing glasses. Repeating the calibration process and potentially adjusting the
participants’ position may help resolve the problem. In such situations, providing the participant with a short break of
2-3 minutes before attempting calibration again could be beneficial.
4 Data Processing

The first step of data processing involves excluding invalid raw data collected from participants. This process consists of three aspects, detailed in the following.
Excluding incorrectly calibrated data. Data collected from participants with inaccurate calibration should be excluded
from further analysis. Incorrect calibration can lead to inaccuracies in the recorded eye movements, compromising the
reliability of the data. Therefore, it is essential to identify and remove such data to ensure the validity of subsequent
analyses. In typical eye-tracking experiments, researchers ensure proper calibration before data collection. However,
despite efforts to achieve accurate calibration, there are instances where calibration may not be successful or may
degrade over time without immediate detection. For example, the calibration process may not yield accurate results despite repeated initial attempts. This could be due to unresolvable factors, such as participant discomfort, eye illness, poor calibration techniques, or equipment malfunctions. In such cases, experimenters should proceed with
the experiment rather than make further attempts, to avoid causing frustration for participants. Additionally, even if successful calibration is achieved and participants are instructed to maintain a stable head position (in remote eye tracking) or keep the eye tracker stable on their head (in wearable eye tracking), participants may fail to adhere to these criteria during the experiment; in that case, the experimenters should not terminate the experiment but instead mark the data for potential exclusion in subsequent analysis.

Figure 9: An example of eye-tracking data collected from a Tobii remote eye tracker, formatted in CSV.

Figure 10: An example of invalid eye-tracking data collected from a Tobii remote eye tracker, formatted in CSV. The rectangular region highlights the missing eye-tracking signals.
Excluding incomplete data. Incomplete data sets should also be excluded from analysis, which includes two scenarios.
First, data collection was terminated during the experiment, either due to passive termination caused by hardware issues
(e.g., eye trackers, laptops, or mobile phones used for data collection) or voluntary termination by participants. Second,
although participants performed well during the experiment, the data quality could be compromised due to undetectable
sensor issues, resulting in a low tracking ratio (the percentage of eye-tracking data successfully tracked during the
experiment). For instance, if the tracking ratio falls below a certain threshold, such as 75%, the data may be considered
incomplete and low-quality and can be excluded from analysis. However, the specific threshold for tracking ratio
exclusion may vary depending on the experiment's context and the experimenter's decision. Figure 10 shows part of an example of invalid eye-tracking data with many missing signals, collected from a Tobii remote eye tracker and formatted in CSV. Figure 11 displays an example of incomplete eye-tracking data, using pupil signals as an example.
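The tracking-ratio check can be scripted along the following lines; the file names, column name, and the 75% threshold are illustrative assumptions that depend on the eye tracker's export format and the study design.

import pandas as pd

def tracking_ratio(df: pd.DataFrame, col: str = "gaze_x") -> float:
    # Fraction of samples in which a gaze signal was successfully tracked.
    return df[col].notna().mean()

kept, excluded = [], []
for path in ["p01.csv", "p02.csv", "p03.csv"]:  # hypothetical exports
    recording = pd.read_csv(path)
    (kept if tracking_ratio(recording) >= 0.75 else excluded).append(path)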
Excluding invalid trials. In eye-tracking experiments featuring multiple trials or tasks, the experimenters should check
the participants’ task performance closely and exclude data from invalid trials before proceeding with further data
analysis. Such invalid trials include cases where participants fail to follow task instructions adequately, exhibit erratic
eye movements, or engage in behaviors that interfere with the data quality. For example, participants may become
distracted during a task, leading to inconsistent gaze patterns or prolonged periods of inattention. Additionally, instances
of participant fatigue or discomfort can impact the quality of the eye-tracking data, as they may result in decreased focus
or the adoption of unnatural viewing behaviors. Furthermore, trials affected by external factors such as environmental distractions or interruptions may also fall into this category, as they can introduce noise or bias into the collected data. Experimenters can identify invalid trials occurring under these conditions by monitoring participants' performance during the experiments or by conducting post-hoc checks, such as reviewing video recordings of the experiments or gaze recordings provided by eye-tracking analysis software (e.g., Tobii Pro Lab [Tobii, 2023b]).

Figure 11: An example of eye-tracking pupil signals collected from a Tobii remote eye tracker. The rectangular region highlights the missing pupil signals; the yellow rectangular region highlights the noise; the bottom figure displays the smoothed and interpolated pupil signals.
Building upon the initial cleaning of datasets conducted in the first step, the subsequent step in data processing mainly
involves cleaning the remaining data to address any artifacts or inconsistencies within the eye-tracking data. This
process consists of several aspects, including smoothing the data to eliminate high-frequency noise or fluctuations
and utilizing interpolation methods to fill in missing data points or gaps in the dataset. More details on processing pupil diameter are provided in Section 5.6. Figure 11 demonstrates the smoothing and interpolation process applied to pupil signals as an example.
Smoothing. Smoothing techniques in eye-tracking data processing aim to reduce the impact of high-frequency noise or
abrupt fluctuations in the eye movement signals, thereby enhancing the clarity and interpretability of the underlying
gaze patterns. One commonly used smoothing method is the moving average filter, where each data point is replaced
with the average value of neighboring data points within a specified window. This averaging process helps to suppress
rapid variations in the data, resulting in a smoother trajectory of eye movements over time.
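A minimal NumPy sketch of such a moving-average filter is shown below; the window size is an assumed parameter to be tuned against the sampling rate.

import numpy as np

def moving_average(signal: np.ndarray, window: int = 5) -> np.ndarray:
    # Replace each sample with the mean of its neighborhood;
    # mode="same" keeps the length but blurs the signal edges.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

noisy = np.sin(np.linspace(0, 4, 200)) + np.random.normal(0, 0.1, 200)
smoothed = moving_average(noisy, window=7)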
Interpolation. Interpolation methods are invaluable in eye-tracking data processing, particularly for addressing missing
data points or temporal gaps in the dataset, facilitating a more comprehensive analysis of participants’ gaze behavior.
For instance, when applied to pupil diameter data, interpolation techniques aim to estimate the values of missing
or corrupted pupil diameter measurements by inferring them from neighboring valid data points. Commonly used
interpolation methods include linear interpolation, nearest neighbor interpolation, and weighted average interpolation.
When selecting interpolation methods for eye-tracking data processing, researchers should consider several factors to
ensure the appropriate technique is applied based on the dataset’s characteristics and the research objectives.
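For example, linear interpolation of short gaps in a pupil-diameter series can be sketched with pandas as follows; the gap limit is an assumed parameter chosen so that long gaps (e.g., blinks) are left missing and handled separately.

import numpy as np
import pandas as pd

pupil_mm = pd.Series([3.10, 3.15, np.nan, np.nan, 3.30, 3.28, np.nan, 3.25])
# Fill gaps of at most `limit` consecutive samples by linear interpolation.
filled = pupil_mm.interpolate(method="linear", limit=3, limit_direction="both")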
Following data cleaning, the next step in the data processing pipeline is data segmentation, which involves dividing
the continuous stream of eye-tracking data into meaningful segments or epochs based on specific criteria or events of
interest. These segments serve as the basis for further analysis and interpretation of participants’ gaze behavior. More
details about data segmentation are provided in the following for the eye-tracking studies.
Segmentation criteria. Researchers establish the criteria for segmenting the eye-tracking data based on the research
objectives and experimental design. Segments can be defined by temporal factors, such as time intervals corresponding
to different experimental conditions or task phases, or by event-based triggers, such as stimulus onset or participant
responses. For example, in a reading task, segments could be defined based on temporal factors corresponding to
different phases of the reading process, such as segmenting data according to paragraphs, sentences, or specific words
of interest to the researcher.
Segmentation method. Different methods can be employed to segment eye-tracking data, depending on the nature
of the study and the characteristics of the data. Time-based segmentation involves dividing the data into fixed or
variable time intervals, while event-based segmentation relies on detecting specific events or triggers within the data
stream. Hybrid approaches may combine both time-based and event-based criteria for segmentation. For example, in a
video-watching task, researchers may employ different segmentation methods according to their research objectives: time-based segmentation to analyze participants' behavior during different video intervals, event-based segmentation to define segments based on detected scene changes within the video, or a hybrid approach combining both.
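Both segmentation strategies can be expressed compactly in pandas, as in the sketch below; the timestamp column, epoch length, and onset times are illustrative assumptions.

import pandas as pd

gaze = pd.DataFrame({"t": [i * 0.01 for i in range(4000)]})  # hypothetical 100 Hz log

# Time-based segmentation: fixed 10-second epochs.
gaze["epoch"] = (gaze["t"] // 10).astype(int)

# Event-based segmentation: label samples between stimulus onsets.
onsets = [0.0, 12.3, 25.8]                       # hypothetical triggers
bins = onsets + [gaze["t"].max() + 1e-9]
gaze["segment"] = pd.cut(gaze["t"], bins=bins, labels=False, include_lowest=True)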
Segmentation validation. It is essential to validate the effectiveness of the segmentation process to ensure that segments
accurately capture the intended aspects of participants’ gaze behavior. Validation may involve several aspects, such as
visual inspection of segmented data. Researchers can visually inspect the segmented data to verify whether the defined
segments align with the intended aspects of participants’ gaze behavior. This involves reviewing the eye-tracking data
alongside the corresponding stimuli or task events to ensure that segments capture relevant periods of interest. Secondly,
the researcher may compare segmentations with external criteria or annotations to validate their accuracy. For example,
researchers may compare segment boundaries with predefined task events or stimulus timestamps if the study involves
analyzing gaze behavior during specific task phases or stimulus presentations.
Segmentation tools. Researchers may utilize specialized software tools or programming scripts to automate the
segmentation process and facilitate efficient processing of large datasets. These tools often provide features for defining
segmentation criteria, applying segmentation algorithms, and visualizing segmented data for inspection and validation
(e.g., Tobii Pro Lab [Tobii, 2023b]).
The raw eye-tracking data typically consists of frame-by-frame sensor recordings captured by the eye tracker. To gain insights into participants' visual perception, in the next step, eye-tracking metrics must be extracted from this raw data
through different data processing steps. Thus, feature extraction is an essential step in the data processing pipeline,
where relevant information is extracted from the cleaned and segmented raw eye-tracking data. This section discusses
methods to identify eye-tracking measures, including fixations, saccades, pupil diameter, and associated statistical
metrics.
Eye movement event detection. It is commonly achieved through two main methods. Firstly, utilizing eye-tracking
software provided by the eye tracker manufacturer, such as Tobii Pro Lab, allows for the easy extraction of eye movement
events like fixations and saccades (refer to the definitions of these eye movement events in Section 5.5). These software
packages often include built-in functions specifically designed for this purpose, simplifying the extraction process
with just a few clicks. Alternatively, in cases where dedicated eye-tracking software is not available, such as in cases
where an eye tracker is integrated into virtual reality (VR) headsets, eye movement events may need to be extracted
by implementing custom processing pipelines. This custom extraction process involves carefully analyzing the raw
eye-tracking data to identify and mark relevant eye movement events, such as fixations and saccades, based on predefined
criteria or algorithms. The detailed algorithms for manual feature extraction can be found in Section 5.5.
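As an illustration of such a custom pipeline, the sketch below implements a basic velocity-threshold (I-VT) classifier; it assumes gaze coordinates already converted to degrees of visual angle, and the 30 deg/s threshold is a common but adjustable choice.

import numpy as np

def ivt_labels(x_deg, y_deg, t_s, threshold_deg_s=30.0):
    # Angular velocity between consecutive samples.
    vel = np.hypot(np.diff(x_deg), np.diff(y_deg)) / np.diff(t_s)
    # Below-threshold samples belong to fixations, the rest to saccades.
    labels = np.where(vel < threshold_deg_s, "fixation", "saccade")
    return np.append(labels, labels[-1])  # pad to the original length

t = np.arange(0, 1, 0.004)                                 # 250 Hz timestamps
x = np.where(t < 0.5, 0.0, 5.0) + np.random.normal(0, 0.05, t.size)
y = np.random.normal(0, 0.05, t.size)
labels = ivt_labels(x, y, t)                               # one simulated saccade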
Statistical metrics. Once features for fixations, saccades, and pupil diameter are extracted, researchers can compute
various statistical metrics to characterize participants’ gaze behavior quantitatively. These metrics offer valuable
insights into various aspects of eye movement patterns and pupil dynamics observed during the specific eye-tracking
task. Common statistical metrics include measures of central tendency, such as the mean and median, which provide
information about the typical or average values of the data. For example, the mean fixation duration can indicate
the average duration of fixations across participants or experimental conditions. Measures of dispersion, such as the
standard deviation and variance, help assess the spread or variability of the data around the central tendency. For
instance, the standard deviation of saccade amplitudes can indicate how much individual saccade amplitudes deviate
from the mean amplitude, providing insights into the consistency or variability of saccadic eye movements during the
whole task.
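Computing such metrics is straightforward once the events have been extracted; the sketch below uses illustrative values for fixation durations and saccade amplitudes.

import numpy as np

fixation_durations_ms = np.array([180, 240, 210, 305, 150])  # illustrative
saccade_amplitudes_deg = np.array([2.1, 4.8, 3.3, 5.0])      # illustrative

metrics = {
    "mean_fixation_duration_ms": fixation_durations_ms.mean(),
    "median_fixation_duration_ms": np.median(fixation_durations_ms),
    "std_saccade_amplitude_deg": saccade_amplitudes_deg.std(ddof=1),  # sample SD
}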
In conclusion, the data processing pipeline outlined in this section serves as a fundamental framework for handling raw
eye-tracking data collected from participants in user studies. By systematically implementing data cleaning, filtering,
and feature extraction techniques, researchers can ensure that the data is prepared in a standardized manner for further
analysis. Potential artifacts and inconsistencies within the dataset are addressed through these steps, and relevant
eye-tracking metrics are extracted to accurately characterize participants’ gaze behavior. By following this structured
approach, researchers can enhance the reliability and validity of their findings, ultimately contributing to a deeper
understanding of human visual perception and cognition.
5.1 Fixations
Fixations are periods during which the eyes remain still and focus on a particular point of interest [Kasneci et al., 2014,
Tafaj et al., 2012]. Humans often acquire new information about the presented stimulus during these periods [Hessels
et al., 2018]. In addition to knowledge intake, observers remain engaged with the fixated region of the stimulus. Such knowledge intake or engagement often lasts between 100 ms and 350 ms [Salvucci and Goldberg, 2000, Rayner, 1998], although shorter fixation durations have also been considered in the literature.
While fixations can be analyzed individually, one common analysis technique in the eye-tracking literature is aggregating
them and analyzing their summary statistics per stimulus or task. Fixation duration and number of fixations are two
important and widely used measures for assessing visual attention processes and cognitive load. For instance, longer fixation durations are typically associated with increased cognitive load or difficulty in processing information, reflecting the relationship between task difficulty and fixation durations [Pomplun et al., 2013, Gao et al., 2021]. However, studies have also shown that fixation behavior can vary depending on the observer's level of expertise [Gegenfurtner et al., 2011, Kübler et al., 2015]. In particular, it was found that although experts tend to have more fixations, these are mainly concentrated in areas with relevant information [Castner et al., 2017, Gegenfurtner et al., 2011]. Experts also showed a shorter total fixation duration compared to novices [Castner et al., 2017].
5.2 Saccades
Another common and extensively studied type of eye movement is saccadic behavior. Saccades are high-speed and
ballistic eye movements that direct visual attention from one fixation to another [Agtzidis et al., 2019]. Like fixations,
researchers often analyze aggregated measures for saccades, such as saccade velocities and amplitudes. Depending
on the stimulus, higher saccade velocities may provide insights into the efficiency of the visual system in processing
information and its prioritization of different regions of interest. In addition, larger saccade amplitudes may indicate
that attention is being drawn from a distance [Goldberg et al., 2002, Gao et al., 2021].
5.3 Smooth Pursuits
Smooth pursuit is a type of eye movement that occurs when tracking a moving object to maintain its position on the fovea [Purves et al., 2001]. Unlike fixations and saccades, where the gaze is fixed or rapidly shifts between points of interest, smooth pursuit involves a slow and continuous movement of the eyes to track the moving target. This eye movement is commonly observed during activities where a moving object is present, such as driving, sports, following a conversation in mobile settings, or when encountering dynamic visual stimuli [Santini et al., 2016, Kasneci et al.,
2015]. Like fixations and saccades, analyzing smooth pursuit provides valuable insights into human visual perception
and attention.
5.4 Blinks
Blinks refer to the rapid closing and opening of the eyelids, typically lasting for a fraction of a second [Schiffman,
2001]. Apart from serving essential functions such as protecting the eyes from external factors, blinks temporarily
render users blind. Thus, ignoring blinking behaviors in eye movement analysis could degrade the quality of data
analysis. Blinking periods should either be excluded from the analysis or incorporated with smoothed and cleaned
measurements to ensure a valid understanding of visual attention, perception, and cognition. On the other hand, the blink rate has also been investigated in association with increased cognitive load; hence, robustly detecting blinks in the
eye-tracking data can reveal further insights into visual intake and cognitive processes [Baccour et al., 2019, Appel
et al., 2018, 2021, Chen and Epps, 2014, Biondi et al., 2023].
5.5 Eye Movement Event Detection
Fixations and saccades together form the visual scanpath. Detecting and separating these eye movements is an important step for further analysis, and there are different algorithms in the literature to achieve this, from rather lightweight, threshold-based algorithms [Agtzidis et al., 2019, Gao et al., 2023] to probabilistic [Tafaj et al., 2012, Santini et al., 2016] or deep-learning algorithms [Zemblys et al., 2019, Elmadjian et al., 2023]. In this tutorial, we focus on two simple and commonly used algorithms based on velocity and dispersion thresholds: Identification by Velocity-Threshold (I-VT) and Identification by Dispersion-Threshold (I-DT) [Salvucci and Goldberg, 2000]. Both algorithms rely on the fact that fixations are the eye movements that stabilize over a region of interest, whereas saccades are characterized by their high speeds, as defined in Sections 5.1 and 5.2, respectively. For algorithms other than I-VT and I-DT, or for customized variants, e.g., for virtual and augmented reality (VR/AR), we refer the reader to [Salvucci and Goldberg, 2000], [Agtzidis et al., 2019], [Gao et al., 2021], and [Bozkir et al., 2023].
The I-VT algorithm considers gaze velocities when detecting fixations and saccades. First, a velocity threshold is predetermined (e.g., 30-50 degrees/second), and point-to-point velocities are calculated for each consecutive pair of samples. If the velocity of the corresponding point is below the predetermined threshold, it is labeled as a fixation point; otherwise, it is identified as a saccade point. Later, labeled fixation points are grouped together, discarding the saccade points. Then, the centroid of each fixation group is calculated, and all fixations are returned. It is also essential to ensure a minimum duration for fixations to count them as valid. A simplified summary of this process is depicted in Algorithm 1.
The I-VT algorithm can be considered simple, yet computationally efficient. However, one should note that different
velocity and duration thresholds have been used in the literature, and suitable thresholds should be decided based on
factors such as the experimental setup and the nature of the visual stimuli.
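A minimal I-VT sketch in Python could look as follows; the default thresholds are illustrative assumptions rather than recommended values, and the inputs are assumed to be NumPy arrays of gaze angles (in degrees) and timestamps (in seconds).

```python
import numpy as np

def ivt(x, y, t, v_threshold=40.0, min_duration=0.1):
    """Identification by Velocity-Threshold (I-VT) sketch."""
    # Point-to-point velocities between consecutive samples (deg/s).
    v = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    # Label each sample: below the threshold -> fixation, otherwise saccade.
    is_fixation = np.concatenate([[True], v < v_threshold])

    fixations, start = [], None
    for i in range(len(is_fixation) + 1):
        if i < len(is_fixation) and is_fixation[i]:
            if start is None:
                start = i  # a new group of fixation points begins
        elif start is not None:
            end = i  # consecutive fixation samples span [start, end)
            duration = t[end - 1] - t[start]
            if duration >= min_duration:  # keep only valid fixations
                fixations.append((x[start:end].mean(), y[start:end].mean(), duration))
            start = None
    return fixations  # list of (centroid_x, centroid_y, duration)
```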
Another commonly used algorithm for eye movement identification is the I-DT algorithm, which is based on the idea that fixations exhibit very small variations in gaze position, whereas saccades cover larger distances between regions. In this algorithm, as a first step and like in I-VT, a dispersion threshold (i.e., Dthreshold) and a minimum duration threshold are set. Then, starting from the first sample, a moving window whose initial length corresponds to the duration threshold (given the sampling frequency of the eye tracker) is formed, and the algorithm checks whether the points in the window lie within the predetermined dispersion threshold. If the calculated distance (i.e., D = [max(x) − min(x)] + [max(y) − min(y)]) is greater than the predetermined dispersion, with D > Dthreshold, the points in the window do not represent a fixation together. If the calculated distance is less than the dispersion threshold, the points within the window are part of a fixation. In this case, the window is enlarged until D > Dthreshold. When the final window size is found, the points within the final window are assigned to a fixation by calculating the centroid of all points in that window. This process is iterated until there are no more samples to evaluate.
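A corresponding I-DT sketch is given below; the dispersion threshold (in the same units as x and y) and the duration threshold are again illustrative assumptions, and the inputs are assumed to be NumPy arrays.

```python
import numpy as np

def idt(x, y, t, d_threshold=1.0, min_duration=0.1):
    """Identification by Dispersion-Threshold (I-DT) sketch."""
    def dispersion(i, j):  # D = [max(x) - min(x)] + [max(y) - min(y)]
        return (x[i:j].max() - x[i:j].min()) + (y[i:j].max() - y[i:j].min())

    fixations, i, n = [], 0, len(x)
    while i < n:
        # Initialize the window to cover the minimum fixation duration.
        j = i + 1
        while j < n and t[j - 1] - t[i] < min_duration:
            j += 1
        if t[j - 1] - t[i] < min_duration:
            break  # not enough samples left for a valid fixation
        if dispersion(i, j) <= d_threshold:
            # The window holds a fixation: enlarge it until D > d_threshold.
            while j < n and dispersion(i, j + 1) <= d_threshold:
                j += 1
            fixations.append((x[i:j].mean(), y[i:j].mean(), t[j - 1] - t[i]))
            i = j  # continue after the fixation window
        else:
            i += 1  # slide the window forward by one sample
    return fixations  # list of (centroid_x, centroid_y, duration)
```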
Like the I-VT, the I-DT algorithm can also be considered simple and efficient. However, one should also adjust the
dispersion and duration thresholds according to factors such as experimental design and visual stimuli. In case the
reader is interested in using algorithms that adapt their parameters online (e.g., for tasks that change dynamically) or in other alternatives, such as the utilization of Kalman filters, we refer to the algorithms presented by [Tafaj et al., 2012, Santini
et al., 2016] and [Koh et al., 2009, 2010, Komogortsev et al., 2010], respectively. For machine or deep-learning-based
algorithms, where events are extracted in an end-to-end fashion (after adequate training of the underlying models), we
refer the reader to some recent approaches, such as [Fuhl et al., 2018, 2021, Hoppe and Bulling, 2016, Zemblys et al.,
2019, Elmadjian et al., 2023]. Both the I-VT and I-DT algorithms are available in our eye-tracking tutorial repository (https://gitlab.lrz.de/hctl/Eye-Tracking-Tutorial).
5.6 Pupil Diameter
Pupil dilations and constrictions are distinct from conventional eye movements such as fixations and saccades, focusing
on changes in pupil size rather than the sequence of gaze points. Pupil size is often associated with factors such as task
difficulty [Beatty, 1982] and is related to cognitive load and mental effort [Appel et al., 2021, Castner et al., 2020a]. For
instance, larger pupil size is typically associated with higher cognitive load [Bozkir et al., 2019] and mental effort [Chen
et al., 2011]. However, this measure is sensitive to external influences, particularly illumination changes, especially in
the wild. Therefore, careful processing is essential to interpret pupil diameter data accurately.
To interpret the pupil sizes, once the pupil is detected and its size is identified, a processing pipeline is needed to handle
the temporal aspects of the pupil data before conducting further analysis. While more complex analyses can be applied
to the pupil size data, we focus on the basic preprocessing steps, including data smoothing (e.g., to eliminate blinks and
measurement noise) and baseline correction, which is similar to data normalization.
Blink removal and measurement noise reduction can be achieved through various techniques, including interpolation,
averaging, and filtering. Interpolation involves filling in missing data points during blink periods with estimated values
derived from surrounding data points. In contrast, averaging techniques involve replacing blink periods with the average
pupil sizes before and after the blinks. Filtering involves applying a low-pass filter to the pupil data to smooth it and
remove any high-frequency noise that is caused by blinks.
While such techniques can be effective in artifact removal, each of them has advantages and disadvantages. Interpolation,
for instance, can introduce artifacts and distortions in the pupil data, especially if the duration of the blink is long.
Filtering, on the other hand, can effectively remove blink-related noise but can also filter out some parts of the valid
pupil size. Similar techniques also exist for smoothing the pupil diameter to discard outliers. This can be achieved
by averaging the pupil diameter values across fixed time windows (i.e., simple moving average) or by replacing pupil
diameter values with the median value within a time window (i.e., median filtering). Similar to the aforementioned eye
movement identification algorithms, when choosing a method to remove blinks or smooth pupil diameter values, the experimental design and visual stimulus details should be considered to increase the quality of the pupil data and the subsequent analyses.
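As an illustration, the following sketch applies linear interpolation over blink gaps, a simple moving average, and a median filter to a pupil-diameter signal using pandas; the signal values and window sizes are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical pupil diameters in mm; NaNs mark samples lost during a blink.
pupil = pd.Series([3.1, 3.2, np.nan, np.nan, 3.3, 3.9, 3.2, 3.1])

# Interpolation: fill blink gaps with values estimated from neighboring samples.
interpolated = pupil.interpolate(method="linear", limit_direction="both")

# Simple moving average: mean over a fixed, centered window.
smoothed = interpolated.rolling(window=3, center=True, min_periods=1).mean()

# Median filtering: more robust against outliers than the moving average.
median_filtered = interpolated.rolling(window=3, center=True, min_periods=1).median()
```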
In addition to blink removal and smoothing of the pupillometry data, baseline correction is another important step, especially for normalizing the pupil size signal using a baseline period. Baseline correction is a technique that accounts for each individual's baseline pupil diameter so that task-evoked changes can be isolated. There are two mainstream techniques for baseline correction: subtractive and divisive baseline correction [Mathôt et al., 2018]. In both techniques,
typically, a baseline duration of up to 1 second is selected. In subtractive baseline correction, the median (or mean)
baseline pupil diameter value is subtracted from each data point, resulting in positive and negative changes relative to
the baseline value. In contrast, in divisive baseline correction, each pupil diameter value is divided by the median (or
mean) baseline value. Using median values for baseline correction is often preferred as median values are less sensitive
to noise in the data than mean values.
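A minimal sketch of both variants, assuming the baseline samples (e.g., from up to 1 second before stimulus onset) have already been extracted from the recording, could look as follows.

```python
import numpy as np

def baseline_correct(pupil, baseline, mode="subtractive"):
    """Baseline-correct a pupil-diameter signal against a baseline period."""
    base = np.median(baseline)  # the median is less noise-sensitive than the mean
    if mode == "subtractive":
        return pupil - base     # positive/negative change relative to the baseline
    if mode == "divisive":
        return pupil / base     # proportional change relative to the baseline
    raise ValueError("mode must be 'subtractive' or 'divisive'")

trial = np.array([3.4, 3.6, 3.8, 3.7])   # hypothetical trial signal (mm)
baseline = np.array([3.2, 3.1, 3.3])     # hypothetical baseline samples (mm)
corrected = baseline_correct(trial, baseline)
```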
6 Scanpath Comparison
6.1 String Alignment
In this approach, a scanpath is typically represented by encoding the spatial location information of fixations into a string,
where each character denotes the region of interest (ROI) the gaze fixates upon. Various string-comparison algorithms
can be applied to these encoded strings, such as measuring pairwise string similarity through methods like string edit
distance. One such algorithm is the Hamming Distance, which counts the differing characters at corresponding positions
in two strings of equal length. However, the Hamming Distance is limited due to its restrictive nature, as it only allows
substitutions and requires identical string lengths. An alternative algorithm, ScanMatch [Cristino et al., 2010], offers
greater flexibility albeit with increased complexity.
6.1.1 ScanMatch
This method [Cristino et al., 2010] utilizes an established algorithm used for comparing DNA sequences in bioin-
formatics, known as the Needleman-Wunsch algorithm [Needleman and Wunsch, 1970]. In the context of scanpath
comparison, the scanpath is spatially and temporally binned to encode a string that preserves fixation location, duration,
and order information. This allows for comparing spatial, temporal, and sequential characteristics between scanpaths.
Before encoding, the visual stimulus (e.g., the image being viewed by the subject) is divided into ROIs. Each ROI
is assigned a letter (or a group of letters if there are more than 26 ROIs in the image). In Figure 12, a normal string
sequence of “aAaDbB” can be derived from the given scanpath without temporal binning. With temporal binning of 50
ms bins, the resulting encoding is “aAaDaDaDbBbB.”
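As a concrete illustration, the sketch below encodes fixations into such a string; the mapping of lowercase letters to row bins and uppercase letters to column bins, as well as the example fixation coordinates, are assumptions chosen to reproduce the encoding from Figure 12.

```python
import string

def encode_scanpath(fixations, n_cols=4, n_rows=2, width=800, height=600, t_bin=None):
    """Encode (x, y, duration_ms) fixations into a ScanMatch-style string."""
    code = []
    for x, y, dur in fixations:
        col = min(int(x / width * n_cols), n_cols - 1)   # horizontal bin
        row = min(int(y / height * n_rows), n_rows - 1)  # vertical bin
        roi = string.ascii_lowercase[row] + string.ascii_uppercase[col]
        # With temporal binning, repeat the ROI code once per t_bin ms.
        repeats = max(1, round(dur / t_bin)) if t_bin else 1
        code.append(roi * repeats)
    return "".join(code)

fixations = [(100, 100, 50), (700, 120, 150), (260, 400, 100)]  # hypothetical
print(encode_scanpath(fixations))            # "aAaDbB"
print(encode_scanpath(fixations, t_bin=50))  # "aAaDaDaDbBbB"
```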
The scanpath encodings can then be pairwise compared by finding the optimal alignment where, at each position, one of the following cases may occur: the two characters match, one character is substituted for another, or a character is aligned with a gap (an insertion or deletion).
This optimal alignment can be calculated efficiently using the Needleman-Wunsch algorithm, a dynamic programming approach [Lew and Mauch, 2006], as described in Algorithm 3. Only two parameters need to be set: the substitution matrix and the gap penalty. This algorithm introduces the following concepts:
1. Substitution Matrix: Provides the score for substituting one letter with another. The higher the score, the
more similar the pair of letters are to each other. The scores reflect the relationships between ROIs and can be
based on distance (how near they are to each other), color, and semantic segmentation. Moreover, a cutoff
point needs to be chosen, determining whether the score between two ROIs should be positive (highly related)
or not (loosely related).
2. Gap Penalty: This is incurred when an element in the sequence is aligned with a gap instead of a substitution.
Depending on how the penalty is determined, the gap penalty can encourage (or discourage) gaps over
substitutions.
3. Scoring: As the alignment score is highly affected by sequence lengths, the score needs to be normalized,
with the highest score being 1, as shown in Eq. 1.
Algorithm 3 Needleman-Wunsch Algorithm with Insertion and Deletion Costs [Needleman and Wunsch, 1970].
1: procedure NeedlemanWunsch(seq1 , seq2 , S, cins , cdel )
2: m ← length of seq1
3: n ← length of seq2
4: Create a 2D matrix DP of size (m + 1) × (n + 1)
5: for i ← 0 to m do
6: DP [i][0] ← i · cdel ▷ Cost of deleting the first i elements of seq1
7: end for
8: for j ← 0 to n do
9: DP [0][j] ← j · cins ▷ Cost of inserting the first j elements of seq2
10: end for
11: for i ← 1 to m do
12: for j ← 1 to n do
13: matchScore ← DP [i − 1][j − 1] + S(seq1 [i], seq2 [j]) ▷ Match/Mismatch cost
14: deletionScore ← DP [i − 1][j] + cdel ▷ Cost of deletion in seq1
15: insertionScore ← DP [i][j − 1] + cins ▷ Cost of insertion in seq2
16: DP [i][j] ← min(matchScore, insertionScore, deletionScore) ▷ Fill DP matrix
17: end for
18: end for
19: return DP [m][n] ▷ Final alignment cost
20: end procedure
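For readers who prefer runnable code, the following is a direct Python transcription of Algorithm 3; the cost function in the usage example (0 for identical letters, 1 otherwise) is a hypothetical placeholder for a real substitution matrix.

```python
def needleman_wunsch(seq1, seq2, S, c_ins, c_del):
    """Compute the optimal alignment cost of seq1 and seq2 (Algorithm 3).
    S(a, b) returns the match/mismatch cost of aligning letters a and b."""
    m, n = len(seq1), len(seq2)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * c_del   # cost of deleting the first i letters of seq1
    for j in range(1, n + 1):
        dp[0][j] = j * c_ins   # cost of inserting the first j letters of seq2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + S(seq1[i - 1], seq2[j - 1]),  # match/mismatch
                dp[i - 1][j] + c_del,                            # deletion
                dp[i][j - 1] + c_ins,                            # insertion
            )
    return dp[m][n]  # final alignment cost

cost = needleman_wunsch("aAaDbB", "aAbBbB", lambda a, b: 0 if a == b else 1, 1, 1)
```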
String-based methods have their limitations, particularly with the need to predefine ROIs, which can constrain the
stimulus and potentially compromise basic spatial information about the scanpath [Jarodzka et al., 2010]. Furthermore,
fixations occurring near the borders of these predefined ROIs may not be accurately quantified. Additionally, such
methods often fail to preserve the inherent locality of the data [Newport et al., 2022].
Figure 12: A stimulus divided into 4 × 2 bins. The fixations are represented by circles of varying radii, representing their corresponding durations in milliseconds. The arrows between subsequent fixations represent the saccades.
Normalized score = score / (max(substitution matrix) × length of the longest sequence)    (1)
6.2 Geometrical Methods
Another approach to comparing scanpaths involves their geometrical properties. This approach frames the problem as finding the optimal mapping between fixation locations in the two scanpaths according to their spatial distances. [Mannan et al., 1995] proposed a nearest-neighbor approach in which the eye movement is
represented as a set of fixations in the form of x and y coordinate pairs. Each fixation in one set is mapped to the nearest
fixation from the other set, resulting in a set of mapping distances. The sum of all mapping distances is then calculated
after normalization, accounting for the length of eye movement sequences.
[Mathôt et al., 2012] extends this approach by proposing a method called Eyenalysis that makes use of a double
mapping technique, which involves mapping each fixation (represented as a vector of arbitrary dimension containing
various properties such as spatial and temporal information) in one scanpath to its nearest neighbor in the other, and
then repeating the process the other way around. This foregoes the decision step of whether fixations should have
multiple connections with the other scanpath. Figure 13 exemplifies the double mapping technique. The distance is
calculated by summing up the point-mappings and normalizing by the length of the longest scanpath. The mapping
between points p from scanpath S and q from scanpath T is associated with the Euclidean distance d(p, q) as defined
below (where n is the number of dimensions):
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}    (2)
The distance D(S, T) is then obtained by summing all point mappings from S to T and from T to S and normalizing by the length of the longest scanpath, i.e., dividing by max(nS, nT), where nS and nT are the lengths of S and T, respectively.
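A compact sketch of this double mapping, assuming each scanpath is given as an array of fixation vectors (e.g., with columns x and y), could look as follows.

```python
import numpy as np

def eyenalysis(S, T):
    """Double-mapping distance between scanpaths S and T (n_fix x n_dims)."""
    S, T = np.asarray(S, dtype=float), np.asarray(T, dtype=float)
    # Pairwise Euclidean distances between all fixations of S and T.
    dists = np.linalg.norm(S[:, None, :] - T[None, :, :], axis=-1)
    # Map each fixation in S to its nearest neighbor in T, and vice versa.
    s_to_t = dists.min(axis=1).sum()
    t_to_s = dists.min(axis=0).sum()
    # Normalize by the length of the longest scanpath.
    return (s_to_t + t_to_s) / max(len(S), len(T))
```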
Another approach is MultiMatch [Dewhurst et al., 2012], which is a more complex vector-based method that produces
scanpath distances across multiple dimensions while still preserving positional and temporal information. First, a
process of amplitude-based and direction-based simplification is performed on the scanpath by grouping small, locally contained saccades through thresholding and by merging successive saccades that follow the same general direction.
Figure 13: A simple example of the Eyenalysis approach proposed by [Mathôt et al., 2012]. The resulting distance is calculated as follows: D(S, T) = ((30 + 25 + 10 + 30) + (30 + 25 + 30 + 10 + 50))/max(4, 5) = 48.
Figure 14: Two ways of determining ROIs as presented by [Kübler et al., 2017]. On the left, regular gridded bins are formed and overlaid on the data. On the right, the bins are formed using data percentiles. The fixations are assigned to their corresponding bins, represented by the dark and light regions in both figures.
Representative values such as location and fixation duration are chosen as vector dimensions, and an optimal mapping
is determined using the Dijkstra algorithm.
6.3 Probabilistic Methods
Probabilistic methods tackle the high normal variability between scanpaths by treating eye movement parameters as random variables sampled from underlying stochastic processes [Coutrot et al., 2018]. Hidden Markov Models (HMMs) have been used to model eye movements by learning an HMM from one or a group of scanpaths, which may incorporate dynamic and individualistic properties of gaze behavior, allowing for the extraction of patterns characteristic of a certain class [Coutrot et al., 2018]. Subsmatch 2.0 [Kübler et al., 2017] is
another probabilistic method that employs a novel string kernel approach for scanpath comparison, where the scanpaths
are first encoded into a string and further spliced into smaller subsequences. The frequency of specific subsequences that
resemble typical, repeatedly occurring behavioral patterns is then used as a similarity feature. The concept is derived
from the transition matrix approach, which counts the number of transitions from one ROI to another. It considers
exploratory gaze patterns that consist of sequences of more than two subsequent fixations, using n-gram features to
represent subsequences of length n. The method can be applied to scenarios where labeling ROIs is not possible, such
as viewing abstract art or interactive, dynamic scenarios, using a regular grid or percentile mapping to determine ROIs
from the data as shown in Figure 14. It can also infer scanning patterns associated with specific experimental factors by
applying machine learning techniques, such as a support vector machine (SVM) with a linear kernel.
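As a small illustration of the n-gram idea, the sketch below splices an encoded ROI string into overlapping subsequences and computes their relative frequencies; the example string is hypothetical, and the resulting frequency vector could serve as input to a linear SVM.

```python
from collections import Counter

def ngram_frequencies(scanpath_string, n=3):
    """Relative frequencies of length-n subsequences of an ROI string."""
    grams = [scanpath_string[i:i + n] for i in range(len(scanpath_string) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

print(ngram_frequencies("abacabad", n=2))  # e.g., {'ab': 0.2857..., 'ba': ...}
```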
6.4 Deep Learning Methods
Deep learning algorithms have emerged as powerful tools in scanpath analysis. These algorithms, based on artificial
neural networks, can automatically learn hierarchical representations of data, making them well-suited for tasks
involving complex patterns and relationships. In the context of scanpath analysis, deep learning algorithms can be
used to extract features from scanpaths that capture both low-level and high-level information, such as the spatial and
temporal characteristics of eye movements and the semantic content of the visual stimuli being viewed. With the growing
popularity of machine learning and the emergence of easily accessible models, along with the introduction of transfer
learning and fine-tuning, a handful of deep learning-based scanpath comparison methods have been introduced. For
instance, [Castner et al., 2020b] proposed a method of extracting scene information at the fixation level by incorporating
convolutional neural networks (CNN) as a means to extract features from each fixation according to the corresponding
image patch on which the fixation has landed. The proposed approach is exemplified in Figure 15. Table 4 shows the
comparison of different approaches to scanpath analysis.
Table 4: Comparison of different approaches to scanpath analysis.
String Alignment. Characteristics: Encodes spatial locations of fixations into strings for comparison using algorithms like the Hamming Distance and ScanMatch. Advantages: Applicable for comparing spatial, temporal, and sequential characteristics of scanpaths. Limitations: The requirement of predefined ROIs may compromise spatial information; inherent locality is also not preserved.
Geometrical. Characteristics: Captures the optimal mapping between fixation locations based on spatial distance, employing techniques like nearest-neighbor mapping and Eyenalysis for double mapping. Advantages: Does not require predefined ROIs and preserves spatial information better than string alignment methods. Limitations: May not effectively capture the temporal sequence of fixations.
Probabilistic. Characteristics: Models eye movements as random variables sampled from underlying stochastic processes using methods like HMMs and Subsmatch 2.0. Advantages: Accounts for the stochastic nature of eye movements, capturing dynamic and individualistic gaze behavior patterns. Limitations: Complex to implement and interpret, requiring sophisticated statistical knowledge.
Deep Learning. Characteristics: Employs neural networks to learn hierarchical representations from scanpaths, capturing both spatio-temporal characteristics and semantic content. Advantages: Can automatically extract complex patterns and relationships without the need for predefined ROIs. Limitations: Requires significant computational resources and large datasets for training.
Figure 15: The approach proposed by [Castner et al., 2020b]. A corresponding image patch (green box) is extracted for each fixation (red dot). Each image patch (fi, where i = 1...N and N is the total number of fixations) is passed through a CNN to extract the features F(fi), which are concatenated into a scanpath vector S. This vector can then be compared with other scanpath vectors.
7 Visualization of Eye-Tracking Data
Effective visualization facilitates the communication of the results and the interpretation of complex patterns in eye movement behavior. This section introduces important techniques and best practices for scientifically visualizing eye-tracking data, including saliency maps, scanpaths, and gaze plots.
7.1 Saliency Maps
Saliency maps are among the most commonly used visualizations of eye-tracking data and serve as a powerful means to quickly and easily highlight areas within a visual scene that attract the most visual attention.
Figure 16: Saliency maps on (a) CUB-GHA and (b) CXR-Eye datasets. Red indicates more attention from humans, while blue indicates less attention.
These maps aggregate fixation data into a heat-like representation that intuitively illustrates the distribution of visual attention. More
specifically, areas that receive many or longer fixations are typically shown as hot or red, indicating that these spots are of high interest or saliency to the observers. In contrast, areas with fewer or shorter fixations appear cooler, marked by blue, suggesting that less attention was paid to these parts of the visual scene.
The typical construction of saliency maps involves plotting fixation points over the stimulus and applying a Gaussian blur to each point, thereby creating a continuous probability distribution that represents visual attention. Due to their simplicity, saliency maps are used to quickly assess visual attention distribution in various applications, such as usability testing, marketing research, psychology, and many more [Borji and Itti, 2012, Eder et al., 2021, Kou et al., 2023, Yan et al., 2021, Rong et al., 2022].
Figure 17: Illustration of a human observing an image on the eye-tracker display; d denotes the distance between the eye and the display, θ the visual angle, and l the corresponding extent on the display.
Their construction is described in detail in the following. To visually illustrate human attention, i.e., gaze fixations, it is common to apply a Gaussian filter to fixation points to form a heatmap [Judd et al., 2012], also called a saliency map [Kümmerer et al., 2016], as depicted in Figure 16. This technique is usually applied to gaze data collected by remote eye trackers. In this section, we introduce how to visualize the saliency map for fixations in practice. We use the gaze data from the CUB-GHA dataset [Rong et al., 2021] as an example to clarify the procedure; the visualization code can be found at https://gitlab.lrz.de/hctl/Eye-Tracking-Tutorial. Before collecting gaze data, the practitioner should gather the following information: the physical dimensions of the display, its resolution in pixels, the distance d between the participant's eyes and the display, and the visual angle θ assumed for a fixation.
Figure 18: Scanpath visualization representing measured and transformed fixations of a museum visitor viewing
paintings and descriptive texts sequentially.
After collecting the fixation gaze data, the saliency maps can be visualized based on the fixation position (on the image)
and duration. Concretely, Figure 17 illustrates a human observing an image on the eye tracker display. We post-process
every fixation location as a Gaussian distribution N (µ, σ 2 ) on the gaze fixation saliency map, where σ is 75 pixels (in
the display’s resolution). We calculate the standard deviation σ as follows. d refers to the distance between the human
eye and the eye tracker display. In practice, for instance in [Rong et al., 2021], d is set to 60 cm, and the visual angle θ
is set to 2◦ following [Vickers, 2007]. In this case, l = tan 2◦ · d = 21 mm. According to the settings of the display, in
the horizontal direction, the length of the display is 530 mm, and the resolution is 1920 pixels. Therefore, l = 21 mm covers approximately 75 pixels on the display. We set 75 pixels as the standard deviation, with the image rescaled to the display resolution (1920 × 1080). The saliency map is rescaled to the image's original size afterwards.
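Putting these pieces together, a saliency map can be sketched as follows; the duration weighting and the use of scipy.ndimage.gaussian_filter are implementation choices, with σ = 75 pixels taken from the computation above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(fixations, width=1920, height=1080, sigma=75):
    """Build a saliency map from (x_px, y_px, duration_ms) fixations."""
    heat = np.zeros((height, width), dtype=float)
    for x, y, dur in fixations:
        # Weight each fixation by its duration; coordinates are assumed
        # to lie within the display resolution.
        heat[int(round(y)), int(round(x))] += dur
    # Place a Gaussian with standard deviation sigma around every fixation.
    heat = gaussian_filter(heat, sigma=sigma)
    return heat / heat.max() if heat.max() > 0 else heat  # normalize to [0, 1]
```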
7.2 Scanpaths
The concept of visual scanpaths, as previously discussed, refers to the sequential order of fixations and the accompanying saccadic movements [Goldberg and Helfman, 2014, Noton and Stark, 1971b]. Figure 18 visually represents such a
scanpath, where fixations are depicted as dots and saccades as connecting lines. The size of the dots correlates with
fixation duration, providing precision to the scanpath and enabling insights into overall gaze behavior [Holmqvist et al.,
2011].
Accurately visualizing a scanpath requires essential information such as stimuli dimensions, fixation coordinates, and
corresponding durations. This data extraction process is relatively straightforward with remote eye-tracking systems,
involving knowledge of stimulus placement on the display and subsequent data processing for fixation metrics. However,
in scenarios involving navigation through three-dimensional environments while wearing a head-mounted eye tracker,
such as exploring a museum, visualizing resulting scanpaths introduces added complexity. The spatial nature of the
setting necessitates careful transformation of measured fixations into a 2D representation.
A simplified approach to scanpath visualization may entail focusing solely on illustrating saccadic lines. This streamlined
method proves advantageous when comparing multiple scanpaths by overlaying them on the same stimuli. By omitting
dots representing fixation duration, visual clutter and overload in the visualization are reduced, facilitating
clearer comparisons between different scanpaths. This approach enhances the interpretability of visual exploration
patterns in complex environments, thereby aiding researchers in understanding gaze behavior dynamics more effectively.
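A basic scanpath plot along these lines can be sketched with matplotlib; the fixations below are hypothetical, dot areas scale with fixation duration, and the connecting lines represent saccades.

```python
import matplotlib.pyplot as plt

def plot_scanpath(fixations, image=None):
    """Plot (x, y, duration_ms) fixations as dots connected by saccade lines."""
    xs, ys, durations = zip(*fixations)
    fig, ax = plt.subplots()
    if image is not None:
        ax.imshow(image)                               # stimulus as background
    ax.plot(xs, ys, color="gray", zorder=1)            # saccade lines
    ax.scatter(xs, ys, s=durations, zorder=2)          # dot area ~ duration
    for order, (x, y) in enumerate(zip(xs, ys), start=1):
        ax.annotate(str(order), (x, y))                # fixation order labels
    ax.invert_yaxis()  # image coordinates have their origin at the top-left
    plt.show()

plot_scanpath([(100, 120, 180), (400, 300, 250), (650, 200, 95)])
```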
7.3 Gaze Plots
Gaze plot visualization is a method used to depict eye-tracking data, offering valuable insights into an individual's
attentional focus and the sequence in which they explore different areas within a stimulus [Burch et al., 2014, Takahashi
et al., 2018, Shrestha and Lenz, 2007]. Unlike scanpaths, which offer a chronological perspective of eye movements,
gaze plots focus on visualizing an individual’s gaze path by illustrating the sequence and spatial distribution of fixations
and saccades over a specified time window [Takahashi et al., 2018]. By graphically representing an individual’s gaze
path, gaze plots depict how attention shifts across different parts of the stimulus, aiding researchers and practitioners in
understanding visual exploration patterns.
Figure 19: Gaze plot visualization adapted from [Blascheck et al., 2014].
In a traditional gaze plot, as depicted in Figure 19, each fixation is typically represented by a circle, with the size of the
circle indicating the duration of the fixation. Similarly to scanpaths, lines connecting the circles depict the saccades, or
rapid eye movements, between the fixations.
Gaze plot visualization provides valuable insights into visual attention patterns and cognitive processes. This information
can be leveraged to inform the design of more effective interfaces, advertisements, or educational materials, enhancing
user engagement and comprehension.
8 Ethical Considerations
As eye-tracking data is collected from human participants, one should consider important ethical issues when the
purpose of the data collection is research. The eye-tracking community has recently also drawn attention to such
issues [Byrne et al., 2024], and we cover the most relevant aspects here. In general, in eye-tracking experiments,
it is important to obtain ethical approval from the institutional review board (IRB) before conducting user studies
to ensure the well-being of the participants. IRBs assess the ethical aspects of research and relevant experiments
to ensure studies adhere to ethical standards and regulations. These ethical aspects include but are not limited to
recruitment methods, informed consent procedures, risks and benefits for the participants, participants’ privacy, and
confidentiality of personal information. The Belmont Report [National Commission for the Protection of Human
Subjects of Biomedical and Behavioral Research, 1979] and the Declaration of Helsinki [World Medical Association,
2013] provide several principles and guidelines for research ethics to this end. Both of these documents acknowledge informed consent and risk and benefit assessment as essential requirements, further recognizing the importance of participant selection, privacy, and confidentiality.
Informed consent. Informed consent is a critical component of any research study that includes human participants,
and it is the process of obtaining voluntary agreement from a participant to participate in a study after they are provided
with relevant information about the study, such as the purpose of the study, experimental procedures, potential risks,
and benefits. It is essential that researchers design the informed consent so that participants clearly understand what they are consenting to, as this protects their rights and welfare.
In a step-by-step manner, in most lab studies, the researcher should provide a description of the details of the study,
equipment, procedures involved, and potential risks and benefits, in addition to the written consent form. The researchers
should also provide information on how the data will be collected, stored, and used further, with the researcher’s or
principal investigator’s contact information. The researcher should answer any questions that participants may have and
ensure they understand the information. Once participants have reviewed and understood the details and consent form,
they should sign it to indicate their voluntary agreement to participate in the study. If a user study involves minors who
are under 18 years of age, as they can only assent but not provide consent, a parent or a legal guardian should sign the
consent form. If a user study includes data collection in the form of images or videos (e.g., video data of eye images
that include iris textures, which form personally identifiable data), the consent forms should explicitly mention this
and provide further details on how the data will be processed, stored, and managed. In addition, the researcher should
ensure that the participant is not pressured into providing consent, and they should clearly state that participants are free
to withdraw from the study without any consequences at any time. The researcher should keep copies of the signed
consent forms and other relevant experiment documents.
Risks and benefits. Researchers and IRBs should carefully assess the potential risks and benefits of an experiment. Most eye-tracking experiments carry small to negligible risks unless the presented stimuli are sensitive or risky, or evoke emotions that might harm the participants psychologically or physically. Experiments in extended reality (XR) might cause cybersickness, nausea, or dizziness. In addition, long eye-tracking experiments might cause boredom.
Some studies might involve deception, making the experiments and ethical processes more complex. The user studies
and experiments that include deception must be carefully designed and thoroughly reviewed by IRBs as these involve
greater risks to participants than conventional experiments. While the informed consent procedures remain similar
for studies with deception, the participants must be informed about and agree to the deception protocol in advance. This can be achieved by recruiting participants from a pool of people who have already agreed to take part in experiments that may involve deception. In such experiments, it is essential to fully
debrief participants after the completion of the experiment. Debriefing should include an explanation and purpose of
the deception. As deception can potentially cause harm to participants, it is important to consider the risks involved in
the study and ensure that they are justified. Deception should only be used when it is necessary to answer a research
question that cannot be answered without deception. In addition, to ensure the wellness of the participants, it may be
necessary to include control conditions in the experiments to compare the outcome of the deception condition with the
control condition.
Regarding benefits, gift cards, monetary support, or hourly course credits (in some institutions) are usual ways of
compensating participants for their time and effort. When compensating with money, researchers should keep in mind the minimum hourly rates of the particular country; however, compensation should not be so high that participants join the study only because of the money, which might create a potentially biased sample pool.
Privacy and Confidentiality. Regardless of the informed consent or deception processes, it is vital to maintain the
confidentiality of personal information and protect participants’ privacy. Collected data, including eye tracking, should
be anonymized if possible, and personally identifiable information should not be included in any publications unless
the participants provide explicit permission. In addition, data from user studies, such as eye movement data, can be
representative of personal identifiers or sensitive participant characteristics depending on the displayed stimulus [Bozkir
et al., 2023, Liebling and Preibusch, 2014]. Researchers and practitioners must not attempt to infer such identifiers and characteristics unless the research questions require it and the participants have agreed to it.
In addition, especially for practitioners, it is advisable that privacy-preserving approaches for eye-tracking data are
employed [Bozkir et al., 2020, 2021, David-John et al., 2023, Elfares et al., 2023, 2024, Ozdel et al., 2024].
9 Conclusion
This tutorial represents a gentle and comprehensive introduction to the essentials of eye-tracking user studies. We
began with an overview of the technology and the calibration necessary for accurate data collection. We progressed by
defining the basic types of eye movements – fixations, saccades, blinks, and smooth pursuits – and their significance in
understanding human cognitive processes. Popular algorithms for detecting these movements and the formation of
visual scanpaths were presented, along with techniques for processing pupillometry data. We extended the discussion
to the complexities of visual scanpaths, including their processing, comparison, and visualization techniques. The
technical content was complemented and rounded up by addressing ethical considerations in eye-tracking user studies,
emphasizing the importance of informed consent, the assessment of risks and benefits, as well as the safeguarding of
participant privacy.
We believe that this holistic approach to eye-tracking user studies not only enhances the students’ and practitioners’
understanding of eye movements in many applications and scientific fields, especially in the context of human-computer
interaction, but also reinforces best practices in conducting and evaluating eye-tracking research in a principled and
ethical manner.
References
David Atchison. Optics of the Human Eye. CRC Press, 2 edition, 2023. doi:10.1201/9781003128601. URL
https://doi.org/10.1201/9781003128601.
Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost Van de
Weijer. Eye tracking: A comprehensive guide to methods and measures. OUP Oxford, 2011.
Helga Kolb. Gross Anatomy of the Eye. 01 1995.
Rhcastilhos and Jmarchn. Schematic diagram of the human eye, 2007. URL https://commons.wikimedia.org/
wiki/File:Schematic_diagram_of_the_human_eye_en.svg. Last access 4/3/24.
Hari Singh and Jaswinder Singh. Human eye tracking and related issues: A review. International Journal of Scientific
and Research Publications, 2(9):1–9, 2012.
Jane F Koretz and George H Handelman. How the human eye focuses. Scientific American, 259(1):92–99, 1988.
Richard Andersson, Marcus Nyström, and Kenneth Holmqvist. Sampling frequency and eye-tracking measures: how
speed affects durations, latencies, and more. Journal of Eye Movement Research, 3(3), 2010.
Nicholas Wade and Benjamin Tatler. The Moving Tablet of the Eye: The Origins of Modern Eye Movement Research.
Oxford University Press, 2005.
Andrew Duchowski. Eye tracking methodology: Theory and practice. Springer, 2007.
Wolfgang Fuhl, Thomas C. Kübler, Dennis Hospach, Oliver Bringmann, Wolfgang Rosenstiel, and Enkelejda Kasneci.
Ways of improving the precision of eye tracking data: Controlling the influence of dirt and dust on pupil detection.
Journal of Eye Movement Research, 10(3), may 2017a.
Wolfgang Fuhl, Thomas Kübler, Katrin Sippel, Wolfgang Rosenstiel, and Enkelejda Kasneci. ExCuSe: Robust pupil
detection in real-world scenarios. In Computer Analysis of Images and Patterns, pages 39–51, Cham, 2015. Springer
International Publishing. doi:10.1007/978-3-319-23192-1_4.
Wolfgang Fuhl, Thiago Santini, Thomas Kübler, and Enkelejda Kasneci. Else: ellipse selection for robust pupil
detection in real-world environments. pages 123–130, 03 2016. doi:10.1145/2857491.2857505.
Wolfgang Fuhl, Thiago Santini, Gjergji Kasneci, and Enkelejda Kasneci. Pupilnet v2.0: Convolutional neural networks
for robust pupil detection. In CoRR, 2017b.
Christian Nitschke, Atsushi Nakazawa, and Haruo Takemura. Corneal imaging revisited: An overview of corneal
reflection analysis and applications. IPSJ Transactions on Computer Vision and Applications, 5:1–18, 2013.
doi:10.2197/ipsjtcva.5.1.
Lucie Lévêque, Hilde Bosmans, Lesley Cockmartin, and Hantao Liu. State of the art: Eye-tracking studies in medical
imaging. IEEE Access, 6:37023–37034, 2018. doi:10.1109/ACCESS.2018.2851451.
Alicia Abundis-Gutiérrez, Victor Hugo González-Becerra, Jahaziel Molina Del Rio, Mónica Almeida López, Anaid
Amira Villegas Ramírez, Diana Ortiz Sánchez, José Rodolfo Alcázar Huerta, and Luis Alfonso Zepeda Capilla.
Reading comprehension and eye-tracking in college students: Comparison between low-and middle-skilled readers.
Psychology, 9(15):2972–2983, 2018.
Michel Wedel and Rik Pieters. A review of eye-tracking research in marketing. Review of marketing research, pages
123–147, 2017.
Thammathip Piumsomboon, Gun Lee, Robert W. Lindeman, and Mark Billinghurst. Exploring natural eye-gaze-based
interaction for immersive virtual reality. In 2017 IEEE Symposium on 3D User Interfaces (3DUI), pages 36–39,
2017. doi:10.1109/3DUI.2017.7893315.
Bronisław Kapitaniak, Marta Walczak, Marcin Kosobudzki, Zbigniew Jóźwiak, and Alicja Bortkiewicz. Application
of eye-tracking in drivers testing: A review of research. International journal of occupational medicine and
environmental health, 28(6), 2015.
Nerijus Ramanauskas. Calibration of video-oculographical eye-tracking system. Elektronika Ir Elektrotechnika, 72(8):
65–68, 2006.
Marcus Nyström, Richard Andersson, Kenneth Holmqvist, and Joost Van De Weijer. The influence of calibration method
and eye physiology on eyetracking data quality. Behavior research methods, 45:272–288, 2013. doi:10.3758/s13428-
012-0247-4.
Gang Liu, Yuechen Yu, Kenneth Alberto Funes Mora, and Jean-Marc Odobez. A differential approach for gaze
estimation with calibration. In BMVC, volume 2, page 6, 2018.
Carlos H Morimoto, Dave Koons, Arnon Amir, and Myron Flickner. Frame-rate pupil detector and gaze tracker. In
Proceedings of the IEEE ICCV, volume 99, 1999.
SV Sheela and PA Vijaya. Mapping functions in gaze tracking. International Journal of Computer Applications, 26(3):
36–42, 2011.
Adithya Balasubramanyam, Lee Hanna, Pavan Kumar B N, and Youngho Chai. Calibration techniques and gaze
accuracy estimation in pupil labs eye tracker. TECHART: Journal of Arts and Imaging Science, 5:38–41, 02 2018.
doi:10.15323/techart.2018.2.5.1.38.
Tobii. Calibration. https://developer.tobiipro.com/commonconcepts/calibration.html, 2024. Last
access 4/3/24.
Tobii. Eye tracker calibration and validation. https://connect.tobii.com/s/article/
eye-tracker-calibration, 2023a. Last access 4/3/24.
Thiago Santini, Wolfgang Fuhl, and Enkelejda Kasneci. Calibme: Fast and unsupervised eye tracker calibration for
gaze-based pervasive human-computer interaction. pages 2594–2605, 05 2017a. doi:10.1145/3025453.3025950.
Thiago Santini, Hanna Brinkmann, Luise Reitstätter, Helmut Leder, Raphael Rosenberg, Wolfgang Rosenstiel, and
Enkelejda Kasneci. The art of pervasive eye tracking: Unconstrained eye tracking in the austrian gallery belvedere.
In Proceedings of the 7th workshop on pervasive eye tracking and mobile eye-based interaction, pages 1–8, 2018.
Diederick C Niehorster, Thiago Santini, Roy S Hessels, Ignace TC Hooge, Enkelejda Kasneci, and Marcus Nyström.
The impact of slippage on the data quality of head-worn eye trackers. Behavior research methods, 52:1140–1160,
2020.
Susan M Kolakowski and Jeff B Pelz. Compensating for eye tracker camera movement. In Proceedings of the 2006
symposium on Eye tracking research & applications, pages 79–85, 2006.
Faisal Karmali and Mark Shelhamer. Automatic detection of camera translation in eye video recordings using multiple
methods. In The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society,
volume 1, pages 1525–1528. IEEE, 2004.
Bernardo Pires, Myung Hwangbo, Michael Devyver, and Takeo Kanade. Visible-spectrum gaze tracking for sports. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1005–1010,
2013.
Yusuke Sugano and Andreas Bulling. Self-calibrating head-mounted eye trackers using egocentric visual saliency. In
Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 363–372, 2015.
Thiago Santini, Wolfgang Fuhl, and Enkelejda Kasneci. Calibme: Fast and unsupervised eye tracker calibration for
gaze-based pervasive human-computer interaction. In Proceedings of the 2017 chi conference on human factors in
computing systems, pages 2594–2605, 2017b.
Christian Lander, Frederic Kerber, Thorsten Rauber, and Antonio Krüger. A time-efficient re-calibration algorithm for
improved long-term accuracy of head-worn eye trackers. In Proceedings of the Ninth Biennial ACM Symposium on
Eye Tracking Research & Applications, pages 213–216, 2016.
Michael Xuelin Huang, Tiffany CK Kwok, Grace Ngai, Stephen CF Chan, and Hong Va Leong. Building a personalized,
auto-calibrating eye tracker from user interactions. In Proceedings of the 2016 CHI Conference on Human Factors in
Computing Systems, pages 5169–5179, 2016.
Tobii. Eye Tracker Data Quality Test Report. https://www.tobiipro.com/siteassets/tobii-pro/
accuracy-and-precision-tests/tobii-pro-spectrum-accuracy-and-precision-test-report.
pdf/?v=1.1, 2022. Last access 25.7.22.
Joseph E. McGrath. Methodology matters: Doing research in the behavioral and social sciences. In Readings in
Human–Computer Interaction, pages 152–169. 1995. doi:10.1016/B978-0-08-051574-8.50019-4.
Tobii. Tobii pro lab. https://www.tobii.com/products/software/behavior-research-software/
tobii-pro-lab, 2023b. Last access 4/3/24.
Enkelejda Kasneci, Gjergji Kasneci, Thomas C Kübler, and Wolfgang Rosenstiel. The applicability of probabilistic
methods to the online recognition of fixations and saccades in dynamic scenes. In Proceedings of the symposium on
eye tracking research and applications, pages 323–326, 2014. doi:10.1145/2578153.2578213.
Enkelejda Tafaj, Gjergji Kasneci, Wolfgang Rosenstiel, and Martin Bogdan. Bayesian online clustering of eye
movement data. In Proceedings of the symposium on eye tracking research and applications, pages 285–288, 2012.
doi:10.1145/2168556.2168617.
Roy S. Hessels, Diederick C. Niehorster, Marcus Nyström, Richard Andersson, and Ignace T. C. Hooge. Is the
eye-movement field confused about fixations and saccades? a survey among 124 researchers. Royal Society Open
Science, 5(8):180502, 2018. doi:10.1098/rsos.180502.
Dario D. Salvucci and Joseph H. Goldberg. Identifying fixations and saccades in eye-tracking protocols. In Proceedings
of the 2000 Symposium on Eye Tracking Research & Applications, page 71–78. ACM, 2000. doi:10.1145/355017.355028.
Keith Rayner. Eye movements in reading and information processing: 20 years of research. Psychological bulletin, 124
(3):372, 1998.
Marc Pomplun, Tyler W. Garaas, and Marisa Carrasco. The effects of task difficulty on visual search strategy in virtual
3D displays. Journal of Vision, 13(3):24–24, 2013. ISSN 1534-7362. doi:10.1167/13.3.24.
Hong Gao, Efe Bozkir, Lisa Hasenbein, Jens-Uwe Hahn, Richard Göllner, and Enkelejda Kasneci. Digital transforma-
tions of classrooms in virtual reality. In Proceedings of the 2021 CHI Conference on Human Factors in Computing
Systems. ACM, 2021. doi:10.1145/3411764.3445596.
Andreas Gegenfurtner, Erno Lehtinen, and Roger Säljö. Expertise differences in the comprehension of visualizations: A
meta-analysis of eye-tracking research in professional domains. Educational Psychology Review - EDUC PSYCHOL
REV, 23:523–552, 12 2011. doi:10.1007/s10648-011-9174-7.
Thomas Kübler, Shahram Eivazi, and Enkelejda Kasneci. Automated visual scanpath analysis reveals the expertise
level of micro-neurosurgeons. 10 2015.
Nora Castner, Shahram Eivazi, Katharina Scheiter, and Enkelejda Kasneci. Using eye tracking to evaluate and develop
innovative teaching strategies for fostering image reading skills of novices in medical training. Eye Tracking
Enhanced Learning (ETEL2017), 2017.
Ioannis Agtzidis, Mikhail Startsev, and Michael Dorr. 360-degree video gaze behaviour: A ground-truth data set
and a classification algorithm for eye movements. In Proceedings of the 27th ACM International Conference on
Multimedia, page 1007–1015. ACM, 2019. doi:10.1145/3343031.3350947.
Joseph H. Goldberg, Mark J. Stimson, Marion Lewenstein, Neil Scott, and Anna M. Wichansky. Eye tracking in web
search tasks: Design implications. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications,
page 51–58. ACM, 2002. doi:10.1145/507072.507082.
Dale Purves, George J Augustine, David Fitzpatrick, Lawrence C Katz, Anthony-Samuel LaMantia, James O McNamara,
and S Mark Williams. Neuroscience. 2nd edition. Sinauer Associates 2001, 2001. ISBN 0-87893-742-0. Types of
Eye Movements and Their Functions.
Thiago Santini, Wolfgang Fuhl, Thomas Kübler, and Enkelejda Kasneci. Bayesian identification of fixations, saccades,
and smooth pursuits. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications,
pages 163–170, 2016.
Enkelejda Kasneci, Gjergji Kasneci, Thomas C Kübler, and Wolfgang Rosenstiel. Online recognition of fixations,
saccades, and smooth pursuits for automated analysis of traffic hazard perception. In Artificial Neural Networks:
Methods and Applications in Bio-/Neuroinformatics, pages 411–434. Springer, 2015.
Harvey Richard Schiffman. Sensation and perception: An integrated approach. John Wiley & Sons; 5th edition, 2001.
Mohamed Hedi Baccour, Frauke Driewer, Enkelejda Kasneci, and Wolfgang Rosenstiel. Camera-based eye blink
detection algorithm for assessing driver drowsiness. In 2019 IEEE Intelligent Vehicles Symposium (IV), pages
987–993. IEEE, 2019.
Tobias Appel, Christian Scharinger, Peter Gerjets, and Enkelejda Kasneci. Cross-subject workload classification using
pupil-related measures. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications,
pages 1–8, 2018.
Tobias Appel, Peter Gerjets, Stefan Hoffman, Korbinian Moeller, Manuel Ninaus, Christian Scharinger, Natalia
Sevcenko, Franz Wortha, and Enkelejda Kasneci. Cross-task and cross-participant classification of cognitive load in
an emergency simulation game. IEEE Transactions on Affective Computing, 2021.
Siyuan Chen and Julien Epps. Using task-induced pupil diameter and blink rate to infer cognitive load. Hu-
man–Computer Interaction, 29(4):390–413, 2014. doi:10.1080/07370024.2014.892428.
Francesco N. Biondi, Babak Saberi, Frida Graf, Joel Cort, Prarthana Pillai, and Balakumar Balasingam. Distracted
worker: Using pupil size and blink rate to detect cognitive load during manufacturing tasks. Applied Ergonomics,
106:103867, 2023. ISSN 0003-6870. doi:10.1016/j.apergo.2022.103867.
Hong Gao, Lisa Hasenbein, Efe Bozkir, Richard Göllner, and Enkelejda Kasneci. Exploring gender differences in
computational thinking learning in a vr classroom: Developing machine learning models using eye-tracking data and
explaining the models. International Journal of Artificial Intelligence in Education, 33(4):929–954, 2023. ISSN
1560-4306. doi:10.1007/s40593-022-00316-z.
Raimondas Zemblys, Diederick C. Niehorster, and Kenneth Holmqvist. gazeNet: End-to-end eye-movement event
detection with deep neural networks. Behavior Research Methods, 51(2):840–864, 2019. ISSN 1554-3528.
doi:10.3758/s13428-018-1133-5.
Carlos Elmadjian, Candy Gonzales, Rodrigo Lima da Costa, and Carlos H. Morimoto. Online eye-movement classifica-
tion with temporal convolutional networks. Behavior Research Methods, 55(7):3602–3620, 2023. ISSN 1554-3528.
doi:10.3758/s13428-022-01978-2.
Efe Bozkir, Suleyman Ozdel, Mengdi Wang, Brendan David-John, Hong Gao, Kevin Butler, Eakta Jain, and Enkele-
jda Kasneci. Eye-tracked virtual reality: A comprehensive survey on methods and privacy challenges. 2023.
doi:10.48550/arXiv.2305.14080.
Do Hyong Koh, Sandeep A. Munikrishne Gowda, and Oleg V. Komogortsev. Input evaluation of an eye-gaze-guided
interface: kalman filter vs. velocity threshold eye movement identification. In Proceedings of the 1st ACM SIGCHI
Symposium on Engineering Interactive Computing Systems, page 197–202. ACM, 2009. ISBN 9781605586007.
doi:10.1145/1570433.1570470.
Do Hyong Koh, Sandeep Munikrishne Gowda, and Oleg V. Komogortsev. Real time eye movement identification
protocol. In CHI ’10 Extended Abstracts on Human Factors in Computing Systems, page 3499–3504. ACM, 2010.
ISBN 9781605589305. doi:10.1145/1753846.1754008.
Oleg V. Komogortsev, Sampath Jayarathna, Do Hyong Koh, and Sandeep Munikrishne Gowda. Qualitative and
quantitative scoring and evaluation of the eye movement classification algorithms. In Proceedings of the 2010
Symposium on Eye-Tracking Research & Applications, page 65–68. ACM, 2010. doi:10.1145/1743666.1743682.
Wolfgang Fuhl, Nora Castner, and Enkelejda Kasneci. Histogram of oriented velocities for eye movement detection. In
Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data, pages 1–6, 2018.
Wolfgang Fuhl, Yao Rong, and Enkelejda Kasneci. Fully convolutional neural networks for raw eye tracking data
segmentation, generation, and reconstruction. In 2020 25th International Conference on Pattern Recognition (ICPR),
pages 142–149. IEEE, 2021.
Sabrina Hoppe and Andreas Bulling. End-to-end eye movement detection using convolutional neural networks. arXiv
preprint arXiv:1609.02452, 2016.
Jackson Beatty. Task-evoked pupillary responses, processing load, and the structure of processing resources. Psycho-
logical bulletin, 91(2):276, 1982. doi:10.1037/0033-2909.91.2.276.
Nora Castner, Tobias Appel, Thérése Eder, Juliane Richter, Katharina Scheiter, Constanze Keutel, Fabian Hüttig,
Andrew Duchowski, and Enkelejda Kasneci. Pupil diameter differentiates expertise in dental radiography visual
search. PloS one, 15(5):e0223941, 2020a.
Efe Bozkir, David Geisler, and Enkelejda Kasneci. Assessment of driver attention during a safety critical situation in
VR to generate VR-based training. In ACM Symposium on Applied Perception 2019, pages 23:1–23:5. ACM, 2019.
doi:10.1145/3343036.3343138.
Siyuan Chen, Julien Epps, Natalie Ruiz, and Fang Chen. Eye activity as a measure of human mental effort in HCI.
In Proceedings of the 16th International Conference on Intelligent User Interfaces, page 315–318. ACM, 2011.
doi:10.1145/1943403.1943454.
Sebastiaan Mathôt, Jasper Fabius, Elle Van Heusden, and Stefan Van der Stigchel. Safe and sensible preprocessing and
baseline correction of pupil-size data. Behavior Research Methods, 50(1):94–106, 2018. doi:10.3758/s13428-017-
1007-2.
David Noton and Lawrence Stark. Scanpaths in saccadic eye movements while viewing and recognizing patterns.
Vision research, 11(9):929–IN8, 1971a. doi:10.1016/0042-6989(71)90213-6.
David Noton and Lawrence Stark. Scanpaths in eye movements during pattern perception. Science, 171(3968):308–311,
1971b. doi:10.1126/science.171.3968.308.
Mihai Țichindelean, Monica Teodora Țichindelean, Iuliana Cetină, and Gheorghe Orzan. A comparative eye tracking
study of usability—towards sustainable web design. Sustainability, 13(18):10415, 2021. doi:10.3390/su131810415.
35
A Hands-on Tutorial for Eye Tracking A P REPRINT
Nora Castner, Thomas C Kuebler, Katharina Scheiter, Juliane Richter, Thérése Eder, Fabian Hüttig, Constanze
Keutel, and Enkelejda Kasneci. Deep semantic gaze embedding and scanpath comparison for expertise classifi-
cation during opt viewing. In ACM symposium on eye tracking research and applications, pages 1–10, 2020b.
doi:10.1145/3379155.3391320.
Wenjin Li, Wenju Zhou, Minrui Fei, Yulin Xu, and Erfu Yang. Eye tracking methodology for diagnosing neu-
rological diseases: a survey. In 2020 Chinese Automation Congress (CAC), pages 2158–2162. IEEE, 2020.
doi:10.1109/CAC51589.2020.9326691.
Stanislav Popelka and Marketa Beitlova. Scanpath comparison using scangraph for education and learning purposes:
Summary of previous educational studies performed with the use of scangraph. In 2022 Symposium on Eye Tracking
Research and Applications, pages 1–6, 2022. doi:10.1145/3517031.3529243.
Thomas C Kübler, Colleen Rothe, Ulrich Schiefer, Wolfgang Rosenstiel, and Enkelejda Kasneci. Subsmatch 2.0:
Scanpath comparison and classification based on subsequence frequencies. Behavior research methods, 49:1048–
1064, 2017. doi:10.3758/s13428-016-0765-6.
Filipe Cristino, Sebastiaan Mathôt, Jan Theeuwes, and Iain D Gilchrist. ScanMatch: A novel method for comparing
fixation sequences. Behavior research methods, 42:692–700, 2010. doi:10.3758/BRM.42.3.692.
Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino
acid sequence of two proteins. Journal of molecular biology, 48(3):443–453, 1970. doi:https://doi.org/10.1016/0022-
2836(70)90057-4.
Art Lew and Holger Mauch. Dynamic programming: A computational tool, volume 38. Springer, 2006.
Halszka Jarodzka, Kenneth Holmqvist, and Marcus Nyström. A vector-based, multidimensional scanpath similarity
measure. In Proceedings of the 2010 symposium on eye-tracking research & applications, pages 211–218, 2010.
doi:10.1145/1743666.1743718.
Robert Ahadizad Newport, Carlo Russo, Sidong Liu, Abdulla Al Suman, and Antonio Di Ieva. Softmatch: Comparing
scanpaths using combinatorial spatio-temporal sequences with fractal curves. Sensors, 22(19):7438, 2022.
S Mannan, Keith H Ruddock, and David S Wooding. Automatic control of saccadic eye movements made in visual
inspection of briefly presented 2-d images. Spatial vision, 9(3):363–386, 1995. doi:10.1163/156856895x00052.
Sebastiaan Mathôt, Filipe Cristino, Iain D Gilchrist, and Jan Theeuwes. A simple way to estimate similarity between
pairs of eye movement sequences. Journal of Eye Movement Research, 5(1), 2012. doi:10.16910/jemr.5.1.4.
Richard Dewhurst, Marcus Nyström, Halszka Jarodzka, Tom Foulsham, Roger Johansson, and Kenneth Holmqvist.
It depends on how you look at it: Scanpath comparison in multiple dimensions with multimatch, a vector-based
approach. Behavior research methods, 44:1079–1100, 2012. doi:10.3758/s13428-012-0212-2.
Antoine Coutrot, Janet H Hsiao, and Antoni B Chan. Scanpath modeling and classification with hidden markov models.
Behavior research methods, 50(1):362–379, 2018. doi:10.3758/s13428-017-0876-8.
Ali Borji and Laurent Itti. State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and
machine intelligence, 35(1):185–207, 2012.
Thérése F Eder, Juliane Richter, Katharina Scheiter, Constanze Keutel, Nora Castner, Enkelejda Kasneci, and Fabian
Huettig. How to support dental students in reading radiographs: effects of a gaze-based compare-and-contrast
intervention. Advances in Health Sciences Education, 26:159–181, 2021.
Qiqi Kou, Ruihang Liu, Chen Lv, He Jiang, and Deqiang Cheng. Advertising image saliency prediction method based
on score level fusion. IEEE Access, 11:8455–8466, 2023.
Fei Yan, Cheng Chen, Peng Xiao, Siyu Qi, Zhiliang Wang, and Ruoxiu Xiao. Review of visual saliency prediction:
Development process from neurobiological basis to deep models. Applied Sciences, 12(1):309, 2021.
Yao Rong, Nora Castner, Efe Bozkir, and Enkelejda Kasneci. User trust on an explainable ai-based medical diagnosis
support system. arXiv preprint arXiv:2204.12230, 2022.
Tilke Judd, Frédo Durand, and Antonio Torralba. A benchmark of computational models of saliency to predict human
fixations. http://hdl.handle.net/1721.1/68590, 2012.
Matthias Kümmerer, Thomas SA Wallis, and Matthias Bethge. DeepGaze II: Reading fixations from deep features
trained on object recognition. 2016. doi:10.48550/arXiv.1610.01563.
Yao Rong, Wenjia Xu, Zeynep Akata, and Enkelejda Kasneci. Human attention in fine-grained classification. BMVC,
2021. doi:10.48550/arXiv.2111.01628.
Joan N Vickers. Perception, cognition, and decision training: The quiet eye in action. Human Kinetics, 2007.
36
A Hands-on Tutorial for Eye Tracking A P REPRINT
Joseph Goldberg and Jonathan Helfman. Eye tracking on visualizations: Progressive extraction of scanning strategies,
pages 337–372. 01 2014. doi:10.1007/978-1-4614-7485-2_13.
Michael Burch, Fabian Beck, Michael Raschke, Tanja Blascheck, and Daniel Weiskopf. A dynamic graph visualization
perspective on eye movement data. In Proceedings of the Symposium on Eye Tracking Research and Applications,
page 151–158. ACM, 2014. doi:10.1145/2578153.2578175.
Ryo Takahashi, Hiromasa Suzuki, Jouh Yeong Chew, Yutaka Ohtake, Yukie Nagai, and Koichi Ohtomi. A system
for three-dimensional gaze fixation analysis using eye tracking glasses. Journal of Computational Design and
Engineering, 5(4):449–457, 2018. ISSN 2288-4300. doi:10.1016/j.jcde.2017.12.007.
Sav Shrestha and Kelsi Lenz. Eye gaze patterns while searching vs. browsing a website. Usability News, 9(1):1–9,
2007.
Tanja Blascheck, Kuno Kurzhals, Michael Raschke, Michael Burch, Daniel Weiskopf, and Thomas Ertl. State-of-the-art
of visualization for eye tracking data. In Eurovis (stars), page 29, 2014.
Sean Anthony Byrne, Nora Castner, Efe Bozkir, Diederick C. Niehorster, and Enkelejda Kasneci. From lenses
to living rooms: A policy brief on eye tracking in XR before the impending boom. In 2024 IEEE Inter-
national Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), pages 90–96, 2024.
doi:10.1109/AIxVR59861.2024.00020.
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Belmont
Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research, 1979. https:
//www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html.
World Medical Association. World medical association declaration of helsinki: Ethical principles for medical research
involving human subjects. JAMA, 310(20):2191–2194, 2013. doi:10.1001/jama.2013.281053. First version in 1964.
Daniel J. Liebling and Sören Preibusch. Privacy considerations for a pervasive eye tracking world. In Proceedings of
the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, page
1169–1177. ACM, 2014. doi:10.1145/2638728.2641688.
Efe Bozkir, Ali Burak Ünal, Mete Akgün, Enkelejda Kasneci, and Nico Pfeifer. Privacy preserving gaze estimation
using synthetic images via a randomized encoding based framework. In ACM Symposium on Eye Tracking Research
and Applications. ACM, 2020. doi:10.1145/3379156.3391364.
Efe Bozkir, Onur Günlü, Wolfgang Fuhl, Rafael F. Schaefer, and Enkelejda Kasneci. Differential privacy for eye
tracking with temporal correlations. PLOS ONE, 16(8):1–22, 2021. doi:10.1371/journal.pone.0255979.
Brendan David-John, Kevin Butler, and Eakta Jain. Privacy-preserving datasets of eye-tracking samples with
applications in xr. IEEE Transactions on Visualization and Computer Graphics, 29(5):2774–2784, 2023.
doi:10.1109/TVCG.2023.3247048.
Mayar Elfares, Zhiming Hu, Pascal Reisert, Andreas Bulling, and Ralf Küsters. Federated learning for appearance-based
gaze estimation in the wild. In Annual Conference on Neural Information Processing Systems, pages 20–36. PMLR,
2023.
Mayar Elfares, Pascal Reisert, Zhiming Hu, Wenwu Tang, Ralf Küsters, and Andreas Bulling. PrivatEyes: Appearance-
based gaze estimation using federated secure multi-party computation, 2024.
Suleyman Ozdel, Efe Bozkir, and Enkelejda Kasneci. Privacy-preserving scanpath comparison for pervasive eye
tracking. arXiv preprint arXiv:2404.06216, 2024. doi:10.48550/arXiv.2404.06216.
37