Fixation: A Universal Framework For Experimental Eye Movement Research∗
ABSTRACT
We propose a free and open-source framework for experimental eye movement research, whereby a researcher can record and visualize gaze data and export it for analysis using an offline web application without proprietary components or licensing restrictions. The framework is agnostic to the source of the raw gaze stream and can be used with any eye tracking platform by mapping data to a standard json-based format and streaming it in real time. We leverage web technologies to address data privacy concerns and demonstrate support for recording at 300 Hz and real-time visualization.
KEYWORDS
eye tracking, experimental research, high-frequency recording, real-time visualization, web

ACM Reference Format:
Mostafa Elshamy and Peter Khooshabeh. 2021. Fixation: A universal framework for experimental eye movement research. In ETRA ’21: 2021 Symposium on Eye Tracking Research and Applications (ETRA ’21 Short Papers), May 25–27, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3448018.3458007

1 INTRODUCTION
Eye tracking technology has advanced remarkably from its early days [Yarbus 1967] and is being used today to study computational models of vision [Itti and Baldi 2009], mutual gaze [Müller et al. 2020], whose role in social interaction was studied by psychologists decades earlier [Argyle and Cook 1976], and human–computer interaction [Nielsen and Pernice 2010].

While there has been significant progress in democratizing eye tracking [Krafka et al. 2016; Papoutsaki et al. 2016], commercial software coupled to expensive and specialized hardware remains dominant. Cost, licensing restrictions, and sparse documentation impede research progress. Tobii Pro Lab (formerly Tobii Pro Studio), SMI BeGaze (discontinued after the Apple acquisition), and SR Research Experiment Builder/WebLink and Data Viewer are used most often to record and analyze eye tracking data [Blascheck et al. 2017]. Tobii Pro Sprint is a discontinued subscription-based cloud platform for user experience testing. Pupil Cloud is well-documented, open-source software for head-mounted eye trackers [Kassner et al. 2014] and is free within certain capacity caps.

Recent advances in machine learning and the availability of large-scale datasets inspired a growing number of video-based gaze estimation models to appear in the literature [Valliappan et al. 2020; Zhang et al. 2019]. OpenGaze provides C++ apis for gaze estimation using off-the-shelf cameras, but user-friendly software for conducting experimental research using such neural network models is not available yet.

We introduce novel methods to address problems of universal compatibility, scalable processing of high-frequency data, and data privacy. These methods are themselves quite interesting and generally applicable outside eye tracking to high-frequency recording and real-time visualization of neural or physiological signals.
∗ Fixation was developed under the codename “Spellbound,” inspired by the dream sequence designed by Salvador Dalí for the 1945 Alfred Hitchcock film of that title.

2 ARCHITECTURE
Rather than overwhelm researchers with a complex feature set, we start with a basic set that we can grow based on community adoption and feedback. We provide high-performance modules for processing a raw gaze stream and fixation detection using a well-specified algorithm, capture and storage of stimulus and user camera frames during detected fixations, and playback of stored frames overlaid with a scanpath or heatmap visualization.
[Figure 2 diagram labels: Eye tracker, Fixation Link (WebSocket server), Browser, Fixation WebSocket client, Service Worker, Diagnosis, Replay, Storage, Data.]

Figure 2: We reduce the problem of universal compatibility to mapping data before it is streamed over the network using fixation links to a client application running in-browser. Application modules use Web apis to provide their functionality, e.g., the “Record” module uses the Canvas api for real-time visualization, the Service Worker api for offline cache storage, the Screen Capture api to capture the display as a media stream, and the Media Streams api to request access to a local webcam.
The idea of a web application was an early design decision to support distributed experiments, which proved useful for solving other problems (e.g., since we are able to capture html dom structure during a fixation, we can analyze scanpaths without explicit area of interest specification as in previous literature [Eraslan et al. 2016]). However, there are 1) data privacy concerns in cloud environments due to the sensitivity of eye or face images and potential personal identification from gaze data [Liebling and Preibusch 2014] and 2) performance concerns when processing data at 60–300 Hz in real time, which can easily block the main thread in JavaScript. We show in Section 4 how to store sensitive data in the browser cache and in Sections 4 and 5 how a module can run scripts concurrently by spawning background threads known as web workers.
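As background for Sections 4 and 5, spawning such a background thread is a one-liner with the Web Workers api. A minimal sketch follows; the worker script name and message shape are illustrative, not the framework's actual module layout.

// Spawn a background thread so processing a 60–300 Hz gaze stream
// never blocks the main thread ("process-gaze.js" is illustrative).
const worker = new Worker('process-gaze.js');
worker.postMessage({ x: 0.5, y: 0.5, timestamp: 0 }); // a gaze point
worker.onmessage = (e) => console.log('detected fixation', e.data);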
2.1 Fixation Link
A fixation link is a WebSocket1 server that has a single function: transform the gaze stream into a standard json-based format (see Figure 5) and emit events to a connected WebSocket client so that it can correctly interpret gaze points. The x and y coordinates are normalized relative to the display such that (0, 0) and (1, 1) are the top-left and bottom-right corners respectively. The timestamp is measured in µs. Additional unstructured data (e.g., pupil size, eye position, or head pose in 3D space) may optionally be sent along and stored for analysis.

Since virtually all programming languages support json and WebSocket, a fixation link can be written in any programming language supported by an eye tracker’s sdk or a gaze estimation model’s api. We provide fixation links for the Tobii TX300 (Python), the Tobii EyeX (Node.js, C#, 48 sloc), and a simulator2 for testing purposes (Python, 58 sloc).

1 WebSocket is a protocol that enables web applications to maintain bidirectional communication with server-side processes.
2 It is worth noting how the simulator works: it samples gaze points from a Gaussian hmm with 9 hidden states (each corresponding to a screen region) trained on a small dataset of 27,572 raw gaze points captured using a Tobii TX300.
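For illustration, a minimal fixation link written for Node.js might look as follows. This is a sketch, not one of our shipped links: it assumes the ws npm package, and eyeTracker with its sample fields (screenX, screenY, deviceTimeUs) are stand-ins for whatever a vendor sdk actually exposes.

// Minimal fixation link sketch (Node.js, "ws" package assumed).
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Map an sdk-specific sample to the standard format: coordinates
// normalized so (0, 0) is top-left and (1, 1) bottom-right,
// timestamp in µs.
function toGazePoint(sample, width, height) {
  return {
    x: sample.screenX / width,
    y: sample.screenY / height,
    timestamp: sample.deviceTimeUs
  };
}

wss.on('connection', (client) => {
  // "eyeTracker" is an illustrative stand-in for a vendor sdk's
  // gaze stream; the display resolution here is an example.
  eyeTracker.on('gaze', (sample) => {
    client.send(JSON.stringify(toGazePoint(sample, 1920, 1080)));
  });
});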
3 DIAGNOSIS
A quick check before recording reduces the risk of lost or miscalibrated data. We provide visual tools to 1) verify that the client is receiving WebSocket messages from a connected fixation link and can detect fixations and 2) inspect calibration accuracy to verify the absence of large errors.

Fixation detection is useful because it significantly reduces data size without compromising high-level analysis [Salvucci and Goldberg 2000]. We apply dispersion-threshold identification (I-DT) with a dispersion threshold D_t and an initial window size determined by a configurable duration threshold T_d (often 100–200 ms) and the sampling frequency f. We start with an initial set of k gaze points P = {p_1, p_2, ..., p_k} = {(x_1, y_1, t_1), (x_2, y_2, t_2), ..., (x_k, y_k, t_k)}, where k = T_d × f, and add points until the dispersion D exceeds the threshold D_t:

D = [max(X) − min(X)] + [max(Y) − min(Y)] > D_t

where X = {x_1, x_2, ..., x_n}, Y = {y_1, y_2, ..., y_n}, and n ≥ k. We then note a fixation at the centroid

C = (1/n) (∑_{i=1}^{n} x_i, ∑_{i=1}^{n} y_i)

at timestamp t_1 and of duration d = t_n − t_1.

Figure 3: Visual inspection of mean absolute error indicates whether a re-calibration is needed.

Five-point calibration is generally adequate [Duchowski 2017], particularly if our goal is to determine whether a re-calibration is needed before recording. We consider 5 points at (0.1, 0.1), (0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.9, 0.9) and plot the horizontal and vertical mean absolute error E over the n raw data points captured while gazing at each point p = (x_p, y_p):

E = (1/n) ∑_{i=1}^{n} (|x_i − x_p| + |y_i − y_p|)

An early prototype provided a calibration tool that relied on an sdk-provided api. We observed that not all eye trackers support this, but almost all provide a free calibration tool.
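To make the I-DT procedure concrete, the following sketch detects fixations over an array of {x, y, t} gaze points and emits centroid documents in the spirit of Figure 5. The slide-and-expand policy shown is one common I-DT variant; thresholds and units follow the definitions above.

// I-DT sketch: k is the initial window size (T_d × f), dt the
// dispersion threshold in normalized display units.
function dispersion(win) {
  const xs = win.map((p) => p.x), ys = win.map((p) => p.y);
  return (Math.max(...xs) - Math.min(...xs)) +
         (Math.max(...ys) - Math.min(...ys));
}

function idt(points, k, dt) {
  const fixations = [];
  let i = 0;
  while (i + k <= points.length) {
    let n = k;
    // Slide the window forward while the initial points disperse.
    if (dispersion(points.slice(i, i + n)) > dt) { i += 1; continue; }
    // Expand the window until dispersion exceeds the threshold.
    while (i + n < points.length &&
           dispersion(points.slice(i, i + n + 1)) <= dt) n += 1;
    const win = points.slice(i, i + n);
    fixations.push({
      x: win.reduce((s, p) => s + p.x, 0) / n, // centroid C
      y: win.reduce((s, p) => s + p.y, 0) / n,
      timestamp: win[0].t,                     // t_1
      duration: win[n - 1].t - win[0].t        // d = t_n − t_1
    });
    i += n;
  }
  return fixations;
}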
Figure 4: Left: Captured display media stream. Right: User camera stream. A visual indicator labeled “System Health” provides feedback about the status of the worker load. Green as opposed to red markers indicate that the system is adequately processing new frames.
4 RECORDING
Synchronization of independent data streams is a major problem in multimodal systems. If input data from different sources is recorded at different sampling rates, the clock of each source has to be synchronized with a master clock to reliably correlate the data. This has often been resolved by Network Time Protocol (ntp) based synchronization protocols [Mills 1991] or by manual alignment. Recognizing that little to no visual processing is achieved during a saccade [Fuchs 1971], we adopt a different approach and capture stimulus and user camera frames at each identified fixation rather than record independent gaze, stimulus, and user camera data streams. We believe this reduces problems of experimental reproducibility and results in more reliable datasets.

We capture the display as a media stream using the Screen Capture api, grab and encode a frame during the fixation, and store the augmented document (see Figure 5) in cache storage. A similar procedure using getUserMedia() from the Media Streams api can be used to record user camera frames.
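A sketch of this capture path follows, assuming a canvas-based frame grab and an illustrative cache name ("session-0"); the framework's actual pipeline (e.g., encoding inside a worker, as discussed below) may differ.

// Capture the display as a media stream (Screen Capture api).
let video;
async function startCapture() {
  const display = await navigator.mediaDevices.getDisplayMedia({ video: true });
  video = document.createElement('video');
  video.srcObject = display;
  await video.play();
}

// Grab and encode the current stimulus frame as a data url.
function captureFrame() {
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  return canvas.toDataURL('image/webp', 0.8); // "data:image/webp;base64,..."
}

// Augment a detected fixation (see Figure 5) and store it in
// cache storage under a hypothetical per-session cache name.
async function storeFixation(fixation) {
  const doc = { ...fixation, stimulus: captureFrame() };
  const cache = await caches.open('session-0');
  await cache.put(`/fixations/${doc.id}`, new Response(JSON.stringify(doc)));
}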
{
  "id": 0,
  ...
  "duration": 150.485869,
  "stimulus": "data:image/webp;base64,UklGR...",
  "user_cam": "data:image/webp;base64,UklGR..."
}

Figure 5: Sample json documents. Top: Raw gaze point. Bottom: An identified fixation, augmented with its duration and base64-encoded stimulus and user camera images.

We use a pool of n worker threads and for each new task select the worker W_min with minimum load min({l_0, l_1, ..., l_{n−1}}), where l_i is the load of worker i, i.e., the number of frames it is processing. This can easily be achieved by having each worker communicate its load back to the main thread after performing an asynchronous task. This is necessary to maintain high recording performance; a naive synchronous approach will eventually block the main thread, as encoding a frame is relatively slow and new frames will be generated at a rate faster than they can be encoded and stored in cache storage. We further reduce processing latency by never copying image data between execution contexts: transferable objects can be used to transfer ownership of a byte array representing the image between the main thread and a web worker.
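The following sketch combines least-loaded worker selection with a transfer list. The worker script name, pool size, and message shape are illustrative assumptions, not the framework's exact interface.

// Worker pool sketch: pick the least-loaded worker and transfer
// frame ownership instead of copying ("encode-worker.js" is a
// hypothetical worker script).
const workers = Array.from({ length: 4 }, () => {
  const w = new Worker('encode-worker.js');
  w.load = 0;
  w.onmessage = () => { w.load -= 1; }; // worker reports completion
  return w;
});

function encodeFrame(buffer, fixationId) {
  // Select W_min, the worker with minimum load.
  const w = workers.reduce((a, b) => (a.load <= b.load ? a : b));
  w.load += 1;
  // The transfer list moves the ArrayBuffer between execution
  // contexts without copying its bytes.
  w.postMessage({ id: fixationId, frame: buffer }, [buffer]);
}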
We considered both cache storage and IndexedDB for storing data. Both are widely supported, accessible from web workers, and asynchronous. IndexedDB can arguably be faster when seeking a frame during replay, but it is a low-level api that requires significant setup before use. We show in Section 5 how to achieve fast seek performance using a frame buffer.

4.1 Tuning stimulus image encoding parameters

[Figure 6: Size-quality tradeoff. Encoded image size (in chars) as a function of image quality for each compression format.]

Encoding a stimulus frame before storage significantly decreases storage requirements but can degrade image quality and recording performance if not done properly. We can optimize encoder options using a simple procedure: plot execution time and encoded image size as a function of image quality (using a 0.1 or 0.01 step) for every compression format, then visually inspect the plots for an optimal value. The optimum can vary for different screen resolutions and stimuli. For example, we see from Figure 6 that the jpeg format is faster than both png and WebP, but a WebP encoder may produce fewer artifacts for graphical stimuli. The png format is lossless and results in significantly larger base64-encoded images.
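One way to implement this tuning procedure is sketched below; the function name and result shape are illustrative, and the measured data would be plotted as in Figure 6.

// Measure encoding time and encoded size per format and quality
// step. The png encoder ignores the quality argument since the
// format is lossless.
function benchmarkEncoders(canvas) {
  const results = [];
  for (const format of ['image/jpeg', 'image/webp', 'image/png']) {
    for (let i = 1; i <= 10; i += 1) {
      const quality = i / 10; // 0.1 step; use i / 100 for 0.01
      const t0 = performance.now();
      const dataUrl = canvas.toDataURL(format, quality);
      results.push({
        format,
        quality,
        ms: performance.now() - t0,
        chars: dataUrl.length // size in chars, as in Figure 6
      });
    }
  }
  return results;
}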
Figure 7: A dynamic heat map overlaid over the stimulus during replay of a recorded session. (Captured still from White Fawn’s Devotion [Young Deer 1910]. Public Domain.)
5 REPLAY
Visualization of eye movements during replay of the recorded session is critical to verifying data integrity, to exploring potential pockets where further analysis may be useful, or to running a retrospective think-aloud study. We provide two basic methods for gaze visualization: scanpaths and heatmaps. Both depict areas of relative cognitive importance, but scanpaths depict temporal viewing order while heatmaps depict aggregate gaze more intuitively.
We use a buffer of n frames and request the next n frames from a single prespawned web worker when we start consuming frames from the current buffer. A single worker consistently outperformed multiple workers in our experiments to improve replay performance. We bypass decoding of base64-encoded images and instantly render frames by creating n hidden elements in the document and setting their background image to the encoded image.
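One way to realize this hidden-element trick is sketched below; the names are illustrative and the framework's exact rendering path may differ.

// Buffered replay sketch: hidden elements hold encoded frames as
// background images so the browser can decode and cache them ahead
// of time; showing a frame is then a cheap visibility toggle.
function bufferFrames(fixations) {
  return fixations.map((f) => {
    const el = document.createElement('div');
    el.style.display = 'none';
    el.style.backgroundImage = `url(${f.stimulus})`;
    document.body.appendChild(el);
    return el;
  });
}

let current = null;
function showFrame(buffer, i) {
  if (current) current.style.display = 'none';
  current = buffer[i];
  current.style.display = 'block';
}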
5.1 Rendering dynamic heatmaps
Rendering real-time heatmaps atop video is computationally intensive and has required a gpu implementation in the literature [Duchowski et al. 2012]. We extend an approach introduced in Biedert et al. [2012] to render discretized heatmaps in real time using a cpu implementation by relaxing the requirement for smooth rendering via a Gaussian point-spread function and taking advantage of eye tracking measurement error. It is an open question whether smooth heatmaps have advantages beyond visual familiarity, but we may consider a gpu-based option in future work.
If we consider a grid G of n tiles where each tile t ∈ G has size w_t × h_t approximately equal to the expected eye tracking error, then on every new gaze point p = (x, y) we can check if p ∈ t and increment i_t, the intensity of tile t. A prespawned web worker is responsible for intensity accumulation. We can map the accumulated intensity count, normalized relative to the maximum intensity i_max, to a tile’s alpha channel or to rgb color space via a color gradient. The traditional rainbow color map seen in Figure 7 can be generated through the hsla() functional notation such that cold has a hue of 209 and hot has a hue of 0, at 50% saturation, lightness, and alpha. Finding the maximum intensity is the bottleneck of the cpu implementation and has O(wh) complexity given a w × h frame, but accumulating intensity per tile reduces the scan to O(n) over the n tiles of G.
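A minimal sketch of the discretized rendering follows; the grid dimensions and alpha value are illustrative, while the hue mapping matches the hsla() description above.

// Discretized heatmap sketch: accumulate per-tile intensity and
// map it to hue 209 (cold) → 0 (hot) via hsla().
const cols = 32, rows = 18; // tile size ≈ expected tracking error
const intensity = new Uint32Array(cols * rows);

function addGazePoint(x, y) { // normalized coordinates
  const c = Math.min(cols - 1, Math.floor(x * cols));
  const r = Math.min(rows - 1, Math.floor(y * rows));
  intensity[r * cols + c] += 1;
}

function drawHeatmap(ctx, width, height) {
  const max = Math.max(...intensity); // O(n) over tiles, not O(wh)
  if (max === 0) return;
  for (let r = 0; r < rows; r += 1) {
    for (let c = 0; c < cols; c += 1) {
      const v = intensity[r * cols + c] / max; // normalized by i_max
      if (v === 0) continue;
      const hue = Math.round(209 * (1 - v));
      ctx.fillStyle = `hsla(${hue}, 50%, 50%, 0.5)`;
      ctx.fillRect(c * width / cols, r * height / rows,
                   width / cols, height / rows);
    }
  }
}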
6 DATA MANAGEMENT
The data management module provides tools to import and export recorded data and information about cache content and disk space. Data can be exported as a json file and imported later by dragging-and-dropping it on a designated drop target. The exported document may be analyzed or used as a machine learning dataset.

How much data can be stored in the browser cache? Browser implementations vary, but the amount of available storage is usually a percentage of total disk space (e.g., Chrome and Firefox allow the browser to use up to 80% and 50% of disk space, respectively). Table 1 shows usage and quota per session length as reported by the estimate method of the StorageManager interface of the Storage api in Chrome.
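The quota query itself is a one-liner with the Storage api:

// Report usage and quota as in Table 1 (values are in bytes).
async function reportStorage() {
  const { usage, quota } = await navigator.storage.estimate();
  console.log(`Using ${usage} of ${quota} bytes`);
}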
7 DISCUSSION
Existing open-source eye tracking software supports recording from hardware of a particular form factor (i.e., head-mounted) [Kassner et al. 2014] or from a few popular models [Dalmaijer et al. 2014; Voßkühler et al. 2008], or focuses on post-experimental analysis of imported, prerecorded data [Kübler 2020]. Fixation is differentiated by its hardware-agnostic interface, inherently synchronized datasets, and accessibility on the web without compromising data privacy. We were able to experimentally verify our methods at 60–300 Hz, but they can in theory scale to higher sampling rates (e.g., 1000 Hz).

The development of Fixation was motivated by a limited budget during the purchase of a Tobii TX300 eye tracker for an exploratory study of an eye movement phenomenon reported in Day [1964]. We were also concerned about unforeseen limitations of Tobii Pro Studio posing obstacles to our research. Commercial eye tracking software may process raw gaze data in partially specified pipelines to protect the manufacturer’s intellectual property. We hope, by open-sourcing the framework and releasing it for free, that more researchers will have access to production-grade tools needed to obtain reliable and verifiable datasets. This promotes openness, decreases fragmentation within the research community, and accelerates experimental eye movement research.

Future work will involve benchmarking performance and growing the feature set to support use cases found to be problematic, but we are prioritizing some issues on our roadmap. We expect to 1) add more fixation links to support popular models and new models that appear in the literature, 2) provide additional event detection algorithms and methods for visualizing gaze data, and 3) explore using the platform to generate large datasets for training gaze estimation models. We encourage the eye tracking research community to contribute to the project.

ACKNOWLEDGMENTS
The authors thank Prof. Stacy Marsella, who acquired funding used to purchase the Tobii TX300 eye tracker. This work was supported in part by the U.S. Army rdecom. Content does not necessarily reflect the position or policy of the U.S. government, and no official endorsement should be inferred.

CODE AVAILABILITY AND LICENSING
Fixation is hosted at https://polar-ocean-09884.herokuapp.com at the time of publication. Source code and documentation are available on the corresponding author’s GitHub3 under the mit license.

3 https://github.com/melhosseiny/fixation
REFERENCES
Michael Argyle and Mark Cook. 1976. Gaze and Mutual Gaze. Cambridge University Press, Cambridge, UK.
Ralf Biedert, Andreas Dengel, Mostafa Elshamy, and Georg Buscher. 2012. Towards Robust Gaze-Based Objective Quality Measures for Text. In Proceedings of the Symposium on Eye Tracking Research and Applications (Santa Barbara, CA, March 28–30, 2012) (ETRA ’12). ACM, New York, NY, USA, 201–204. https://doi.org/10.1145/2168556.2168593
Tanja Blascheck, Kuno Kurzhals, Michael Raschke, Michael Burch, Daniel Weiskopf, and Thomas Ertl. 2017. Visualization of Eye Tracking Data: A Taxonomy and Survey. Computer Graphics Forum 36, 8 (Dec. 2017), 260–284. https://doi.org/10.1111/cgf.13079
Edwin S. Dalmaijer, Sebastiaan Mathôt, and Stefan Van der Stigchel. 2014. PyGaze: An open-source, cross-platform toolbox for minimal-effort programming of eyetracking experiments. Behavior Research Methods 46, 4 (2014), 913–921. https://doi.org/10.3758/s13428-013-0422-2
Merle E. Day. 1964. An Eye Movement Phenomenon Relating to Attention, Thought and Anxiety. Perceptual and Motor Skills 19, 2 (Oct. 1964), 443–446. https://doi.org/10.2466/pms.1964.19.2.443
Andrew T. Duchowski. 2017. Eye Tracking Methodology: Theory and Practice. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-319-57883-5
Andrew T. Duchowski, Margaux M. Price, Miriah Meyer, and Pilar Orero. 2012. Aggregate Gaze Visualization with Real-Time Heatmaps. In Proceedings of the Symposium on Eye Tracking Research and Applications (Santa Barbara, CA, March 28–30, 2012) (ETRA ’12). ACM, New York, NY, USA, 13–20. https://doi.org/10.1145/2168556.2168558
Sukru Eraslan, Yeliz Yesilada, and Simon Harper. 2016. Eye Tracking Scanpath Analysis Techniques on Web Pages: A Survey, Evaluation and Comparison. Journal of Eye Movement Research 9, 1 (Feb. 2016). https://doi.org/10.16910/jemr.9.1.2
Albert F. Fuchs. 1971. The Saccadic System. In The Control of Eye Movements, Paul Bach-y-Rita, Carter C. Collins, and Jane E. Hyde (Eds.). Academic Press, New York, NY, USA, 343–362. https://doi.org/10.1016/B978-0-12-071050-8.50017-3
Laurent Itti and Pierre Baldi. 2009. Bayesian surprise attracts human attention. Vision Research 49, 10 (June 2009), 1295–1306. https://doi.org/10.1016/j.visres.2008.09.007
Moritz Kassner, William Patera, and Andreas Bulling. 2014. Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-based Interaction. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Seattle, WA, September 13–17, 2014) (UbiComp ’14 Adjunct). ACM, New York, NY, USA, 1151–1160. https://doi.org/10.1145/2638728.2641695
Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, and Antonio Torralba. 2016. Eye Tracking for Everyone. In IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, June 26–30, 2016) (CVPR ’16). IEEE Computer Society, Los Alamitos, CA, USA, 2176–2184. https://doi.org/10.1109/CVPR.2016.239
Thomas C. Kübler. 2020. The Perception Engineer’s Toolkit for Eye-Tracking data analysis. In ACM Symposium on Eye Tracking Research and Applications (Stuttgart, Germany, June 2–5, 2020) (ETRA ’20 Short Papers). ACM, New York, NY, USA, Article 15, 4 pages. https://doi.org/10.1145/3379156.3391366
Daniel J. Liebling and Sören Preibusch. 2014. Privacy Considerations for a Pervasive Eye Tracking World. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Seattle, WA, September 13–17, 2014) (UbiComp ’14 Adjunct). ACM, New York, NY, USA, 1169–1177. https://doi.org/10.1145/2638728.2641688
David L. Mills. 1991. Internet Time Synchronization: The Network Time Protocol. IEEE Transactions on Communications 39, 10 (1991), 1482–1493. https://doi.org/10.1109/26.103043
Philipp Müller, Ekta Sood, and Andreas Bulling. 2020. Anticipating Averted Gaze in Dyadic Interactions. In ACM Symposium on Eye Tracking Research and Applications (Stuttgart, Germany, June 2–5, 2020) (ETRA ’20 Full Papers). ACM, New York, NY, USA, Article 7, 10 pages. https://doi.org/10.1145/3379155.3391332
Jakob Nielsen and Kara Pernice. 2010. Eyetracking Web Usability. New Riders, Berkeley, CA, USA.
Alexandra Papoutsaki, Patsorn Sangkloy, James Laskey, Nediyana Daskalova, Jeff Huang, and James Hays. 2016. WebGazer: Scalable Webcam Eye Tracking Using User Interactions. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (New York, NY, July 9–15, 2016) (IJCAI ’16). AAAI, Palo Alto, CA, USA, 3839–3845.
Dario D. Salvucci and Joseph H. Goldberg. 2000. Identifying Fixations and Saccades in Eye-Tracking Protocols. In Proceedings of the 2000 Symposium on Eye Tracking Research and Applications (Palm Beach Gardens, FL, November 6–8, 2000) (ETRA ’00). ACM, New York, NY, USA, 71–78. https://doi.org/10.1145/355017.355028
Nachiappan Valliappan, Na Dai, Ethan Steinberg, Junfeng He, Kantwon Rogers, Venky Ramachandran, Pingmei Xu, Mina Shojaeizadeh, Li Guo, Kai Kohlhoff, et al. 2020. Accelerating Eye Movement Research via Accurate and Affordable Smartphone Eye Tracking. Nature Communications 11, 1, Article 4553 (Sept. 2020), 12 pages. https://doi.org/10.1038/s41467-020-18360-5
Adrian Voßkühler, Volkhard Nordmeier, Lars Kuchinke, and Arthur M. Jacobs. 2008. OGAMA (Open Gaze and Mouse Analyzer): open-source software designed to analyze eye and mouse movements in slideshow study designs. Behavior Research Methods 40, 4 (2008), 1150–1162. https://doi.org/10.3758/BRM.40.4.1150
Alfred L. Yarbus. 1967. Eye Movements and Vision. Springer, New York, NY, USA. https://doi.org/10.1007/978-1-4899-5379-7
James Young Deer. 1910. White Fawn’s Devotion. Retrieved March 31, 2021 from https://www.filmpreservation.org/preserved-films/screening-room/t1-white-fawn-s-devotion-1910
Xucong Zhang, Yusuke Sugano, and Andreas Bulling. 2019. Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, UK, May 4–9, 2019) (CHI ’19). ACM, New York, NY, USA, Article 416, 13 pages. https://doi.org/10.1145/3290605.3300646