Eye Tracker Data Quality: What It Is and How To Measure It
Abstract

Data quality is essential to the validity of research results and to the quality of gaze interaction. We argue that the lack of standard measures for eye data quality makes several aspects of manufacturing and using eye trackers, as well as researching eye movements and vision, more difficult than necessary. Uncertainty regarding the comparability of research results is a considerable impediment to progress in the field. In this paper, we illustrate why data quality matters and review previous work on how eye data quality has been measured and reported. The goal is to achieve a common understanding of what data quality is and how it can be defined, measured, evaluated, and reported.[1]

Figure 1: Good and poor precision in two remote 50 Hz eye trackers as seen in an x-/y-visualisation (scanpath view). From [Holmqvist et al. 2011], page 149.

Figure 2: Very inaccurate data in one corner. From [Holmqvist et al. 2011], page 132.

1 Does data quality matter?
The validity of research results based on eye movement analysis is clearly dependent on the quality of the eye movement data. The same is true of the performance of gaze based communication devices. Eye data contain noise and error which must be accounted for. There are currently no norms or standards for what researchers report about data quality in publications, or for what manufacturers report about their eye tracker's typical performance. What may be a serious impediment for one purpose may not be significant for other purposes. For example, a cheap eye tracker composed of off-the-shelf components may be sufficient for clicking large buttons in gaze interaction or for looking at larger AOIs with sufficient margin sizes, and may work as an assistive device mounted on a wheelchair, whereas a more expensive, high performance eye tracker may have better data quality and a greater number of valid eye movement measures necessary in much psychological, neurological and reading research. It is a case of matching the system to the purposes and also to the user or participant group, and this is a very difficult task without some standardized measures of data quality. If data quality is measured and characterised for the eye tracker, the participant group and the specific experimental measures of interest, there are methods of dealing with low quality to maximise the validity of results: correcting or abandoning data [Holmqvist et al. 2011, pp. 140 and 224]. However, these methods cannot be considered without first analysing the data and identifying what is and is not noise or error.

Since fixation analysis obscures the original data quality, most researchers estimate the quality of their own recordings from various plots of raw data samples. For instance, Figure 1 shows good versus poor precision, and Figure 2 a case of poor accuracy in the upper left corner. It may be obvious that eye tracker data quality affects the validity of results, but how large is the effect? Is it reasonable to assume valid results from a commercial eye tracker without measuring quality in a particular data set, or should all eye movement researchers check their data quality and report it as part of their results? To illustrate these issues, we begin with four examples.

∗ [email protected]
† [email protected]
‡ [email protected]

[1] We thank the members of the COGAIN Technical Committee for the standardisation of eye data quality (see www.cogain.org/EyeDataQualityTC) for their ongoing participation and comments on this text.

Copyright © 2012 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail [email protected]. ETRA 2012, Santa Barbara, CA, March 28–30, 2012. © 2012 ACM 978-1-4503-1225-7/12/0003 $10.00

1.1 Example 1: Effect of accuracy on dwell time measures

Accuracy (sometimes called offset) is one of the most highlighted aspects of data quality. Loosely speaking, it refers to the difference between the true and the measured gaze direction.

Figure 3(a) shows high quality data recorded from one participant looking at the stimulus image for 30 seconds, with the task of estimating the age of the people in the scene. Binocular data were recorded with a tower-mounted eye tracker sampling at 500 Hz, but only data from the left eye are shown and analysed. The eye tracker reports an average accuracy of 0.30° horizontally and 0.14° vertically after calibration and a four-point validation procedure.

Figure 3(b) displays areas of interest (AOIs) for faces in the stimulus image. Because this is a real image, there is no whitespace (i.e. an area not covered by any AOIs) between the faces that could be used as AOI margins. AOIs with small margins are common in reading research, web studies, and studies that use videos or real world stimuli. They are also common in gaze interaction scenarios, e.g. when typing on an onscreen keyboard. When there is no room for margins, data with poor accuracy will sometimes move to another AOI than the one intended. We can simulate degrees of poor quality by adding a 0.5° offset to the recorded data, moving them a bit in space. Even with this additional offset, the accuracy
is still considered rather high in comparison to what is commonly reported in the literature; in fact, several manufacturers report 0.5° offset as their standard or even best possible accuracy. If a system's inaccuracy is not taken into account when designing test stimuli and analysing data from a study, what kind of effect may it have on results?

Figure 3: (a) Original data in high quality. (b) AOI positions. (c) Total dwell time in each AOI; original data. (d) Total dwell time after 0.5° inaccuracy (offset) has been added to the data. Figures (c) and (d) compare total dwell times with accurate vs slightly inaccurate data. The inaccuracy was added to the original data. Note that 0.5° is considered a very small error.

Dwell time ('gaze duration', 'glance time', ...) is the time gazed at an AOI, from entry to exit, whereas total dwell time is the sum of all dwell times to a specific AOI over a trial [Holmqvist et al. 2011, pp. 190 and 389]. It is a very common measure in eye-movement research. Figure 3(c) shows dwell times for seven AOIs based on the original data, and Figure 3(d) shows dwell times for the same AOIs after a 0.5° offset has been added. Note that for some AOIs, total dwell time is reduced, for others it is significantly reduced or even totally removed, and for some AOIs, dwell time is hardly affected at all. The effect is not uniform across AOIs and so cannot be corrected or controlled for. The purpose of this example study was to analyse AOIs for dwell time and number of fixations, which is typical of many studies. Adding 0.5° of inaccuracy to the data simulates many common recording scenarios. The point to note is that even when accuracy is relatively good, the small amount of inaccuracy present can lead to significant differences in the results.

Often noise in data can be counteracted by increasing the amount of data; as, for instance, with the effect of low sampling frequency on fixation duration [Andersson et al. 2010]. In contrast, more data does not remedy the effect of poor accuracy on AOI measures such as dwell time, because the displaced data are likely to be distributed in the same direction, out of the AOI.

Apart from its effect on research results, accuracy also affects gaze based communication technologies. In gaze based interaction, interactive on-screen targets are in fact AOIs with clear margins. Dwell time selection is a common method of 'clicking' a button with gaze. Having several buttons side by side in an array, for example in an on-screen keyboard, will produce errors in selection if data moves to the neighbouring button. When the selection method involves dwell time, inaccuracy may cause an almost complete dwell-time based selection to restart, or, if data are very inaccurate and there is no space between targets, may mean that selection is very difficult or can only be made on very large (or magnified) targets.

Figure 4: How a decrease in precision affects the number and duration of detected fixations. (a) Illustration of data with high (left; RMS 0.03°) and low (right; RMS 0.37°) precision. (b) Influence of precision on the number of fixations and the average fixation duration. The precision was decreased by adding Gaussian noise with an increasing variance. Fixations were detected with the algorithm by [Nyström and Holmqvist 2010], using default settings.

1.2 Example 2: Effects of precision on the number and duration of fixations

Inaccuracy is not the only data quality issue affecting the viability of research results. While accuracy refers to the difference between true and recorded gaze direction, precision refers to how consistent calculated gaze points are when the true gaze direction is constant. It is often tested with an artificial eye, which does not move at all. Precision measures are commonly conducted to test a particular eye tracker, and when using an artificial eye, this measure gives an idea of system noise or error, which varies with the quality of the eye tracking system. In essence, this enables us to investigate the effect of collecting data with or without a bite-bar or chin rest, or with a tower-mounted eye tracker compared to a remote one. It is also one aspect of testing eye tracker quality. By adding Gaussian noise with an increasing standard deviation to the eye movement data in Figure 3(a), we can simulate poor precision in an eye tracker. Figure 4(a) shows an example of the original data (left part of figure) and the data after noise has been added. The range of added noise has been chosen to conform to recorded precision values for current eye trackers, which according to [Holmqvist et al. 2011] is 0.01–0.05° for tower-mounted systems and 0.03–1.03° for remote ones. The larger values in the latter range, however, are likely to reflect eye trackers with exceptionally poor precision, and are therefore not included in the data presented. Precision values are calculated as the root mean square (RMS) of intersample distances in the data.

Figure 4(b) illustrates how precision influences the number and duration of fixations, as detected by the adaptive velocity algorithm developed by [Nyström and Holmqvist 2010]. According to this algorithm, fixations become fewer and longer as precision decreases. This is most likely due to the saccade detection threshold increasing as a direct consequence of the higher noise level, which prevents small saccades from being detected. These small saccades then become part of adjacent fixations, merged into one longer fixation. The effect in Figure 4(b) is dramatic; even though the data should represent exactly the same eye movement behaviour, the number of fixations decreases by more than 30%, whereas the average fixation duration increases by about 10%. The size of this effect will change with the method of event detection used. For example, it is likely that systems using dispersion based fixation detection algorithms produce a different result.
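The degradation procedure in Example 2 (adding zero-mean Gaussian noise to each sample, then summarising precision as the RMS of intersample distances) can be sketched in a few lines. This is a minimal illustration; the steady fixation signal, noise level, and sample count are invented for the example, not taken from the study:

```python
import math
import random

def rms_intersample(samples):
    """RMS of distances between consecutive gaze samples,
    in the same angular units as the samples themselves."""
    d2 = [(x2 - x1) ** 2 + (y2 - y1) ** 2
          for (x1, y1), (x2, y2) in zip(samples, samples[1:])]
    return math.sqrt(sum(d2) / len(d2))

def add_gaussian_noise(samples, sd_deg, rng):
    """Degrade precision by adding zero-mean Gaussian noise
    (standard deviation sd_deg) to each coordinate independently."""
    return [(x + rng.gauss(0, sd_deg), y + rng.gauss(0, sd_deg))
            for x, y in samples]

rng = random.Random(1)
# A perfectly steady fixation at (10, 10) degrees: zero intersample distance.
fixation = [(10.0, 10.0)] * 500
noisy = add_gaussian_noise(fixation, sd_deg=0.05, rng=rng)
print(rms_intersample(fixation), rms_intersample(noisy))
```

For independent per-axis noise of standard deviation sd, the expected RMS of intersample distances works out to 2·sd, so a 0.05° noise level shows up as roughly 0.1° RMS precision.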
Figure 5: How data loss affects the number and duration of detected fixations. (a) Data with missing samples (indicated with red dots); 18% of the samples were lost in this example. (b) Influence of data loss on the number of fixations and the average fixation duration. Data loss was simulated by randomly inserting burst losses with a length uniformly drawn from the interval [10, 100] samples. Fixations were detected with the algorithm by [Nyström and Holmqvist 2010], using default settings.

1.3 Example 3: Effect of data loss on the number and duration of fixations

Lost data refers to samples that are reported as invalid by the eye tracker. Typically, these correspond to (0, 0)-coordinates or samples that are flagged with a certain validity code in the data file. Data losses derive from periods when critical features in the eye image (often the pupil and the corneal reflection(s)) cannot be reliably detected and tracked. This can occur when, for example, glasses, contact lenses, eyelashes, or blinks prevent the video camera from capturing a clear image of the eye.

Sometimes, it may be desirable to differentiate blinks from other sources of data loss. This may be because blinks are used as a behavioural measure (e.g. [Holland and Tarlow 1972], [Tecce 1992]) or because they are used for gaze based interaction, for example as a 'click' select input. In such cases, simply removing raw data samples with (0, 0) coordinates is not possible, and blinks need to be modeled and differentiated from other causes of loss of signal. Many eye trackers do not output blinks as an event.

Figure 5(a) shows how losses have been introduced into the eye movement signal, where red dots represent lost or invalid samples. To simulate short, local losses of data, invalid data are inserted as burst losses, which occur with probability Pl and last for Nl samples, where Nl is drawn uniformly from A = {10, 11, ..., 100}. Figure 5(b) reveals the same trend for data loss as Figure 4(b) did for decreased precision: a reduction in the number of fixations and an increase in fixation duration.

1.4 Example 4: Effect of screen position on pupil size

Pupil size reacts primarily to changes in illumination, but it is often used as a measure of mental workload, emotional valence, or as an indication of drug use [Holmqvist et al. 2011, pp. 393–394]. A prerequisite for such investigations (apart from controlled light conditions) is that the recorded change in pupil size reflects the true change in pupil size, and therefore that the eye tracker does not add any systematic or variable error to the data. Pupil size measures will include systematic error if the apparent change in pupil size with viewing angle is not controlled for by the eye tracking system, or corrected in the recorded data subsequently. The effect of viewing angle is that pupil size appears larger when the eye is on-axis with the eye camera. This typically means that the pupil appears largest when looking at the centre of the screen compared to the edges. Without knowing this relationship between pupil size and screen position for the particular system being used, the difference in pupil size may be attributed to differences in cognitive processing or emotional responses to the objects. [Gagl et al. 2011] reported a similar effect and also propose a method to correct the errors. Such problems can be corrected, but only if the error is first measured for the particular set-up used.

2 Factors influencing data quality

Many factors influence data quality, including:

1. Participants have different eye physiologies, varying neurology and psychology, and differing ability to follow instructions. Some participants may wear glasses, contact lenses, or mascara, or may have long eyelashes or droopy eyelids, all of which interfere with the eye image and may or may not be accounted for in a system's eye model [Nyström et al. submitted].

2. Operators have differing levels of skill, and more experienced operators should be able to record data with higher quality [Nyström et al. submitted]. Operator skills include adjusting eye-to-camera angles and mirrors, monitoring the data quality in order to decide whether to recalibrate, as well as providing clear instructions to the participants.

3. The task matters: a task that requires participants to move around a lot, for example, could affect data quality. A task that causes participants to blink more often leads to more data loss, unless blinks are modeled as eye events.

4. The recording environment has a strong influence on data quality. Was the data collected outdoors in sunlight or indoors in a controlled laboratory environment, for instance? Were there any vibrations in the room that reduced the stability of the eye movement signal? These factors should be considered and reported.

5. The geometry, that is, the relative positions of eye camera, participant, and stimulus, affects data quality, as does the position of the head in what is known as the head box [Holmqvist et al. 2011, p. 58]. This may be of particular importance when using eye trackers as a communication aid for the disabled, who may be constrained in their movement or sitting/lying position.

6. The eye tracker design does of course have a large impact on the quality of the recorded data. Simply put, an eye tracker consists of a camera, illumination, and a collection of software that detects relevant features in the eye and maps these to positions on the screen. The resolution of the video camera and the sharpness of the eye image are important factors that are directly related to some aspects of data quality. Equally important are the image analysis algorithms, the eye model, the eye illumination and the calibration procedure. Eye tracker system specifications will also have an influence on data quality. The most quoted system specification is sample rate, or sampling frequency, which dictates the system's ability to record brief events and to produce accurate velocity profiles. Another system specification which influences data quality is whether the system is bright or dark pupil based (i.e. whether the eye illumination is on or off axis, producing a bright or dark image of the pupil; for a review of the various set-ups currently in use, see [Hansen et al. 2011]). This may interact with eye colour or other factors to affect data quality. Finally, whether the eye tracker records monocularly or binocularly is of interest. Accuracy and precision of fixation data may improve if data from the two eyes are combined, particularly if using a dispersion based fixation detection method, but if data from the two eyes are not separable, saccade velocity profiles, microsaccades, drift, and saccade amplitude measures will lose validity.
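The burst-loss simulation described in Example 3 (bursts start with probability Pl at each sample and last Nl samples, with Nl drawn uniformly from {10, ..., 100}) can be sketched as follows; the 10,000-sample signal and the particular value of Pl are illustrative assumptions:

```python
import random

def insert_burst_losses(samples, p_loss, rng):
    """Replace runs of samples with None ('invalid') to simulate burst
    data loss: each burst starts with probability p_loss per sample and
    lasts N samples, with N drawn uniformly from {10, ..., 100}."""
    out = list(samples)
    i = 0
    while i < len(out):
        if rng.random() < p_loss:
            n = rng.randint(10, 100)
            for j in range(i, min(i + n, len(out))):
                out[j] = None  # mark this sample as lost
            i += n
        else:
            i += 1
    return out

rng = random.Random(0)
data = [(0.0, 0.0)] * 10_000
degraded = insert_burst_losses(data, p_loss=0.002, rng=rng)
lost = sum(s is None for s in degraded)
print(lost / len(degraded))  # proportion of lost samples
```

With a mean burst length of 55 samples, a burst-start probability of 0.002 yields on the order of 10% data loss, which is the regime Figure 5 illustrates.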
3 Terminology for data quality

First, let us make clear that we cannot know where a human is looking. Even when a participant says she looks at a point, the centre of the fovea can be slightly misaligned. When we talk about 'actual gaze' we refer to this subjective but reportable impression, which is what the vast majority of eye trackers are designed to measure.

Thus, in general terms, data quality can be defined as the spatial and temporal deviation between the actual and the measured gaze direction, and the nature of this deviation, on a sample-to-sample basis. In the very simplest case, we consider these deviations in the presence of only one data sample x̂_i. This sample can either be reported as valid or invalid by the eye tracker, where an invalid sample usually means that relevant eye features could not be detected from the video feed of the eye, for instance due to loss of the eye image. Clearly, with the exception of blinks, it does not make much sense to characterize the quality of missing data other than to classify it as invalid. When the eye tracker reports a valid sample, data quality can be defined as the distance θ_i (in visual degrees) between the actual x_i and the measured x̂_i gaze position, known as the spatial accuracy, or just accuracy, as well as the difference between the time of the actual movement of the eye t_i and the time reported by the eye tracker t̂_i, known as latency or temporal accuracy. If both the accuracy and latency differences are zero, the data quality for this single sample is optimal.

The example with only one sample is, however, mainly of academic interest. Typically, one needs to consider several samples recorded from a whole experiment, a trial, or a single event such as a fixation. Given n recorded samples, accuracy can be calculated as

    θ_Offset = (1/n) Σ_{i=1}^{n} θ_i.                                          (1)

The variance in accuracy is often referred to as spatial precision, and the variance in latency is typically called temporal precision. Two common ways to estimate the spatial precision in the eye movement signal are the standard deviation of the samples and the root mean square (RMS) of inter-sample angular distances, but a whole range of other dispersion measures exist that could be alternatives [Holmqvist et al. 2011, pp. 359–369]. The standard deviation for a set of n data samples x̂_i is calculated as

    s_x = sqrt( (1/n) Σ_{i=1}^{n} (x̂_i − x̂_avg)² ),                            (2)

where x̂_avg denotes the sample average. Letting θ_i denote the angular distance between successive samples, precision can be expressed as

    θ_RMS = sqrt( (1/n) Σ_{i=1}^{n} θ_i² ) = sqrt( (θ_1² + θ_2² + ⋯ + θ_n²)/n ).  (3)

These two precision calculations reflect different factors. Precision reacts in particular to vibrations in the environment when calculated as standard deviation, but not so much when calculated as root mean square (RMS). Figure 6 illustrates this important difference. It is likely that a full standard needs several precision calculations that each measure one aspect of the data.

Both accuracy and precision can be computed separately for the horizontal and vertical dimensions. This may be of particular significance for persons with physical disability; [Cotmore and Donegan 2011], for example, outlines the development of a gaze controlled interface for a user who only has good control of movements in one dimension. Moreover, the proportion of valid data samples recorded is often a good indication of whether the system has problems tracking a particular individual or in a particular environment.

Figure 6: The set of raw data samples on the left have large sample-to-sample distances, and therefore RMS will be high. They are not so dispersed, so standard deviation will be low. The data set on the right, typical of a vibration in the eye tracker, has short sample-to-sample distances, which gives a low RMS, but it is fairly dispersed, so standard deviation will be higher.

The spatial accuracy and precision of pupil size can be defined in a similar manner. The unit of measurement is either pixels in the eye camera, or the perhaps more intuitive unit of millimetres. Since pupil size values are recorded at the same rate as gaze samples, temporal quality values for pupil size are shared with those calculated for gaze samples.

Closely related to spatial precision is a measure termed spatial resolution, which refers to the smallest eye movement that can be detected in the data. If such small eye movements oscillate quickly, they can only be represented in data with high temporal resolution, or sampling frequency, according to the Nyquist–Shannon sampling theorem [Shannon 1948].

4 Measuring data quality using an artificial eye

The artificial eye is an important and versatile tool in the assessment of data quality. However, eye trackers vary in terms of their eye models; therefore, finding an artificial eye which will 'trick' all eye trackers is difficult. When deciding which eye tracker to buy or use for a particular study, artificial eyes provide a way of comparing inherent system noise and error, and can be used to check system latency. Artificial eyes are usually available from the manufacturer, at least for systems intended for research purposes. While it is relatively simple to produce artificial eyes for dark pupil based systems (i.e. where the eye illumination is off-axis with the eye), it is trickier for bright pupil based systems (i.e. where the eye illumination is on-axis). Battery-equipped eyes with actively luminous pupils would be one solution.

4.1 Precision measurements with an artificial eye

Optimal precision for an eye tracker should be calculated with samples originating from a period when the eye is fixating. The only way to completely eliminate biological eye movement from the eye movement signal is to use a completely stationary eye [Holmqvist et al. 2011, pp. 35–40]. Since this is not possible with actual participants, an artificial eye, which produces the corneal reflections required by the eye tracker, is usually employed. This is also how many manufacturers measure precision [SR Research 2007; Sadeghnia 2011; Johnsson and Matos 2011]. When assessing precision in real data, it is useful to know what the maximum possible precision of the system is. If system noise means that baseline precision is low, many eye movement measures may not be validly recorded. For example, the measurement of velocity profiles will be far more affected by low precision than by low accuracy, if the offset in accuracy is uniform across the screen. Likewise, low precision may affect which kind of event detection is preferable for the data. The experimental procedure is simple: first of all, calibrate
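Equations (2) and (3) can be implemented directly, and the contrast Figure 6 describes falls out of a pair of synthetic one-dimensional signals. The two signals below (a fast oscillation between two nearby points versus a slow drift across a wider area) are invented for illustration:

```python
import math

def sd_precision(xs):
    """Standard deviation of samples (eq. 2), one dimension."""
    avg = sum(xs) / len(xs)
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / len(xs))

def rms_precision(xs):
    """RMS of intersample distances (eq. 3), one dimension."""
    thetas = [b - a for a, b in zip(xs, xs[1:])]
    return math.sqrt(sum(t ** 2 for t in thetas) / len(thetas))

# Fast oscillation between two points 0.2 deg apart: big jumps, small spread.
oscillation = [0.0, 0.2] * 100
# Slow drift across 2 deg in 0.01 deg steps: tiny jumps, big spread.
drift = [i * 0.01 for i in range(200)]

print(rms_precision(oscillation), sd_precision(oscillation))  # RMS > SD
print(rms_precision(drift), sd_precision(drift))              # RMS < SD
```

The oscillating signal gives an RMS of 0.2° but a standard deviation of only 0.1°, while the drifting signal gives an RMS of 0.01° against a much larger standard deviation, which is exactly the vibration case Figure 6 attributes to the right-hand panel.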
first calibrating, you may do that, but be aware that the precision [...] where the human eye(s) would have been, and make sure the artificial eyes are securely attached. Beware of vibration movements [...]

[Figure: precision as a function of the diameter of the artificial pupil (mm).]
These measurement points should be placed across the stimulus presentation screen or area to which the data quality values refer. This area is typically the whole monitor, and for standardisation purposes, or when testing an eye tracker (as opposed to the data recorded for a particular study), it seems reasonable to assume the monitor provided with the system is the relevant presentation area. In many eye trackers, accuracy tends to be best in the middle of the monitor/recording area, and worst in the corners [Hornof and Halverson 2002]. If the purpose is to give a realistic account of data quality across varying stimulus presentations in future experiments or for future interfaces, then we should select measurement points at positions between calibration points, across the whole area of the monitor, varying gaze angle and position across the whole range possible when looking at the screen. Hence, the target points presented should cover the entire area used to display the experimental stimuli.

Figure 9: Example measure of eye-tracker latency: an artificial eye is positioned so that gaze coordinates can be measured. A single infrared light on one side of the eye tracker is used to create a corneal reflection. This light is turned off, and another one on the other side is immediately turned on. This will cause an immediate change in position of the corneal reflection at a time that is known by software; the time until a change in gaze coordinates has been registered is the latency. Reproduced from a manufacturer document.
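The latency procedure sketched in Figure 9 reduces to finding the first reported gaze-coordinate change after the known time of the illumination switch. A minimal sketch, in which the sample rate, threshold, and signal are all illustrative assumptions:

```python
def latency_ms(gaze_x, sample_rate_hz, switch_sample, threshold):
    """Return the delay (ms) between the illumination switch and the first
    sample whose x-coordinate has moved by more than `threshold` relative
    to the pre-switch position."""
    baseline = gaze_x[switch_sample]
    for i in range(switch_sample + 1, len(gaze_x)):
        if abs(gaze_x[i] - baseline) > threshold:
            return (i - switch_sample) * 1000.0 / sample_rate_hz
    return None  # no coordinate change was ever registered

# 500 Hz signal: the reflection shift reaches the output 6 samples
# (i.e. 12 ms) after the switch at sample 100.
signal = [10.0] * 106 + [14.0] * 50
print(latency_ms(signal, 500, switch_sample=100, threshold=1.0))  # 12.0
```

Note that the measurable latency is quantised by the sampling interval (2 ms at 500 Hz), so repeated switches are needed to estimate the true end-to-end delay.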
7 Reporting data quality from experiments

A standardized set of eye data quality measures could be automated for use in experimental research, as part of the software package, and compared to an independent report for that eye tracking system or for a similar participant group tested on other systems. Automated data quality measures which are standardized across systems would mean that researchers can easily access them as part of running an ordinary study. They could also be made publicly available by an independent body, in a similar fashion to specifications for other computer based technologies. Table 1 shows what we propose such a report could look like in a publication.

Figure 10: Three fixations are detected (labelled 'valid samples') during the period when the participant is asked to look at the calibration target (reproduced from [Nyström et al. submitted]). Which samples should be included in precision and accuracy calculations?

Table 1: Data quality report from a collection of data in an experiment. Precision values reflect the RMS of inter-sample distances.
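A report of the kind proposed in Table 1 could be assembled from the measures already defined: mean offset for accuracy (eq. 1), RMS of intersample distances for precision (eq. 3), and the proportion of invalid samples for data loss. The function and field names below are assumptions for illustration, not the paper's proposed format:

```python
import math

def quality_report(samples, targets):
    """Summarise accuracy, precision (RMS), and data loss for a recording.
    `samples` are (x, y) gaze samples in degrees, or None when invalid;
    `targets` are the true (x, y) positions, one per sample."""
    valid = [(s, t) for s, t in zip(samples, targets) if s is not None]
    offsets = [math.hypot(sx - tx, sy - ty)
               for (sx, sy), (tx, ty) in valid]
    xs = [s for s, _ in valid]
    d2 = [(x2 - x1) ** 2 + (y2 - y1) ** 2
          for (x1, y1), (x2, y2) in zip(xs, xs[1:])]
    return {
        "accuracy_deg": sum(offsets) / len(offsets),        # eq. (1)
        "precision_rms_deg": math.sqrt(sum(d2) / len(d2)),  # eq. (3)
        "data_loss": 1 - len(valid) / len(samples),
    }

samples = [(10.1, 10.0), (10.1, 10.1), None, (10.0, 10.1)]
targets = [(10.0, 10.0)] * 4
print(quality_report(samples, targets))
```

This sketch computes RMS across consecutive valid samples even when an invalid gap lies between them; a fuller implementation would restrict the calculation to within-fixation periods, the question raised by Figure 10.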
particular purposes more transparent and straightforward for both researchers and users of gaze control systems.

8 Conclusion and future work

Clearly, standardization work for eye data quality would benefit eye movement technology and research in general. This work has already begun as a collaborative effort of the COGAIN Association, in the form of a technical committee for the standardisation of eye data quality. In the absence of agreed standard measures while this work is underway, there is an immediate benefit in promoting the testing and reporting of data quality as standard in eye movement research, using the measures outlined above.

Not all aspects of data quality would benefit from standardization, however; there are a number of issues which might be better allowed to evolve freely, including: (a) how accuracy and precision are actually achieved, which is proprietary information and the core business of manufacturers; (b) what the eye tracker can be used for, and what conclusions can be drawn from the tests, which should be left up to the informed researcher or developer; (c) how low accuracy or precision can be accommodated or overcome, since standardising this would likely hold back research in the area (magnifying windows in gaze interaction software or extra post-processing of the data in research must be stated, but should not be standardized); and (d) event detection algorithms and the filters used in them, as research here is not yet mature.

Many researchers may be unaware of the magnitude of the effect of data quality on their research results or interface functionality, and there are no guidelines on how to go about assessing their data. Likewise, manufacturers may be unsure whether their in-house test methods compare to those of other manufacturers or to end users' quality tests. We hope this paper sets a clear target which will have a positive impact on all aspects of eye movement research, eye tracker development and gaze based interaction.

References

Andersson, R., Nyström, M., and Holmqvist, K. 2010. Sampling frequency and eye-tracking measures: How speed affects durations, latencies, and more. Journal of Eye Movement Research 3, 6, 1–12.

Burmester, M., and Mast, M. 2010. Repeated web page visits and the scanpath theory: A recurrent pattern detection approach. Journal of Eye Movement Research 3, 4, 1–20.

Cotmore, S., and Donegan, M. 2011. Participatory Design – The Story of Jayne and Other Complex Cases.

Drewes, J., Montagnini, A., and Masson, G. S. 2011. Effects of pupil size on recorded gaze position: a live comparison of two eyetracking systems. Talk presented at the 2011 Annual Meeting of the Vision Science Society.

Gagl, B., Hawelka, S., and Hutzler, F. 2011. Systematic influence of gaze position on pupil size measurement: analysis and correction. Behavior Research Methods, 1–11.

Hansen, D. W., Villanueva, A., Mulvey, F., and Mardanbegi, D. 2011. Introduction to eye and gaze trackers. In Gaze Interaction and Applications of Eye Tracking: Advances in Assistive Technologies, P. Majaranta, H. Aoki, M. Donegan, D. W. Hansen, J. P. Hansen, A. Hyrskykari, and K.-J. Räihä, Eds. IGI Global: Medical Information Science Reference, Hershey, PA, ch. 19, 288–295.

Holland, M., and Tarlow, G. 1972. Blinking and mental load. Psychological Reports 31, 119–127.

Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., and van de Weijer, J. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.

Hornof, A., and Halverson, T. 2002. Cleaning up systematic error in eye-tracking data by using required fixation locations. Behavior Research Methods, Instruments, & Computers 34, 4, 592–604.

Johnsson, J., and Matos, R. 2011. Accuracy and precision test method for remote eye trackers. Tobii Technology.

Karsh, R., and Breitenbach, F. W. 1983. Looking at looking: The amorphous fixation measure. In Eye Movements and Psychological Functions: International Views, R. Groner, C. Menz, D. F. Fisher, and R. A. Monty, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 53–64.

Komogortsev, O. V., Gobert, D., Jayarathna, S., Koh, D. H., and Gowda, S. 2010. Standardization of automated analyses of oculomotor fixation and saccadic behaviors. IEEE Transactions on Biomedical Engineering 57, 11, 2635–2645.

Mullin, J., Anderson, A. H., Smallwood, L., Jackson, M., and Katsavras, E. 2001. Eye-tracking explorations in multimedia communications. In Proceedings of IHM/HCI 2001: People and Computers XV – Interaction without Frontiers, A. Blandford, J. Vanderdonckt, and P. Gray, Eds. Cambridge: Cambridge University Press, 367–382.

Nyström, M., and Holmqvist, K. 2010. An adaptive algorithm for fixation, saccade, and glissade detection in eye-tracking data. Behavior Research Methods 42, 1, 188–204.

Nyström, M., Andersson, R., Holmqvist, K., and van de Weijer, J. Submitted. Participants know best: influence of calibration method and eye physiology on eye-tracking data quality. Journal of Neuroscience Methods.

Pernice, K., and Nielsen, J. 2009. Eyetracking Methodology: How to Conduct and Evaluate Usability Studies Using Eyetracking. Berkeley, CA: New Riders Press.

Sadeghnia, G. R. 2011. SMI Technical Report on Data Quality Measurement. SensoMotoric Instruments.

Salvucci, D., and Goldberg, J. H. 2000. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, New York: ACM, 71–78.

Schnipke, S. K., and Todd, M. W. 2000. Trials and tribulations of using an eye-tracking system. In CHI '00 Extended Abstracts on Human Factors in Computing Systems, ACM, 273–274.

Shannon, C. E. 1948. A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656.

Shic, F., Scassellati, B., and Chawarska, K. 2008. The incomplete fixation measure. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, New York: ACM, 111–114.

SR Research. 2007. EyeLink User Manual 1.3.0. Mississauga, Ontario, Canada.

Tecce, J. 1992. Psychology, physiological and experimental. In McGraw-Hill Yearbook of Science & Technology. New York: McGraw-Hill, 375–377.