Lecture 12 - Evaluation
EVALUATION
The aims
• Explain the key concepts and terms used in evaluation.
• Introduce different types of evaluation methods.
• Show how different evaluation methods are used for
different purposes at different stages of the design
process and in different contexts of use.
• Show how evaluators mix and modify methods to meet
the demands of evaluating novel systems.
• Discuss some of the challenges that evaluators have
to consider when doing evaluation.
• Illustrate how methods discussed in Chapters 7 and 8
are used in evaluation and describe some methods
that are specific to evaluation.
www.id-book.com 2
Why, what, where and when to
evaluate
Iterative design & evaluation is a continuous
process that examines:
• Why: to check that the product meets users’ requirements,
that users can use it, and that they like it.
• What: a conceptual model, early prototypes of a
new system and later, more complete prototypes.
• Where: in natural and laboratory settings.
• When: throughout design; finished products can be
evaluated to collect information to inform new
products.
Bruce Tognazzini tells you why you need to
evaluate
“Iterative design, with its repeating cycle of
design and testing, is the only validated
methodology in existence that will consistently
produce successful results. If you don’t have
user-testing as an integral part of your design
process you are going to throw buckets of
money down the drain.”
Types of evaluation
• Controlled settings involving users, e.g.,
usability testing & experiments in
laboratories and living labs.
• Natural settings involving users, e.g., field
studies and in the wild studies to see
how the product is used in the real world.
• Settings not involving users, e.g., to
predict, analyze & model aspects of the
interface, or to use analytics.
Living labs
• People’s use of technology in their everyday
lives can be evaluated in living labs.
• Such evaluations are too difficult to do in a
usability lab.
• E.g., the Aware Home was embedded with a
complex network of sensors and audio/video
recording devices (Abowd et al., 2000).
Usability testing & field studies can
complement each other
Evaluation case studies
• Experiment to investigate a computer game
• Crowdsourcing
Challenge & engagement in a
collaborative immersive game
• Physiological measures
were used.
• Players were more engaged when playing
against another person than when playing
against a computer.
• What precautionary measures did the evaluators
take?
What does this data tell you?
Why study skiers in the wild?
e-skiing system components
What did we learn from the case
studies?
• How to observe users in natural settings.
• Unexpected findings resulting from in the wild
studies.
• Having to develop different data collection and
analysis techniques to evaluate user experience
goals such as challenge and engagement.
• The ability to run experiments on the Internet that
are quick and inexpensive using crowdsourcing.
• How to recruit a large number of participants using
Mechanical Turk.
Evaluation methods
Method   Controlled settings   Natural settings   Without users
Observing x x
Asking users x x
Asking experts x x
Testing x
Modeling x
The language of evaluation
• Analytical evaluation
• Analytics
• Biases
• Controlled experiment
• Crowdsourcing
• Ecological validity
• Expert review or crit
• Field study
• Formative evaluation
• Heuristic evaluation
• In the wild evaluation
• Informed consent form
• Living laboratory
• Predictive evaluation
• Reliability
• Scope
• Summative evaluation
• Usability laboratory
• Usability testing
• User studies
• Users or participants
• Validity
Participants’ rights and getting their
consent
• Participants need to be told why the
evaluation is being done, what they will be
asked to do and their rights.
• Informed consent forms provide this
information.
• The design of the informed consent form, the
evaluation process, data analysis and data
storage methods are typically approved by a
higher authority, e.g., an Institutional Review Board.
Things to consider when
interpreting data
• Reliability: does the method produce the
same results on separate occasions?
• Validity: does the method measure what it is
intended to measure?
• Ecological validity: does the environment of
the evaluation distort the results?
• Biases: are there biases that distort the
results?
• Scope: how generalizable are the results?
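Reliability can be given a number by repeating the same measure on two occasions and correlating the results. The sketch below is only an illustration: the scores are made up, and using Pearson's r as the reliability index is an assumption, not something the lecture prescribes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: undefined (division by zero) if either list has no variation.
    return cov / sqrt(var_x * var_y)


# Hypothetical task times (seconds) for the same five participants,
# measured in two separate sessions.
session_1 = [42.0, 55.0, 38.0, 61.0, 47.0]
session_2 = [44.0, 53.0, 40.0, 63.0, 45.0]

reliability = pearson_r(session_1, session_2)
print(f"test-retest correlation: {reliability:.2f}")
```

A value close to 1 suggests the method ranks participants consistently across occasions; it says nothing about validity, which has to be argued separately.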
Key points
• Evaluation and design are very closely integrated.
• Some of the same data gathering methods are used in
evaluation as for establishing requirements and
identifying users’ needs, e.g. observation, interviews,
and questionnaires.
• Evaluations can be done in controlled settings such as
laboratories, less controlled field settings, or where
users are not present.
• Usability testing and experiments enable the evaluator
to have a high level of control over what gets tested,
whereas evaluators typically impose little or no control
on participants in field studies.
Lecture 12 (Part B)
EVALUATION
The aims:
• Explain how to do usability testing
Usability testing
• Involves recording performance of typical users
doing typical tasks.
• Controlled settings.
• Users are observed and timed.
• Data is recorded on video & key presses are
logged.
• The data is used to calculate performance times,
and to identify & explain errors.
• User satisfaction is evaluated using
questionnaires & interviews.
• Field observations may be used to provide
contextual understanding.
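The performance measures listed above (task times and errors) can be derived from logged session data. The following is a minimal sketch, assuming a hypothetical log format: the `TaskRecord` fields, task names, and values are invented for illustration and do not come from any particular logging tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    participant: str
    task: str
    start_s: float  # seconds since the session started
    end_s: float
    errors: int     # errors counted by the evaluator for this task


def summarize(records):
    """Return {task: (mean completion time in seconds, total errors)}."""
    by_task = {}
    for r in records:
        by_task.setdefault(r.task, []).append(r)
    return {
        task: (mean(r.end_s - r.start_s for r in rs), sum(r.errors for r in rs))
        for task, rs in by_task.items()
    }


# Hypothetical log entries for two participants and two tasks.
records = [
    TaskRecord("P1", "find_setting", 0.0, 30.0, 1),
    TaskRecord("P2", "find_setting", 0.0, 50.0, 3),
    TaskRecord("P1", "open_web", 40.0, 100.0, 0),
    TaskRecord("P2", "open_web", 60.0, 100.0, 2),
]

summary = summarize(records)
for task, (avg_time, total_errors) in summary.items():
    print(f"{task}: mean time {avg_time:.1f} s, errors {total_errors}")
```

The numbers only say how long tasks took and how often errors occurred; explaining why still relies on the video, interviews, and questionnaires.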
Experiments & usability testing
Usability testing & research
Usability testing Experiments for research
Usability testing
• Goals & questions focus on how well users
perform tasks with the product.
Testing conditions
• Usability lab or other controlled space.
• Emphasis on:
– selecting representative users;
– developing representative tasks.
• 5-10 users typically selected.
• Tasks usually last around 30 minutes.
• Test conditions are the same for every
participant.
• Informed consent form explains procedures and
deals with ethical issues.
Types of data
Time to complete a task.
Portable equipment for use in the
field
Mobile head-mounted eye tracker
Usability testing the iPad
• 7 participants with 3+ months’ experience with iPhones
• Signed an informed consent form explaining:
– what the participant would be asked to do;
– the length of time needed for the study;
– the compensation that would be offered for participating;
– participants’ right to withdraw from the study at any time;
– a promise that the person’s identity would not be disclosed; and
– an agreement that the data collected would be confidential and
would be available to only the evaluators.
• Then they were asked to explore the iPad.
• Next they were asked to perform specified tasks that were
randomly assigned.
Examples of the tasks
Example of the equipment
Problems and actions
• Problems detected:
– Accessing the Web was difficult
– Lack of affordance and feedback
– Getting lost
– Knowing where to tap
• Actions by evaluators:
– Reported to developers
– Made available to public on nngroup.com
• Accessibility for all users is important.
Experiments
• Test a hypothesis.
• Predict the relationship between two or
more variables.
• The independent variable is manipulated by the
researcher.
• The dependent variable is influenced by the
independent variable.
• Typical experimental designs have one or
two independent variables.
• Results are validated statistically & should be replicable.
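To make the independent/dependent variable distinction concrete, here is a rough sketch of one statistical comparison. Everything in it is assumed for illustration: the two conditions, the made-up task times, and the choice of Welch's t statistic (the lecture does not name a specific test).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Independent variable: which interface the participant used (A or B).
# Dependent variable: task completion time in seconds (hypothetical data).
times_interface_a = [30.0, 32.0, 34.0, 36.0, 38.0]
times_interface_b = [40.0, 42.0, 44.0, 46.0, 48.0]

t_stat = welch_t(times_interface_a, times_interface_b)
print(f"t = {t_stat:.2f}")
```

The statistic still has to be compared against the t distribution to obtain a p-value; in practice a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False`) would usually do both steps.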
Experimental designs
• Different participants - a single group of
participants is allocated randomly to the
experimental conditions.
• Same participants - all participants appear
in both conditions.
• Matched participants - participants are
matched in pairs, e.g., based on expertise,
gender, etc.
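The first two designs can be sketched in a few lines. This is an illustration under assumptions: the function names and participant IDs are hypothetical, and alternating the order of conditions in the same-participants design (counterbalancing) is a common precaution that the slide itself does not mention.

```python
import random


def allocate_between(participants, conditions, seed=None):
    """Different-participants design: shuffle, then deal out round-robin."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups


def counterbalance_within(participants, conditions):
    """Same-participants design: rotate the condition order per participant."""
    orders = {}
    for i, participant in enumerate(participants):
        shift = i % len(conditions)
        orders[participant] = conditions[shift:] + conditions[:shift]
    return orders


participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
conditions = ["A", "B"]

groups = allocate_between(participants, conditions, seed=1)
orders = counterbalance_within(participants, conditions)
print(groups)
print(orders)
```

A matched-participants design would need an extra pairing step (e.g., sorting by expertise before splitting each pair across conditions), which is left out here.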
Different, same, matched
participant design
Design Advantages Disadvantages
Field studies
• Field studies are done in natural settings.
• “In the wild” is a term for prototypes being used
freely in natural settings.
• The aim is to understand what users do naturally and
how technology affects them.
• Field studies are used in product design to:
– identify opportunities for new technology;
– determine design requirements;
– decide how best to introduce new technology;
– evaluate technology in use.
Technology for context-aware field
data collection
An in the wild study:
UbiFit Garden
Data collection & analysis
Data presentation
• The aim is to show how the products are
being appropriated and integrated into
their surroundings.
• Typical presentation forms include:
– Vignettes,
– Excerpts,
– Critical incidents,
– Patterns, and
– Narratives.
Key points
• Usability testing takes place in controlled usability labs or temporary labs.
• Usability testing focuses on performance measures, e.g., how long users take and how many
errors they make when completing a set of predefined tasks. Data from indirect observation
(video and keystroke logging), user satisfaction questionnaires and interviews are also collected.
• Affordable, remote testing systems are more portable than usability labs. Many also
contain mobile eye-tracking and other devices.
• Field studies are evaluation studies that are carried out in natural settings to discover
how people interact with technology in the real world.
• Field studies that involve the deployment of prototypes or technologies in natural settings
may also be referred to as ‘in the wild’.
• Sometimes the findings of a field study are unexpected, especially for in the wild studies,
which explore how novel technologies are used by participants in their own homes,
places of work, or outside.