Game Usability
Advice from the Experts for Advancing the Player Experience
Katherine Isbister
Noah Schaffer
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior
written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford,
UK: phone: (44) 1865 843830, fax: (44) 1865 853333, E-mail: [email protected]. You may
also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting
“Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”
ISBN: 978-0-12-374447-0
Foreword
Randy Pagulayan, Microsoft Game Studios
Dennis Wixon, Microsoft Surface
One size fits all – it’s all story, it’s all mechanics
One of the pitfalls of thinking about game design and research is the seduction of
a dogmatic approach. The arguments for dogmatic approaches are often passionate
and persuasive, containing compelling examples. A typical example of a dogma is "the
story is the most important, or even the only, critical element of game design." After all, the argument
goes, shooting alone is not compelling unless the story is good. Alternatively, one
could argue that mechanics are the only critical element of game design. What's the
story line for Guitar Hero or Hexic?
Like most dogmas, these assertions contain a kernel of truth that has been overextended.
Making a great game depends on many elements (mechanics, story, visuals,
sound, characterization, etc.), and the relative importance of these elements
varies from game to game and genre to genre. In addition, the elements are complementary,
not mutually exclusive. It's time to think of games holistically. When
we think of movies, novels, paintings, and great meals, we already know that all
the elements combine for a great experience. Let's apply a lesson from Gestalt psychology
to games: "The whole is different than the sum of the parts." There is a
"Prägnanz*" in game design where it all comes together; this is just like when we
look at a set of dots but perceive the closure of a circle. The underlying principle
is that it’s not the dots but their relationship that matters. The important corollary
for game design is that a single failed part (something out of place) can disrupt an
otherwise great experience. We can start to think of game design in terms of what’s
blocking the fun. In other words, what design element was out of place that pre-
vented closure on the intended experience? Let’s move away from useless dogma
and take the road to more productive thinking.
but we also must be honest with ourselves in terms of what truly is important
to games. Otherwise, we end up with researchers and practitioners performing
research that ultimately serves no one but themselves (which isn't very good).
Promise
After a promising start in the early '90s, research to determine which usability
method was the most effective in the real world abruptly stopped. This setback
has been partly attributed to an unfortunate article by Gray and Salzman
(1998). In it, they argued that evaluating user research methods in business
required a classical experimental approach. While this article may have
provoked some interesting discussion, it served to stifle a promising area
of research on the relative effectiveness of methods. Fortunately, research
on methods in games has proceeded in a real-world context and has taken
a case-study approach as opposed to a formal experimental approach.
Several authors have described the contribution of a research design col-
laboration to the commercial success of games. This is a trend to be supported
and encouraged. Perhaps games research can reinvigorate efforts to evaluate meth-
ods for productivity applications in the context of real-world products and tools.
Given the sterility of formal experimental methods for real-world applications, it
would be a welcome change.
* Editors' note: Prägnanz is a term from Gestalt theory in psychology, meaning a sort of ordered and
balanced image that the mind pulls together when perceiving and making sense of the world.
CHAPTER ONE
Introduction
Katherine Isbister and Noah Schaffer
1.1 Why Usability Now?
More and more game developers (and educators in the field of game development)
are talking about user research and usability. There have been articles in indus-
try venues such as Gamasutra, and workshops on usability at the annual Game
Developers Conference. You may be wondering what exactly the excitement is
about, and what it has to do with your daily challenges as a game developer.
There are many reasons for the increasing interest in user research for games
that led us to feel the time was right for an edited volume about what’s state-of-the-
art in this emerging field:
● Developers and publishers are trying to reach out to broader audiences. User
research becomes more crucial to development teams when the target audience
is someone other than people who closely resemble the developers themselves.
● Game development teams have grown. User research can help to keep larger
teams “on track” in their efforts—it’s harder to manage by intuition when one
person can’t have all the many facets of the design in their head.
● Proliferation of platforms. Designing for new input modes and modified plat-
forms, or for many platforms at once, creates usability problems that user
research can help to anticipate and lessen.
For these and other reasons, more and more game developers are turning to tactics
that emerged from the study of productivity software, to help fine-tune their efforts.
person trying to accomplish the tasks at hand. Making software usable means paying
attention to human limits in memory, perception, and attention; it also means antici-
pating likely errors that can be made and being ready for them, and working with
the expectations and abilities of those who will use the software. Traditional usability
testing, then, has meant testing with people in the target user group to see whether the
software meets expectations in these practical, task-oriented concerns. In more recent
years, productivity software designers have also become interested more broadly
in the overall user experience—what it is like to interact with the software, including
how engaging the experience is, regardless of the end goals. This leads to testing
techniques that are concerned with qualities such as engagement, flow, and fun—qualities
that bring user research closer to the primary concerns of game developers.
Game developers have evolved two main tactics for collecting play feedback and
reincorporating it into design: playtesting and QA (quality assurance). In playtests,
the focus is on whether the game is fun to play, but also where players may be get-
ting stuck or frustrated (similar to usability test concerns). Playtests are conducted
when there’s a playable version of the game, but as early as possible in the process,
to help correct any issues before full production. QA is testing done fairly late in the
development process, focused mostly on catching bugs in the game software, but
also aiding in tuning play, for example adjusting the difficulty level of the game.
In this book, you’ll see that each author has a slightly different way of using
these terms. We see this as an indication that the field is still evolving—the differ-
ences reflect the origins of each author’s knowledge and practice. If you keep in
mind the broad definitions above, you should be able to follow along regardless of
these variations.
satisfaction with games. Whatever your resource level and interest level, we believe
you’ll find something of use in these pages, including:
● Advice for how (and why) to fire up your company about usability and user
research (see Chapter 2),
● Bread-and-butter techniques that have broad relevance (see Part II),
● Special contexts, such as casual games, and types of players, such as players in
other cultural markets, for example Japan (see Part III),
● Advanced tactics to try out, such as biometrics and instrumentation (see Part IV),
● A use matrix that helps you decide what techniques may be appropriate to the
project and phase you are in (see Part V),
● Interesting perspectives on how gaming has influenced the broader world of
design and user research (two interviews in Part V).
If you are someone with an existing basic knowledge of usability who is interested in new
techniques, you may want to skip to Part IV of the book, to learn about methods in the
vanguard of user research for games.
1.5 Acknowledgments
We’d like to thank Regina Bernhaupt for the wonderful International Conference on
Advances in Computer Entertainment Technology (ACE) 2007 workshop on meth-
ods for evaluating games, at which quite a few of the book’s authors were gathered.
Regina has been a valuable advocate for bringing user experience in games to the
conversation in traditional user research circles. We’d also like to thank the staff at
Morgan Kaufmann for their help in shaping this book, and in bringing it to press.
Thanks also to Jason Della Rocca, for his excellent editorial comments along the way.
CHAPTER TWO
Organizational Challenges for User Research in the Videogame Industry: Overview and Advice
Mie Nørgaard is a Ph.D. fellow in human-computer interaction (HCI) at the
University of Copenhagen. Her research interests include collaborative and
organizational aspects of user research and experience-focused HCI. With her
background in archaeology, she is also curious about modern technological
artefacts and their potential for supporting everyday life.
2.1 Overview
In this chapter, we take a look at organizational challenges for third-party developers
who are interested in implementing and conducting HCI-related user research,
such as usability testing, in a game development setting. We discuss the challenges
related to justifying the return on investment of user research, formalizing work
procedures involving user research, and building cross-professional relationships
amongst key stakeholders in user research. Furthermore, we discuss the challenges
related to the fact that many games developers are owned by or closely affiliated
with a publisher. Through the lens of a questionnaire survey of members of the
game industry, we specifically look at the relationship between third-party
developers and the publisher's marketing department, and investigate how and
to what extent these two parties collaborate on user research issues. Throughout the
chapter, we also present concrete advice on how to tackle the various challenges
mentioned.
2.2 Introduction
There are many potential rewards for the videogame developer who wants to implement
methods to evaluate usability or user experience in the game development
process, but there are also a multitude of challenges. Not only are games complicated
pieces of software; the nature of their use is also very different from the use of
traditional task-oriented software, which is what most usability evaluation methods
are designed for, and this poses a challenge for user researchers in the game industry.
Well-known usability measures, such as efficiency, effectiveness, and satisfaction
(such as those identified by ISO 9241-11), can only partially give a picture of how well
a game performs. In fact, one may wonder what "efficiency" in relation to videogames
actually means, or whether the term makes sense in this context at all (Barr
et al., 2007; Pagulayan et al., 2004; Jørgensen, 2004; Philips, 2006; Bernhaupt et al.,
2007). But adapting methods or designing new ones is not the only challenge for
user research in the game industry. In this chapter, we discuss organizational chal-
lenges for user research, such as justifying return on investment, formalizing work
procedures, and the building of cross-professional relationships. We further identify
a challenge that in some aspects is unique for the game industry; it is a challenge
that is connected to the developer-publisher relation and springs from the fact that
many game developer studios are either formally owned by or are affiliated with
a publisher. In this structural setup, the publisher handles, for example, market-
ing and distribution, whereas the developer handles the actual development of the
game.
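To make the mismatch concrete, here is a minimal Python sketch—our own illustration, not anything prescribed by ISO 9241-11 itself—of how the classic measures might be computed for a task-oriented application, and why the middle one loses its meaning for games. All data are invented.

from statistics import mean

# Hypothetical post-session data for one task in a productivity application.
completed = [True, True, False, True, True]   # did each user finish the task?
times_sec = [42.0, 55.5, None, 38.2, 61.7]    # time on task (None = gave up)
ratings = [4, 5, 2, 4, 3]                     # 1-5 post-task satisfaction

effectiveness = sum(completed) / len(completed)   # share of users who succeeded
finished = [t for t in times_sec if t is not None]
efficiency = mean(finished)                       # mean time to complete
satisfaction = mean(ratings)

print(f"effectiveness: {effectiveness:.0%}")
print(f"mean time on task: {efficiency:.1f} s")
print(f"mean satisfaction: {satisfaction:.1f} / 5")

# For a game these analogues break down: there is no externally imposed task,
# and a shorter completion time is not better—a level finished in record time
# may simply have been too easy.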
Introducing user research in the form of usability or user experience evaluation
at the developer site can potentially create a conflict between the publisher and the
developer, because both parties—sometimes simultaneously—conduct user studies.
Practically speaking, the user research workers at the developer site will do research
with, for instance, a usability focus, whereas the publisher mainly focuses its user
research on marketing issues. At the very least, such a situation will require intense
coordination between the publisher and the developer site, because they need to
agree on, for example, who the users are, and what the consequences of particu-
lar results should be, i.e., if and how to use the results in the development and/or
marketing of the game.
This challenge of coordinating marketing and development user research efforts
is also present in the software industry, but because of the videogame industry’s
close historical and structural ties with the toy and entertainment industry, the
marketing-development relation and power balance in games development are dif-
ferent from those in the software industry. This poses unique challenges to imple-
menting user research methods in videogame R&D.
In the course of this chapter, we will use the term “UR champion” to describe the
person who incorporates—or wishes to incorporate—user research in the develop-
ment process at the developer site. In terms of job roles in the game industry, such
a person may belong to level design, QA, management, etc. The goal of the chap-
ter is to discuss the organizational challenges such a person may encounter and to
provide tips for how to work around them.
We wish to emphasize that the results and advice presented in the following
should not be understood as devious tactics to gain world domination for UR cham-
pions at the expense of developers, for instance. Neither should they be understood
as an attempt to cast either developers or publishers as antagonists—both
can be quite positive towards user research. On the contrary, it is the authors' firm
belief that user research should help and enable game developers, as well as pub-
lishers and marketing, to develop, market, and sell better games. Accordingly, the
challenges and advice presented in the following are aimed at how to manage
the organizational aspects of implementing and maintaining a new methodology to
the benefit of all parties.
2.3 Three Well-Known Challenges
of return they can expect, and relate this to the size of the investment they have
to make.
Developers, in particular, may worry that user research will lead to letting the
users (or the UR champion) design the games instead of the developers. On the con-
trary, user research should support and enable the developers’ vision for the game
rather than take away responsibility and competence, and this should be commu-
nicated clearly to the developers. Furthermore, as project schedules are often very
tight on time, a reasonable worry on behalf of the developers is that user research
will add more hours and stress to an already heavy workload. Therefore, the UR
champion needs to present arguments that user research—although naturally requir-
ing some investment of time—enables the developer to identify necessary design
changes much earlier than without user research, and thus saves time in the end.
Such an argument appeals to developers as well as management.
Another persuasive tactic is to include developers in the preparation,
execution, and analysis of user test sessions. This will demystify user research
and help developers understand what user research methods are, what kind of
results they can provide, and which questions they might help answer. Because
of time constraints, it may not be easy to convince developers to take part in user
research. This makes it all the more important to emphasize that the developers’
knowledge about the game can be invaluable for the analysis of the research data,
which calls for the developers’ active participation. Furthermore, from a psychologi-
cal point of view, developers are more likely to act on evaluation results when they
have contributed to creating them (Benton, Kelley, & Liebling, 1972; Schindler, 1998),
which makes the involvement of developers in user research even more important.
Whereas developers primarily will focus on the production side of the game,
management will additionally be interested in how user research can help the com-
pany in the marketplace. The UR champion could, therefore, seek to document
current industry trends—such as a diversifying market with new types of users,
escalating production costs etc.—and use these as an argument for user research.
A well-supported argument that user research can better align the game to the
market, as well as cut costs, boils down to this: We cannot afford not to implement
user research if we are to remain competitive.
As a last persuasive factor, it is important that results start rolling in fast after
the first user tests, and that these results are easily communicated, relatively
uncontroversial, and easily translated into action points. For instance: A lengthy
ethnography-inspired field-work study—although potentially yielding interest-
ing and insightful results—is difficult to validate, hard to understand for non-
ethnographers, may require deep (and thus complicated) intervention in the game
design, and prolongs the time between investment and return. So, introducing user
research through thorough ethnographic studies will make it harder for develop-
ers and management to accept user research as adding tangible value to develop-
ment. Instead, much can be gained by some amount of strategic planning. Initially,
the UR champion could focus on methodologies, such as basic usability testing,
which focus on objective data collection criteria and/or relatively isolated parts of
the game. As these methods gain momentum, the UR champion could then start
expanding the user research toolkit in order to gradually expose colleagues to other
user research methods and train colleagues to think in terms of user experience.
Skeptics may object that it is not the job of the user researcher to pick and
choose strategically from the pool of results or tools, and that UR champions have
an obligation to present whatever results they uncover, despite any practical or
political complications. While this is certainly valid from a purely academic standpoint,
we do advise practitioners to at least consider the option of a more pragmatic
approach. After all, firing all your artillery and using all your ammunition at level 1
may not be the best strategy to secure success for user studies in the long run.
Key takeaways:
● Collect real-world examples of successful user research practices in the game
industry and share them.
● Tailor return-on-investment arguments to fit key stakeholders’ individual and
professional needs and goals.
● Be realistic and choose battles wisely: Start off by implementing user research
methods with focus on data and objectivity, as well as a high and reliable
success rate. This will enable you to build return-on-investment credibility fast
and open doors to introducing new user research methods.
showstoppers that will leave the game entirely unplayable. Instead, they describe
issues that, if resolved, will improve on more intangible aspects of the game such
as the overall player experience. That being the case, unless formalized work proce-
dures are in place before results start coming in from user research, there is a great
risk of the issues being lost in translation or down-prioritized because they are con-
sidered less important than bugs. Ultimately, this may very well mean that usability
or user experience issues will end up not being resolved.
Related to this, specifications and best practices on how the UR champion shares
his or her results with colleagues are needed to improve user research’s impact on
the product. Current literature confirms that the means by which results from usability
evaluations are presented and communicated to developers strongly determine
how they are received (Nørgaard & Høegh, 2008; Nørgaard & Hornbæk,
2008). This will vary from organization to organization, and from team to team, so
there will be an element of trial and error and gut feeling connected to this. One
way of helping such processes along is to agree on who is responsible for and has
the mandate to make decisions about usability priorities, who can instigate user
research, to whom the results are handed over, and how these results are handled.
It is not the authors’ opinion that user research results should automatically warrant
a fix as would the discovery of a bug; developers may have good reasons for reject-
ing a proposed redesign. Nevertheless, there should be a clear work-flow for the
handling of user research results and recommendations for redesign. To make this
easier, we recommend implementing one method and workflow at a time.
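As a purely hypothetical sketch of what such a formalized workflow could look like—every field name and value below is our own invention, not a description of any studio's actual pipeline—a user research finding might be tracked as a record that is distinct from, but handled with the same discipline as, a bug:

from dataclasses import dataclass

@dataclass
class URFinding:
    summary: str            # what was observed in the sessions
    evidence: str           # pointer to video clips, survey items, etc.
    severity: str           # e.g., "blocks-play", "hurts-experience", "polish"
    owner: str              # who has the mandate to decide on this issue
    decision: str = "open"  # "open", "fix", or "rejected" (with rationale)
    rationale: str = ""     # developers may have good reasons to reject

finding = URFinding(
    summary="Players consistently miss the sprint tutorial prompt",
    evidence="playtest 07, clips 3-5",
    severity="hurts-experience",
    owner="lead designer",
)
finding.decision = "fix"
finding.rationale = "reposition the prompt just before the first chase sequence"

The point of such a record is not the code but the agreement it encodes: every finding has an owner, an explicit decision, and a logged rationale, so usability issues cannot silently fall off the bug-dominated radar.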
Key takeaways:
● Identify key development components and milestones that user research should
connect to.
● Build standardized procedures for user research: Make it an integral part of the
development process, not just an add-on that can be dismissed when time is tight.
● Use best practices and gut feeling to choose the format used to share the
results. Remember that user research should be supporting the developers' goals
and the overall company strategy.
● Start slowly, integrating in tiers or one method at a time.
the successful UR champion has an eye for strategic planning, lobbying, and
spotting influential colleagues.
However, relations with influential managers are not the only ones that UR
champions need to nurture. Because user research ultimately will impact most of the
development processes, the UR champion needs to develop fruitful relations with a
whole range of professionals. For example, because game developers’ visions for a
game are rarely entirely documented, and because UR champions depend on know-
ing these visions to understand which challenges in a game are intended and which
are actual problems, close cooperation with game developers is important.
Having said that, getting the relevance of user research acknowledged by game
developers may be fairly difficult. So, apart from justifying the return on investment,
UR champions should also pay close attention to the professional and personal rela-
tionships that exist between themselves and other stakeholders in an organization.
When seeking to nurture cross-professional relations, UR champions should
pay attention to the fact that different professionals have different aims and job
roles, and make an effort to build tight relationships with stakeholders bearing
that in mind. Personal relationships are—obviously—also very important because
good personal relations help bridge conflicting interests and generally facilitate on-
going informal communication, the latter being very helpful from a proactive point
of view.
To a critical eye, teaming up with influential colleagues and making alliances
might seem a little too Machiavellian. However, the point is not to trick people or to
force an opinion upon someone; the point is to build and nurture good relations with
colleagues in order to aid the development of successful games.
Key takeaways:
● Think strategically: Do lobby work and strive to make alliances with influential
colleagues. Remember, an influential colleague is not always the one with the
fanciest job title.
● Talk with the game developers and listen to their thoughts and ideas. This may
sound trivial, but to succeed with user research you need to understand their
visions for the game, and you will not if you correspond solely by email.
● Nurture professional and personal relationships continually—not only when you
need favors or support. An informal chat once in a while will help get attention
and goodwill when push comes to shove.
Based on our own research and experiences, we have presented some key organ-
izational challenges for UR champions working to introduce user research in the
game industry. These challenges are not unlike the challenges that any software
company will encounter in the process of maturing its view on usability and the
processes for conducting user research. Further discussions of such themes can be
found in Helander et al.’s Handbook of Human Computer Interaction (Helander,
Landauer, & Prabhu, 1997).
2.4 The Publisher and the Developer
for independent developers to get into the console game market without a powerful
publisher to get them through the screening process.
From an organizational perspective, this may be considered a cornerstone in the
relationship between publishers and developers: To get a game onto the market,
developers now had to go through a publisher (at least when it comes to AAA con-
sole games). In this respect, the videogame industry is actually closer to the music
industry than the software industry. From the nineties on, the bond between
developer and publisher tightened, and today many development studios are
owned by a publishing company that manages the distribution and marketing of a
videogame.
In terms of games evaluation, such an organizational set-up often entails that
the publisher will handle the user-centered evaluation (via marketing methodolo-
gies) and the developer most of the technical evaluation (quality assurance) through
functional tests or bug-testing (Kline, Dyer-Witheford, & De Peuter, 2003). Such
a distribution of responsibilities seemingly leaves many critical decisions about user
research in the hands of the publisher’s marketing department. This is not a bad
thing per se, but what happens when someone decides to evaluate usability or user
experience at the developer site?
Based on the assumption that no one likes to give away power or influence, we
expected that such actions might not be welcomed by the publisher's marketing
department and that some rivalry might occur between publisher and developer
on that account. At the very least, we assumed there would be an increased need
for coordinating the user research efforts at the developer studio and the publisher
respectively.
To investigate this and to better understand the reality and challenges for user
research at the developer site we conducted an informal survey in the videogame
industry.
great interest in the topic and regretted not having time to participate. Since user
researchers and other professionals in the game industry hold a myriad of job titles,
we dare not comment on our sample size or the quality of the answers. However,
we find that the answers cover both large corporations with many well-known past
titles and smaller companies with less experience. Furthermore, informal "off the
record" conversations with people from the industry confirm the findings.
Participants were invited to comment on our findings and discussion in order
to increase their relevance and validity. Only one participant provided comments
and ideas for improvements.
On average, the participants had worked 7.9 years with user research or related
work in the game industry. Ten of eleven participants had a background in univer-
sity studies like history, physics, engineering, computer science, cinema, or psychol-
ogy—though some had never finished their degree. One had other education.
Seven of the participants were from third-party developer studios—that is, game
developers that work under contract with a publisher for each game. Four participants
were from publisher/developer companies or mainly publishers. Table 2.1 shows a
description of the companies and participants. The average age of the companies/game
departments was eleven years, the youngest having existed for five years and the
oldest for twenty-two.
TABLE 2.2 Where the user research is conducted
Developer site: B, K
Mostly developer, but some at the publisher site: C, G, I
Shared equally between developer and publisher: D, E, H, J
Mostly publisher, but some at the developer site: F
Publisher site: A

TABLE 2.3 Who conducts the user research
Developer: B, C, G, J, K
Publisher: E, A
Both: F, H, I
No answer: D
The responses suggested that user research procedures in the game industry
are quite diverse, but that publisher and developer in most cases share the work
between them (see Table 2.2 and Table 2.3 for details).
We also asked which findings or issues the participants looked for in the
user research they had knowledge of. Table 2.4 shows which focus areas were
described by participants. UI, game play, and concept are the focus areas of
most user research. It is interesting to see that developer B, which has very limited
cooperation with its publisher and thus has all responsibility for user
research, also deploys methods with traditional marketing foci, such as market
analysis.
Table 2.5 shows the multitude of methods UR champions use to answer their
research questions. What is apparent about the answers is that most participants had
difficulty describing the methods they use. We expect that “usability test” describes
some sort of practice related to the think-aloud protocol, whereas “playtest” may
mean observing or otherwise monitoring users play. Thus, “conducting playtests”
may be the same as “observing play sessions.” If this is true, observing users
play the game is the most commonly deployed method and, in fact, the only
method used by some of the participants. The lack of clarity in terms of describing
the methods used to conduct user research may be because participants were not
familiar with research terminology or simply because of a lack of generally agreed-upon
naming conventions. However, we are more inclined to attribute it to user
research practice being improvised and hardly ever formalized or systematized.
Notable exceptions are C and K. They specifically mentioned the aim to triangulate
methods and combine qualitative and quantitative methods in order to obtain both
objective and subjective data. As discussed earlier, we urge UR champions to formalize
their procedures, describing the methods they use and the questions these
particular methods can help answer. Such work will yield the most reliable
results and thus boost the credibility of the user research. Without this formaliza-
tion work, the results of the user research are more vulnerable to invalid “common
sense” objections.
Because we wanted to investigate the relationship between publisher/marketing
and developer, we asked participants if and how user research results were shared.
Five participants answered that they hardly have any communication with the pub-
lisher’s marketing department about user research results. One developer mentioned
being very interested in getting data from the marketing department and another
that the publisher was unlikely to be interested in the developer’s user research.
With regard to the sharing of results, developer B mentioned how they mostly
communicate early user research results to the publisher as an attempt to make
them “join the adventure.” Such a sharing of results thus seems mostly motivated
by the wish to land a contract. Along the same lines, developer D described how
both developer and publisher manipulate their user research results before sharing
them with the other party. J described how user research results from the publisher
are shared with the developer studio and vice versa, but also suggested that not
all results were to be shared with everyone. Related to this, K described how user
research results were kept from the marketing department on purpose—the ration-
ale behind this was that marketing tended to misinterpret preliminary results and
base marketing and approval decisions on them, thus effectively causing develop-
ment teams to not want to work with the user researchers. Similarly, K described
how development teams only listened to marketing user research results (such as
focus groups) so as to please marketing with the ultimate goal of ensuring a market-
ing budget for the game; not really to make any changes in the game design based
on the results.
These answers may suggest that some of the communication and relationship
between developer and publisher is not primed to actually facilitate better collabo-
ration on the shared goal, that of making a good and successful game. Rather, they
seem to suggest that developers do not always consider the publisher a friendly
colleague but rather a partner that needs to be maneuvered to fit the developer's
goals—and that the same goes for the publisher. On the other hand, it is only to be
expected that developer studios and publishers see the world from different perspec-
tives, and therefore it is no surprise that they have different goals for user research.
Nevertheless, this points directly to a need to coordinate user research efforts.
Participants described how the relationship between developer and publisher
isn’t all roses. The lack of knowledge about what occurs on the other side of the
fence impedes and slows down production and coordination. Some suggest this
affects creativity and probably, in particularly unfortunate cases, ultimately sales.
One developer explained how the very nature of being a third-party developer
means that the publisher holds most of the rights to the game, and this is suggested to
cause some imbalance in the relationship. Conversely, a publisher described being
helplessly dependent on the developer to implement the changes that arise from
for example, focus tests. This is also suggested to cause unevenness in the relationship,
mainly because the timing of user research is hugely important: fights are
bound to break out if user research results are forced into the development process
at too late a stage. As an example,
changes that will require large investments are mentioned as an issue giving rise to
severe challenges for publisher-developer cooperation.
In all fairness it should be emphasized that three participants from publish-
ing or publishing/developer companies generally were very satisfied with their
communication with the developers and the planning of user research. However,
the three companies are fairly large and experienced publishers, and this may
explain why they pay attention to and enjoy success implementing effective work
procedures around user research. Since the developers in the study seemed more
concerned about the state of the communication and collaboration between devel-
oper and publisher, we do wonder whether developers in general feel more insecure
or dissatisfied simply because they are the less powerful of the two parties.
One publisher explained that user research results rarely get completely ignored,
and that developers often have a good reason for putting results on hold. Such
a comment shows a rare and valuable understanding of colleagues' points of view,
and confirms that much is accomplished by trying to understand colleagues' motivations
and goals. This supports the importance of building and nurturing cross-professional
and personal relationships.
Another publisher specified how not being able to communicate directly with
a third-party developer was a huge challenge. Direct, informal, and frequent com-
munication was claimed to be crucial to the publisher, who needs to be up to date
with the development process and recent game builds. "Getting to know each other"
ensures that colleagues are accessible, that they listen, and that they are honest in
their communication, the publisher suggested, emphasizing the value of personal
relationships. In this context, the building of cross-professional relationships should
be seen in both an intra-organizational and a trans-organizational context.
A developer explained how there seems to be a semantic gap between devel-
opment and marketing: that the developer seemingly has difficulty understanding
what exactly marketing does and vice versa. This was confirmed by other participants,
who mentioned a need to create a better understanding of user research
methods on each side. Such efforts should provide greater transparency about what
research is being done in each camp and what questions it is supposed to answer.
Related to this, one participant suggested that the marketing department needs
a higher level of methodological rigor in its user research and an increased awareness
of which methods can address which questions: focus groups should not be used
to validate design, but should instead function as a point of departure for brainstorming
design ideas.
separation between developers and users—as in cases where, for instance, a market-
ing department monopolizes user contact—presents a major organizational obsta-
cle for design in contract development. Gould and Lewis discuss similar issues in
their classic paper on key principles of design (Gould & Lewis, 1985). Other accounts
describe how marketing departments are reluctant to share the opportunity for
firsthand contact with users, or even forbid other departments from having it
altogether (Grudin, 1991; Frøkjær, 1987).
Based on some of the anecdotes we have heard in the game industry, we won-
dered if the same was true for the relationship between a publisher’s marketing
department and a third-party game developer. While our study clearly contains
examples of it, the results are not unequivocal: the frequency of horror stories in the
answers was in fact very low. However, some of the results, as well as informal com-
munications we have had with participants suggested that perhaps the developer-
publisher relationship is a bit more complicated than described by the answers we
received. We have come across anecdotes that imply that it may be a challenge for
some UR champions to get to do user research on the developer site at all. Some of
the developers in this study have also described their relationship to the publisher’s
marketing department as being a bit tense, and it was suggested that user research
results were sometimes purposely kept from the marketing department.
Some also implied that the publisher’s marketing department considers a videogame
the publisher’s property, and behaves jealously if attempts are made from the devel-
oper site to take control of user research. In this way, the historical structures that
lie behind the publisher/developer relationship, where the publisher often decides
the fate of the games, potentially make it harder for the UR champion to imple-
ment user research at the developer site, since prior experience with user research
methods such as focus groups (performed by marketing) in some cases has created
mistrust against user research methods in general.
Once again, the overall challenge, as we see it, is that the developer's UR champion
and the publisher's marketing department both work with user research, and
both determine which methods should be used for which insights. However, even though
they may share the goal of producing a good and successful game, their focus areas,
methods, challenges, and timing are different. This should be crystal clear to anyone
who does user research, but unfortunately it is not always so.
The developer is often basically interested in how the game works, how fun it is,
how difficult it is, and so on. The publisher’s marketing department, on the other
hand, is basically interested in how the game fits the target audience and the market
in general, how it is presented to potential buyers, and so on. Before commencing
on a new game, marketing may thus choose the customer segment, conduct focus
group interviews with potential users, and perform other surveys related to users.
When the game is close to being finished, it will then conduct more user tests.
To reach its goals the marketing department will also involve users when creating
a marketing strategy or settling on a name for the game.
But, while the publisher’s marketing department may investigate issues that are
closely related to usability and user experience, it does not conduct user research in
the HCI sense of the term.
2.5 Conclusion
In many ways, the challenges UR champions encounter when striving to do user
research in the game industry are similar to the ones they would encounter if they
were developing traditional office-ware or other task-oriented systems. Such chal-
lenges include justifying the investment made in user research, creating company
work procedures that support user research, and developing professional and per-
sonal alliances with key stakeholders. However, since many third-party developers
are either owned or tightly affiliated with a publisher, some organizational chal-
lenges for user research in the game industry are, in some aspects, quite unique.
Our survey amongst eleven developers/publishers from the game industry sug-
gests that a close cooperation between a third-party developer and the publisher’s
marketing department is crucial, but also that UR champions need to pay attention
to some of the obvious dangers of doing user research in two separated camps.
One danger is that the publisher's marketing department confuses marketing-related
user research with HCI-related user research and—thinking it is all the same thing—
misses the HCI perspective on a game, accordingly ignoring great opportunities
to link the development of a game closely to potential users and to the people who
develop the game. Another danger is inefficient work procedures caused by the geo-
graphical distance and perhaps also mismatching ideas about how, when, and by
whom user research should be carried out. There will be variations as to how the
described challenges will manifest themselves in different organizational settings,
but we expect the basic mechanisms behind the challenges to be present in most
game development settings.
Because the success of user research at the developer site ultimately rests on the
UR champion's shoulders, we have presented some key takeaways that we believe
will help anyone who is interested in conducting this work navigate the most
common organizational challenges.
2.6 Acknowledgments
We wish to thank those who took the time to participate in our study and share
their thoughts. Also, we thank the editors of this book, Erik Frøkjær and other col-
leagues for valuable discussions and comments.
2.7 References
Barr, P., Noble, J., & Biddle, R. (2007). Video Game Values: Human-Computer Interaction and
Games. Interacting with Computers, 19, 180–195.
Bateman, C., & Boon, R. (2005). 21st Century Game Design. Rockland, MA: Charles River
Media.
Benton, A.A., Kelley, H.H., & Liebling, B. (1972). Effects of Extremity of Offers and
Concession Rate on the Outcomes of Bargaining. Journal of Personality and Social
Psychology, 24, 73–83.
Bernhaupt, R., Eckschlager, M., & Tscheligi, M. (2007). Methods for Evaluating Games: How
to Measure Usability and User Experience in Games? Proceedings of the international
Conference on Advances in Computer Entertainment Technology (ACE’07), Salzburg, Austria.
Cusumano, M., & Selby, R. (1997). How Microsoft Builds Software. Communications of the
ACM, 40(6).
Frøkjær, E. (1987). Styringsproblemer i det offentliges edb-anvendelse. Politica, Tidsskrift for
Politisk Videnskab, 19(1), 31–56.
Furniss, D., Blandford, A., & Curzon, P. (2007). Usability Work in Professional Website
Design: Insights From Practitioners’ Perspectives. In E. Law, E. Hvannberg, & G. Cockton,
Maturing Usability: Quality in Software, Interaction and Value (pp. 144–167).
London: Springer.
Gould, J.D., & Lewis, C. (1985). Designing for Usability: Key Principles and What Designers
Think. Communications of the ACM, 28(3), 300–311.
Gould, J., Boies, S., & Ukelson, J. (1997). How to Design Usable Systems. In M. Helander,
T. Landauer, & P. Prabhu, Handbook of Human-Computer Interaction. New York:
Elsevier Science.
Grudin, J. (1991). Interactive Systems: Bridging the Gaps Between Developers and Users.
Computer, April issue, 59–69.
Grudin, J., & Markus, M.L. (1997). Organizational Issues in Development and Implementation
of Interactive Systems. In M.G. Helander, T.K. Landauer, & P.V. Prabhu, Handbook of
Human-Computer Interaction, (second ed. Vol. 1, pp. 1457–1474). Amsterdam: Elsevier
Science B.V.
Gulliksen, J., Boivie, I., & Göransson, B. (2006). Usability Professionals—Current Practices
and Future Development. Interacting with Computers, 18, 568–600.
Helander, M., Landauer, T.K., & Prabhu, P.V. (1997). Handbook of Human Computer
Interaction. New York: Elsevier Science.
Iivari, N. (2006). “Representing the User” in Software Development—A Cultural Analysis
of Usability Work in the Product Development Context. Interacting with Computers, 18,
635–664.
Juul, J. (2005). Half-Real: Video Games Between Real Rules and Fictional Worlds. Cambridge,
MA: MIT Press.
CHAPTER THREE
Interview with Tobi Saulnier, Founder and CEO of 1st Playable Productions
Interviewer: Katherine Isbister
Tobi Saulnier, after earning B.S., M.S., and Ph.D. degrees in Electrical Engineering
from Rensselaer Polytechnic Institute, spent five years overseeing product development
at respected game developer Vicarious Visions before founding 1st Playable
Productions. At VV, she delivered over sixty game titles, ranging from Blue's
Clues GBC to Doom III Xbox, establishing a track record of being able to build
and train diverse teams to deliver high-quality games on time. She led a product
development team that grew over five years to ninety artists, engineers, designers,
and project managers, as well as a number of established subcontractors. Tobi is
active in the game industry, a frequent speaker at industry conferences, and has
delivered seminars on topics ranging from kid testing, to IP rights, to the application
of new software processes to improve industry quality of life through structured planning and
Tobi, you run a small game company (1st Playable), and some small studios say
they have no time for usability. How is it that you find the time, and why?
That’s an interesting question, because I have never considered the alternative.
To me it is such a waste of one’s time to make something your intended audience
doesn’t enjoy or can’t use. I have learned that while you can develop some intui-
tion for what an audience needs or wants, you don’t really know until you put it
in front of them. There's always a surprise awaiting you. What's more, it's a great
way to settle debates of opinion about what the player wants—like so many situations,
the best thing to do is try to use data to make decisions, whenever you can
collect or find some.
What sorts of interesting outcomes/lessons have you gotten from doing usability?
Usability *always* affects our design. Sometimes it just helps us fix some user inter-
face or player information problems; other times it has caused us to entirely throw
out an approach and rethink the basics. One example is a game for four-year-old
girls, which started with some cool gesture-type mechanics. But once we had the
game in front of players we found they just wanted to fly around and make spar-
kles, and a four-year-old likely wants to scribble without a specific required shape.
So the game embraced that aspect and took out the gesture aspects. After all, you
really can’t explain to a four-year-old what they should or should not like, or even
what they can or cannot do (unlike an older audience who is going to be somewhat
responsive to instructions). Another example is a game design we were develop-
ing for non-gamer women over twenty-seven. It was eye-opening to see their vari-
ous responses to the challenges and feedback of some other games. Situations we
might think of as motivating or exciting (time pressures, game responses to failure)
quickly became frustrating or upsetting.
Would you recommend doing usability to other small studios? If so, why? In what
situations?
Yes! Always! The overhead of implementing a playtesting program is quite small
for the value you get back for your game. Furthermore, it's a great way to be an
ambassador to your community, letting them learn more about this exciting medium.
Are there types of usability, or times for doing usability, that you would recommend
against? Why so?
There are two types of usability we avoid, for different reasons. For one, we don’t
have the scale or resources (or knowledge) to do statistical studies on usabil-
ity, where you are trying to determine what percent of players like what aspect.
Microsoft is a leader in this area, and many larger companies could probably afford
this, but not a small studio. I have used this approach in other industries, and my
takeaway was it’s better to not do this at all than to do it wrong.
The other type is focus-group testing, which is when you bring in a group of
similar people at one time, have them play the game in that group setting, and
then gather their responses as a group. Focus-group testing is used fairly often by
our publishers but is subject to many biases in data due to the group setting. You
can get wildly different results just based on the interplay of personalities and the
approach to moderation. Focus groups are good for some aspects of game development,
but are more likely to steer you wrong or just provide garbage data when it
comes to game usability.
If someone wanted to get started with this, where would you recommend that they
turn (books, first steps, etc.)?
There are all kinds of resources on this topic, and often roundtables or other ses-
sions at conferences like the Game Developers Conference.
CHAPTER FOUR
Games User Research (GUR): Our Experience with and Evolution of Four Methods
George Amaya has been doing usability work since
1989. He has a Ph.D. in cognitive psychology with a
minor in social psychology from Wayne State University
in Detroit, Michigan. He started in usability during grad-
uate school, working as an intern at General Motors.
Upon graduation he accepted a teaching position at
Seattle University and moved to the Pacific Northwest.
Since then he’s had the opportunity to work on many
different products at a few different companies, leading
to his current role as a user researcher for Microsoft Game Studios. His focus in
recent years has been casual and social/party games.
● We'll discuss some of the unique challenges and opportunities in performing
research on games.
● We'll review a set of case studies of the application of research on games. For
each method we will discuss:
  ● the context
  ● the research problem/question
  ● the research method for approaching the problem
  ● lessons learned, for both the method and the game design implications
  ● the broader context for this method and future developments
● We'll conclude with our take on the state of user research on games and its
future.
4.2 The Opportunity and Challenge of Games Research
gaming must focus both on what users do and how they feel about what they are
doing. In contrast, productivity applications can focus primarily on usefulness
(does the application support required tasks) and usability (what are the costs,
ease of learning, efficiency of use, minimizing errors). For discretionary products
like games, these factors are secondary and only relevant to the extent that they
affect the likelihood of a rich and engaging experience. That is, the user’s aes-
thetic/emotional experience is paramount. While such considerations apply to
all consumer products (for example, cameras, cell phones, etc.), consideration of
the user’s experience is primary for games since their sole purpose is to provide
a positive emotional experience to the player.
2. Games are a highly competitive space. Several truisms reinforce this conclu-
sion. Most games (almost 80 percent) lose money. A few games (20 percent)
make most of the money (80 percent). Successful games are highly profitable.
For example, Goldeneye, the movie, cost $10 million to make and generated $200
million in gross revenue. Goldeneye (1997), the game, cost somewhere around
$25 million to make and generated $200 million in revenue. Games can also
drive additional sales ranging from specialized controllers to action figures.
3. There are many game studios and publishers. Unlike most other markets for
software (business applications, databases, etc.) in which a few producers have
established a dominant position in each market, there are many suppliers and
publishers of games. While the games industry is consolidating rapidly, it lags
behind most other software markets in which two or three suppliers dominate.
4. Games are extremely complex. Many modern games push the technology enve-
lope in terms of graphics, artificial intelligence, and system integration. Games
played over the internet can demand a very high level of performance.
5. Games are very popular. According to the Entertainment Software Association (ESA,
2007), 67 percent of American heads of households play videogames. Twenty-four
percent are over fifty. Thirty-eight percent are female. Internationally, the popularity
of games can also drive national network infrastructure. Some writers have
attributed the development of high-speed bandwidth in Korea to the popularity
of computer games such as Starcraft.
6. Games are big business. In 2006, the revenue for video and computer game sales
combined was $7.4 billion (ESA, 2007). These numbers do not include hardware
sales for consoles. The growth in games has been strong for the past ten years,
increasing from $2.6 billion (1996) to today's figure. Game sales from last year
have outpaced the growth of the U.S. national economy and have eclipsed film
box office sales for a number of years. In addition to new consoles, games are
increasingly appearing on new platforms such as cell phones.
7. Games have an increasingly wide range of application. Games are being con-
sidered as a platform not only to teach specific skills but also to teach transfer-
able skills, and impart values and attitudes.
8. The essence of a game rests in creative design. Games do not attempt to auto-
mate an existing task the way many productivity products and applications do.
Instead, they attempt to create unique, novel, and appealing experiences for
their customers and users.
9. Gaming has a development cycle that is well-adapted to user research. A
typical development cycle can begin by creating tools to build the game. Then
the game is built in bits and pieces until a “playable” build is created. Once a
playable build exists, the team focuses on refining and tuning the game play.
This tuning period is an ideal time to conduct user research and a window
during which findings can be readily incorporated into the game play. In con-
trast, it is all too often the case for productivity applications that as soon as the
functionality is working and the product relatively crash free, management will
press to ship the product. So, in this respect, games development represents a
more fertile environment for user research.
10. Most game developers and designers are dedicated to providing a great user
experience. One of our design partners put it well when asked what convinced
him to work with user research—“we needed to know you cared as much about
the game as we did." By "the game" he meant whether the game is great for the
players. Every studio we have ever worked with has been dedicated to their vision of
a great user experience. Unfortunately, all too often the designers and developers
of productivity products are satisfied when the functionality is present.
11. Gamers demand novelty. While there are exceptions, people who use produc-
tivity applications rarely welcome innovation in UI design. They know how to
do their work using a given tool and a new interface often represents both a
learning cost and a productivity risk. In contrast, gamers welcome new chal-
lenges and new capabilities.
12. There is an excellent framework for considering how user research can con-
tribute to games. Commercial games research does not have to live in a vacuum.
The Mechanics, Dynamics, Aesthetics (MDA) framework developed by Marc
LeBlanc and colleagues (Hunicke, LeBlanc, & Zubek, 2001) postulates that the experience of games is
based on the game mechanics—the elements of the game, the rules by which they
interact, and the goals of the game. As LeBlanc put it, this is "what you can put in the
box"; after that, it is out of your hands. When the user interacts with the mechanics,
he or she produces dynamics: patterns of behavior. These emerge from
the interaction of the user with the game and the combination of game mechanics.
They are influenced by the player's history and expectations. As the user plays
the game, he or she will draw conclusions about it. These conclusions
can take many forms. They could be judgments about the game, such as "this
game is too easy." They can also be "emotional" conclusions, like "I am hav-
ing fun," "I am frightened," "I've accomplished something." This framework is a
good starting point for thinking about research in games and any product that has
discretionary use.
4.3 Researching Play in the First Hour: Playtest
From the beginning, the goal of Playtest focused on how to make a game better as opposed to evaluating how
good or bad a game was. Hence, early Playtesting and early iteration in a game’s pro-
duction cycle were stressed, and Playtests were focused on more experimental and
exploratory questions (for example, do users like weapon A better than weapon B;
which vehicle do users like the most, etc.) as opposed to strict evaluation.
The early pioneers of Playtest at MGS (Bill Fulton and Ramon Romero) envi-
sioned it to be less of a method and more of a research situation in which
several participants could be run quickly and cheaply. Unlike usability testing,
Playtest didn’t require a trained engineer to directly observe each individual par-
ticipant. Without that restriction, the User Research Engineer could easily, quickly,
and efficiently run numerous participants at once—enough to make quantification
of survey data possible. This was, and still is, one of the key utilities of Playtest: Fast
and efficient quantification of user opinions about a game.
Teams typically receive a preliminary report of the top issues shortly
after the Playtest and a full report within a week. The full report is much more
extensive than the preliminary report and includes a point-by-point reference to
each of the issues uncovered in the Playtest, as well as a set of recommendations
from us on how to fix those issues in the game. Usually, after consulting with the
team and debriefing on the full report, we agree upon fixes and wait for them
to be implemented in the game. Often, after the fixes have been implemented, we
will test again.
More recently, we have been developing methods for Playtesting multiplayer game modes. These methods
have also been buoyed by our TRUE instrumentation (see Chapter 15 of this book
for an in-depth discussion of TRUE instrumentation).
Over the past decade, we have been continuously refining and improving our
Playtest methodology. The processes have changed, survey questions have changed,
scales have changed, and types of analyses have changed. But at the core, Playtest
remains the same as in its early instantiation; Playtest gives us the ability to get data
from users quickly, cheaply, and easily, in large enough groups to make quantifica-
tion of user opinions possible. It gives us a reliable way to get evaluative feedback
from users on various aspects of a game and allows us to compare user ratings of
one game to another. In the end, Playtest gives us a very powerful tool with which
to help improve games in development, before they are released into the real world.
For additional examples of how Playtest research was used to improve games, see
Pagulayan, Steury, Fulton, & Romero (2003) and Davis, Steury, & Pagulayan (2005).
4.4 Researching Social/Party Games
While providing support for an Xbox 360 trivia game called Scene It? Lights,
Camera, Action, we had an opportunity to address these questions. We learned that
it is, indeed, very important to test using groups of people rather than individuals,
and that traditional usability testing methods must be adjusted to be more effective
in a group testing setting.
[Figure 4.1: Ratings of how fun the game was (from "Not fun" to "Very fun") for solo versus group play.]

[Figure 4.2: Ratings of game pace (from "Much too slow" to "Much too fast") for solo versus group play.]
Testing social/party games required us to answer two important questions. What group size
did we need for our research? Should these be groups of strangers or people who
know each other?
We found that the best group size for a research project depended heavily on
the type of game we were studying. A game like Scene It? Lights, Camera, Action
could, potentially, be played by a very large number of people at a party, depend-
ing on how many controllers they have and whether or not they form teams. We
decided to test groups of three to four as well as groups of six to eight players. The
three-to-four-sized groups represent people who are playing the game without
teams, while the six-to-eight-sized groups allowed us to observe the dynamics of
team play (Scene It only supports four player entities but many more controllers, so
once you have more than four controllers, they are limited to being on a maximum
of four teams). Studying team play was very helpful to the game designers because
it allowed us to learn about how players shared the controllers, how they communi-
cated with their teammates, and how the different teams interacted with each other.
Karaoke games, like Singstar, have different group dynamics than Scene It. Our
observations of people playing karaoke games revealed that groups with fewer than
four people are not as animated or interactive as groups with four or more. The
smaller groups are also quieter and less likely to move around. The nature of the
genre seems to be such that people are much more comfortable singing with larger
groups of people than they are in smaller groups. The average group size when
playing these games in real life is six people, according to our participants. As a
result, we decided to recruit groups of four to six people for any singing game stud-
ies, and also for any games in which social interactions are critical for the gaming
experience.
Karaoke games typically allow two people to play at any given time. This means that
the remaining two to four people we recruited for the test session were sitting and
watching someone else play. A fun social/party game is often one which entertains
both players and observers, or facilitates entertaining social interaction so that these
people can entertain themselves. We decided that these observers are an impor-
tant part of the social/party game experience, both from an ecological validity and a
game design perspective, and that their behaviors may add to or detract from the fun of
the game.
2. Players who know each other are more animated when playing together:
cheering, jumping up and down, etc. Players grouped with strangers are gener-
ally more reserved.
3. Emergent behaviors are more likely to be observed when the players are relaxed
and with their friends.
4. Players are more likely to speak out loud with people they know than with
strangers.
Once we decided to focus on recruiting groups of people who knew each other,
we found that the best way to do so was to differentiate between core players and
cohort groups. The core player is the person whom the recruiters call. This person
must meet all recruiting criteria. We then ask the core player to come up with a list
of friends/family/acquaintances to act as the cohort group. The criteria for
the cohort group are usually much more lenient than those used for the core player.
The core player is like a party host, while the cohort group is the set of people the
core player has invited to the party.
Free play at the beginning of a test session has turned out to be a good way to
allow the test group to settle down and adjust to the test environment. For example,
when testing music games like Singstar, we found that allowing the participants
to choose their own play mode and song in the beginning helped them relax and
adjust to the lab environment. Once everyone had a chance to sing a song of his
or her choice, the moderator could then ask them to play using specific modes or
songs that we were interested in observing.
Interrupting a group during a play session with questions or comments about their expe-
rience very often distracts the group from the game play
and decreases their comfort with the lab setting. Such interruptions also seem to
decrease the frequency of the banter that occurs when groups of acquaintances play
a game. We decided that the comfort of the group and the banter were important
data sources, so all of our moderators were instructed to note any questions that
came up during the play session and ask them after group play was over. When we
were done with our testing the researcher would go into the participant room and
facilitate a discussion of questions and issues, very much like a focus group. We
have found that spending time on discussion at the end of the test session allows
the group to play comfortably without interruption, while still providing us with the
data that we needed from the test session.
4.5 Researching Play in the "Real-World": Beta
For example, in a massively multiplayer online game you may need to study many permutations
or combinations of variables, often over an extended period of play. Sometimes you
need many players interacting simultaneously to test matchmaking processes or
other aspects of the game that require a critical mass of players. Beta testing can be
very useful in these circumstances.
A beta version of a product (sometimes in the productivity world called com-
munity technology previews) is usually the first public, though often restricted,
release of the product to consumers. Developers typically release beta versions of
products towards the end of the development lifecycle, when the software is nearly
feature complete but still needs fine-tuning. Typically, the test (or QA) departments
on development teams have been responsible for running and collecting feedback
during beta programs. Test teams employ betas to investigate hardware/software
compatibility and to identify difficult-to-find bugs or other technical issues with the
software. Putting prerelease versions of a product into the public’s hands can help
assess compatibility with a much wider range of computer hardware and software
configurations than testers could reasonably replicate in a lab. Even a large test
team cannot cover the same ground that hundreds or thousands of beta testers can.
While the public beta testing of PC games and productivity software is relatively
common, the beta testing of console games is relatively new. Newer still is the use
of beta not just for finding bugs, but for improving the core experience of the game
itself, through either the tweaking of game mechanics or for game play balancing,
to ensure that the game is neither too hard nor too easy. At Microsoft Game Studios,
the user research team manages the beta program and we are developing methods
to make beta a valuable consumer feedback tool, like usability and Playtest, for
improving games. There are, however, unique benefits and challenges when collect-
ing consumer feedback in a beta test. In this section, we describe some of these and
present a short example of using beta on a game in development.
Besides the number of players you collect feedback from, beta testing differs in
several other ways from the methods of collecting consumer data that we have dis-
cussed. First, the game has to be more stable. Because the game is provided to play-
ers outside of our testing facilities, usually downloaded from a secure website or
through Xbox Live, we have much less control over the testing environment. This
means that the game should not crash very often and there should be few bugs that
frustrate or confuse players. Further, because it is the first exposure of the game to
the public, it has to be sufficiently polished and fun to play. For these reasons, beta
testing usually occurs late in the development cycle.
Second, because the testing occurs in players' homes, not in the Playtest or
usability labs which are onsite, it is more difficult to collect qualitative feedback.
Therefore, when analyzing the instrumentation data we get from beta participants
we often do not have a lot of the context for interpreting the data that we can collect
during in-house testing.
Third, because the testing occurs outside the lab, we have less control over the
test itself. We have no control over how (or even whether) the beta participants
play the game. In-house Playtesting enables us to provide instructions and monitor
how participants play; in a beta test, we cannot.
[Figure 4.3: Weapon purchases by weapon type (submachine gun, sniper rifle, rifle, shotgun, katana, minigun) and number of games played.]
Further, participants would not be able to report much of the data we needed, such
as how far they were from opponents they “killed,” how often they shot a particular
weapon, etc. To collect this data, we used a system that automatically recorded par-
ticipant behaviors (Schuh et al., 2008). To collect data relevant to our question about
weapon use and effectiveness, the game automatically logged every time a player
purchased a weapon, every time a player scored a “kill” with a weapon, as well as
logging the coordinates of the players in the game world when these events occurred.
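The chapter does not give the logging format used by the instrumentation system, but a minimal sketch of this kind of event logging, assuming a hypothetical JSON-lines schema with invented field names, might look like this:

```python
import json
import time

def log_event(stream, event_type, player_id, weapon, position):
    """Append one gameplay event as a JSON line (hypothetical schema)."""
    record = {
        "t": time.time(),      # timestamp of the event
        "event": event_type,   # e.g., "purchase" or "kill"
        "player": player_id,
        "weapon": weapon,
        "pos": position,       # (x, y, z) in game-world coordinates
    }
    stream.write(json.dumps(record) + "\n")

# Example: a player buys a submachine gun, then scores a kill with it.
with open("beta_events.jsonl", "a") as stream:
    log_event(stream, "purchase", player_id=42, weapon="submachine_gun",
              position=(103.5, 22.0, 7.1))
    log_event(stream, "kill", player_id=42, weapon="submachine_gun",
              position=(110.2, 25.4, 7.1))
```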
When examining the data at various intervals during the beta test, we discovered
that the majority of the time players were purchasing only one gun, the subma-
chine gun, and that there were some weapons players rarely purchased. Could
this be a matter of players not having learned to use the other weapons? Were the
other weapons too expensive? To begin answering these questions, we broke down the
analysis of weapon selection by the number of games a participant had played. As
shown in Figure 4.3 (Weapons purchases by games played), the pattern in weapon
preference was consistent across different experience levels. In fact, it appeared that
the preference patterns for the weapons were even more exaggerated as players
had more experience with the game. This allowed us to rule out inexperience with
the game as a plausible explanation for purchase patterns and effectiveness with
particular weapons.
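As a rough illustration of this breakdown, a sketch in Python (reusing the hypothetical event records from the logging example above, and assuming a lookup of games played per participant) might aggregate purchases into experience buckets like so:

```python
from collections import Counter, defaultdict

def purchases_by_experience(events, player_games_played, bucket_size=5):
    """Count weapon purchases, bucketed by how many games each player
    had played (0-4, 5-9, ...)."""
    buckets = defaultdict(Counter)
    for e in events:
        if e["event"] != "purchase":
            continue
        games = player_games_played[e["player"]]
        bucket = games // bucket_size * bucket_size  # floor to bucket start
        buckets[bucket][e["weapon"]] += 1
    return dict(buckets)

events = [
    {"event": "purchase", "player": 1, "weapon": "submachine_gun"},
    {"event": "purchase", "player": 2, "weapon": "shotgun"},
    {"event": "purchase", "player": 2, "weapon": "submachine_gun"},
]
print(purchases_by_experience(events, {1: 3, 2: 12}))
# {0: Counter({'submachine_gun': 1}),
#  10: Counter({'shotgun': 1, 'submachine_gun': 1})}
```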
Was the submachine gun overpowered, or was it more popular because
it was cheap to purchase? To investigate further, we calculated a kills-to-pur-
chase ratio, the number of “kills” logged for that weapon divided by the number
of times players purchased that weapon.

[Figure 4.4: Kills-to-purchase ratio by weapon.]

As shown in Figure 4.4 (Kills-to-purchase ratio), not only was the submachine gun the most popular purchase, it also
appeared to be the most effective weapon in the game. For every 100 submachine
guns purchased, there were 39 “kills,” compared to 6 “kills” for every 100 rifles
purchased. These types of analyses would be difficult or impossible using Playtest
or usability testing, as players need extended interaction with the game.
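The ratio itself is simple arithmetic: kills divided by purchases, per weapon. A small sketch using the chapter's own numbers (39 kills per 100 submachine guns, 6 per 100 rifles):

```python
from collections import Counter

def kills_to_purchase(events):
    """Ratio of kills to purchases per weapon, as plotted in Figure 4.4."""
    kills, purchases = Counter(), Counter()
    for e in events:
        if e["event"] == "kill":
            kills[e["weapon"]] += 1
        elif e["event"] == "purchase":
            purchases[e["weapon"]] += 1
    return {w: kills[w] / purchases[w] for w in purchases}

sample = ([{"event": "purchase", "weapon": "smg"}] * 100 +
          [{"event": "kill", "weapon": "smg"}] * 39 +
          [{"event": "purchase", "weapon": "rifle"}] * 100 +
          [{"event": "kill", "weapon": "rifle"}] * 6)
print(kills_to_purchase(sample))  # {'smg': 0.39, 'rifle': 0.06}
```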
In the end, the beta test was useful for answering the weapon balance questions
we had, as well as many other questions we had going into it. We
were able to provide this data to the development team, which then tweaked
the weapon parameters to achieve their desired result—a variety of different weapons
effective in different combat situations. However, the beta test provided many chal-
lenges we had to overcome that are not typically present in usability or Playtesting.
Invariably, however, our experience has been that once teams see the benefit of
consumer feedback, they wish they had started beta testing earlier in the develop-
ment cycle. One of the main benefits of early beta testing is the opportunity for
iteration. As we collect feedback from participants and designs are changed, we can
distribute new builds to verify fixes or to iterate on new issues that arise.
Second, as we move towards focusing on game design and balance in beta,
and relatively less so on technical issues such as stability and bug finding, how
we recruit and select participants for beta becomes more important. Traditionally,
beta testing has been conducted by technically savvy consumers who understand,
and can cope with, instability and less user-friendly beta builds. This makes a lot
of sense if you are primarily interested in technical feedback. However, when you
start using beta to collect gameplay feedback, such as difficulty levels, you need to
be more cognizant of who you recruit for your betas. If you want to look at balanc-
ing the game for a wide variety of different types of players, ranging from those with
little experience to those who play a great deal, you will need to come up with a
recruiting plan to ensure that these different gaming profiles are represented in your
participant pool.
Another consideration for a beta is participant attrition. In Playtest and usabil-
ity studies, participants typically take part in one test: they come to the lab, play the game
for a few hours, and the test is complete. Depending on the type of data you are
collecting in a beta, like the weapon preference question for Shadowrun, you may
need participants to keep playing anywhere from a day to several months. Further,
some tests require a critical mass of simultaneous testers, while others only require
a particular number over the course of the beta. How you keep players engaged in
the beta and participating in the feedback process will vary, but managing the beta
community to keep participants playing the game is critical for data collection.
One of the methods we have developed to minimize attrition is to invite exist-
ing communities of gamers and friends-of-friends. We have found that having other
familiar players to game with helps maintain interest and participation in the beta.
Other tools include contests and tournaments, opportunities to play with and inter-
act with the development team, and so on. Lastly, it is very important to be respon-
sive to participant feedback. Participants who feel like the developers care about
their participation and feedback are much more likely to continue participating than
those who do not.
4.6 The Importance of First Impressions: Trials and Demos

Games are sold both through traditional retail channels (for example, selling a game off the shelf or
through an online storefront) and nontraditional ones, like the Internet (in other
words, downloading a game to a PC or playing through a Web browser).
Until recently, console-based games were sold exclusively through retail chan-
nels. However, the latest generation of gaming consoles has opened up a new
purchase path for the console audience, one in which consumers can digitally
download games and game-related content (for example, game expansion packs)
directly to their console hard drives through broadband Internet connections.
Microsoft’s Xbox Live Arcade (XBLA) was the first console-based service to allow
consumers to download and try a free, limited trial version of a game, and then
immediately purchase and download the full version if they wished. For all practical
purposes, this was the same basic model that publishers and consumers had used
for many years in the PC space.
Using this model, Xbox Live Arcade found a sweet spot with the Xbox 360
customer base and exceeded almost every business expectation in terms of sales.
A high percentage of consumers tended to buy the full version of the Xbox Live
Arcade game after they had played the game’s trial. These high “conversion rates”
far exceeded those typically seen in the downloadable PC game market, even for
XBLA games with the lowest conversion rates on the platform.
As the XBLA service matured and more and more games were made available to
consumers, conversion rates began to flatten out. There were instances where games that
were considered very good, in terms of consumer feedback during production and/
or positive industry reviews post release, experienced low conversion
rates and sales that did not meet expectations, even though the game's trial was
downloaded in very high volumes. Put another way, the game design was solid,
and marketing and merchandising were effective in ensuring downloads, but the trials
were not "closing the deal."
Potential purchasers of retail console videogames can gather a lot of informa-
tion before they decide whether or not to purchase (for example, read online game
reviews, play games on an in-store kiosk or using a game demo disk, or rent a game
through an online service like GameFly.com or at the local Blockbuster Video).
Xbox Live Arcade customers, on the other hand, have to rely almost exclusively on
the game’s trial experience to collect information about the game to inform their
purchase decisions. Consequently, even a great game is at risk of selling poorly
if the trial experience is not good. The clear, critical connection between the trial
game experience and consumers’ purchase decisions on XBLA led to an important
question for the business: “How do we create good trial experiences for Xbox Live
Arcade games that lead to better sales?”
4.6.2 Methodology
The XBLA team understood that many factors, including the quality of the trial
experience, figure into a consumer’s purchase decision when considering whether
57
CHAPTER FOUR • GAMES USER RESEARCH (GUR)
to buy an XBLA game. Is the genre interesting? Is the game known or is it brand
new? How much does it cost? Are there other new games on the market? Was the
game marketed well? The team also understood that of all these variables, the one
that they have the most control over is the quality of the trial game.
When User Research was asked to look into the question of trial quality and
how to improve it, there were already several dozen XBLA games in the LIVE eco-
system. This gave the team the opportunity to look
at both the games’ trial experiences and sales data to understand how each game
performed in the market. The existing data allowed us to focus on particular games,
such as those with sales and conversion rates that either exceeded or fell short of
expectations.
However, before attempting to understand how and why specific games per-
formed well or poorly, we sought to better understand the elements that make a
good trial game by first hashing out an understanding of the general goals of a trial,
and then thoroughly reviewing the trial experiences for existing XBLA games. The
goals for trial games include showcasing some of the “cool” and interesting features
of the game, and giving the player enough content and features to give him or her a
sense of what the full game would be like, while not giving away so much that the
player will not be enticed to buy the full version. We have seen this in games which
tested very well, won awards, and then failed to meet projected sales numbers.
While we can’t be sure of the cause, one highly plausible hypothesis is that the
demo gave away too much. Armed with a better understanding of what trials are
supposed to do and a comprehensive knowledge of the content of existing XBLA
trials, we developed a set of “trial game design heuristics.” The heuristics, which
were intended to be used as a “first pass” assessment of the quality of trial games,
were based on principles culled from publisher best practices, observations from
past usability tests and Playtests that had included game trials, longstanding usabil-
ity guidelines, and designer intuition.
Two sets of trial game heuristics were created. The first set contained questions
about the trial that were designed to be objective and unambiguously answered
for each trial game (for example, “Does the trial contain a controller map?”). The
second set of heuristics was less objective, but each heuristic contained clear
goals and examples (for example, “Is the trial’s upsell message written in a posi-
tive tone?”). The heuristics were then informally tested for inter-rater reliability by
having several user research engineers apply the heuristics to an identical set of
games. Each of the individual heuristic questions was then assessed to identify those
where evaluators differed in their assessments and modify them, remove redundant
questions and refine the descriptions and examples of how each of the heuristics
should be applied.
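The chapter does not say how this informal reliability check was scored; one simple way to flag heuristics on which evaluators differ is pairwise percent agreement per question. A sketch, with hypothetical question and evaluator names:

```python
from itertools import combinations

def agreement_per_question(ratings):
    """ratings[question][evaluator] -> answer (e.g., True/False).
    Returns the fraction of evaluator pairs that agree, per question;
    low-agreement questions are candidates for revision or removal."""
    scores = {}
    for question, answers in ratings.items():
        values = list(answers.values())
        pairs = list(combinations(values, 2))
        scores[question] = sum(a == b for a, b in pairs) / len(pairs)
    return scores

ratings = {
    "Does the trial contain a controller map?":
        {"eng1": True, "eng2": True, "eng3": True},
    "Is the upsell message written in a positive tone?":
        {"eng1": True, "eng2": False, "eng3": True},
}
print(agreement_per_question(ratings))
# first question: 1.0 (all pairs agree); second: 0.33 (1 of 3 pairs agree)
```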
While the heuristics offered a way for user researchers to perform a “first cut”
professional evaluation of trial games, a concurrent effort was underway to col-
lect data from consumers to gauge their perceptions of trial experiences. To do so,
user research engineers modified existing MGS Playtesting methods to make them
more appropriate for a trial experience, which differs in important ways from a full
58
4.6 THE IMPORTANCE OF FIRST IMPRESSIONS: TRIALS AND DEMOS
game. Both full games and trial games should be “fun,” appropriately paced, and
appropriately challenging. Trial games, however, are unique in that they are also
sales tools that, like advertisements, must contain enough game content (for example,
features and gameplay) to allow the player to have a good idea about what the
full game will be like. In addition, the trial game must provide clear “upsell”
messages that describe the important (and valuable) features and gameplay ele-
ments that the full game will have that the trial does not, and an easy path to pur-
chase the full game. Modifications to our traditional Playtest questions included
paring down the number of questions to be more appropriate to shorter trial games. We also
included questions designed to tap the trial-specific aspects of the experience, such
as whether players felt they had adequate information to allow them to make a
purchase decision and how they felt about the frequency and content of the upsell
messages.
In our trial Playtests, players were given a choice of trials that they could play as
much or as little as they wanted. After playing a trial, they then answered questions
about some basic elements of the experience (for example, fun, challenge, pace). In
addition, they answered several of the trial-specific questions described earlier (for
example, “Did the trial provide you with enough information to make an informed
purchase decision?” and “Did the upsell messages provide clear information about
what additional features were available in the full version of the game?”). The goal
was to create a tool to measure the overall quality of a trial from the consumer’s
perspective, and its effectiveness at meeting its goals as a “sales tool” for the full
version of the game.
The modified Playtest method was then piloted using ten trial games that had
already been released to market via XBLA. There were several goals for the pilot
test. First, we wanted to assess the testing method to ensure consumers were expe-
riencing and evaluating the trial games in a manner that mimicked, to the extent
possible in a lab environment, how they might do so at home. Additional goals of the pilot
test were to assess the specific trial games in the Playtest, extract general issues that
consistently arose in the trials and recommendations to address them, and to assess
the validity of the heuristics described earlier.
An additional goal for collecting data from consumers was to determine whether
the multiple metrics could be combined and presented to the development team as
a single score to make it easier to consume and understand. A factor analysis of the
questionnaire responses from consumers established that there were four compo-
nents composed of related items (these were Fun/Engagement; Challenge; Enough
Information; Up-sell quality/frequency). Weightings for each of the factors were
assigned based on the amount of total variance each accounted for; the weighted
factors were then combined and converted to a score out of 100 (for example, Game
"Foo's" combined trial score is 67). This score was referred to as the game's "C-
Score," or Consumer Score. Each individual metric was also reported, but providing
the data in this fashion helped the team quickly understand the results because it
was in a format familiar to them from GameRankings and Metacritic, measures that
are generally considered industry standards of a game's quality.
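The exact weighting and scaling formula is not given in the chapter; a minimal sketch of the general idea, with invented factor scores (on a 0-1 scale) and invented variance shares, might look like this:

```python
def c_score(factor_scores, variance_explained):
    """Combine factor scores into a single 0-100 'C-Score', weighting
    each factor by its share of the total variance explained in the
    factor analysis. Illustrative only; the published method's exact
    scaling is not specified in the text."""
    total = sum(variance_explained.values())
    weights = {f: v / total for f, v in variance_explained.items()}
    return 100 * sum(weights[f] * factor_scores[f] for f in factor_scores)

factor_scores = {"fun_engagement": 0.72, "challenge": 0.60,
                 "enough_info": 0.65, "upsell": 0.55}
variance_explained = {"fun_engagement": 0.35, "challenge": 0.15,
                      "enough_info": 0.12, "upsell": 0.08}
print(round(c_score(factor_scores, variance_explained)))  # 66
```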
[Figure 4.5: Trial Playtest metrics (fun, pace, engagement, challenge, features available, clarity of goals, up-sell quality) across two iterations of a game trial.]
Reporting the C-Score and metrics together allowed the game development team
to quickly understand the current state of a game's trial user experience or, if the
trial had been tested multiple times, whether its user experience was improving.
Figure 4.5 shows how the second iteration of a trial showed significant improve-
ment on both the core Playtesting metrics as well as the trial’s overall score.
By the end of the method development phase, the user research team had cre-
ated and refined two sets of heuristics that could be applied to any XBLA trial, a
new Playtesting method designed specifically to determine the trial’s quality, and
a composite score for trials that was easily distributed to and understood by the
development team. In addition, the team generated a set of general trial design best
practices based on the themes that emerged from the consumer data, and had a
running start on creating a rich comparison database of consumer assessments of
trial games. The next step was to continue to apply these tools to released games
in order to collect additional data on trial experiences, further grow the com-
parison database, refine the consumer trial game quality metric, and expand the
trial game best practices document that aids developers as they create trial game
experiences. Importantly, the methods were then applied to in-development games
while there was still time in the schedule to iterate on them to help ensure that their
trials were the best they could be.
Although the team was able to create a set of best practices and iteratively evaluate a game's trial
experience prior to release, it became evident that a well-developed trial could not
guarantee that a game would sell well; after all, if the game itself is not fun or
well-designed, a good trial experience is lipstick on a pig. We also discovered that
certain games seemed robust enough to overcome a non-optimal trial. Games
with very well-known intellectual property or an existing following can overcome
a poor trial. Although those types of games may sell well, they likely could have
sold even better if their trials were better; those games probably left money on
the table.
On the flip side of the games that sell well regardless of the quality of the trial
is the reality that certain games are at a higher risk of poor sales if they have a
non-optimal trial. Specifically, original games or games that fit into more of a niche
market have a more critical need to showcase their wares in a good, if not spectacu-
lar, trial. As the trial experience is likely to be the consumers’ only mechanism to
determine the game’s “fun” or “quality,” there is very little room for error for these
unfamiliar games.
4.7 Conclusion
User research at Microsoft Game Studios has evolved over the last ten years, and
we will continue to explore methodological innovations that produce reliable, valid,
and timely data that informs design and makes for exciting products. This chapter
briefly describes four of our methods at a “high” level. Our accompanying chapter
on instrumentation covers that topic in depth. Throughout our evolution we have
been guided by some simple yet fundamental tenets which we call “the golden
six.”
1. Empirical data about users is the core of our contribution to product success.
2. Partnership with designers is essential when the product goal is a compelling
experience.
3. Judicious investment in tools and techniques pays off by allowing us to generate
large amounts of empirical data in a timely way.
4. Creating, evaluating and then standardizing methods is the key to reliability,
validity, and efficiency.
5. Addressing management goals (for example, how much fun are our games and
how can we make them better?) is the path to long-term success.
6. Methods and their output must map onto both the culture and the development
processes of your partners and your organization.
The past ten years have been highly rewarding for all of us involved in the devel-
opment of tools and methods for Games User Research. We have been especially
pleased not only by the success of MGS games but also by the recognition in the
games and popular press:
"Still the best in terms of developer support, understanding how to make great
games and a strong vision. Their usability is by far the best in the industry."
Quoted from Game Developer magazine discussing Microsoft Game
Studios in their annual publisher poll issue (Wilson, 2007, p. 12).
"Halo 3: How Microsoft Labs Invented a New Science of
Play" (Wired magazine, cover story, September 2007)
We look forward to the next ten years.
4.8 Acknowledgments
We want to thank all the members of the games user research group (past and
present). They all contributed to the evolution of our thinking about game evalu-
ation and research. We also want to thank Microsoft Game Studios Management,
especially Ed Fries and Shane Kim, who supported this work over the years. Thanks
to David Holmes, our director, who has led us effectively over the last five years.
We also thank the numerous studios and designers we have worked with over the
years. We have learned much from them and enjoyed close collaboration. Finally,
we thank the thousands of people who have participated in Playtests, our modera-
tors who have actually run the Playtests, and the gamers who have loved our prod-
ucts and given us constructive feedback.
The opinions expressed here are those of the authors and do not necessarily
reflect the views of Microsoft Game Studios or Microsoft Corporation.
4.9 References
Davis, J., Steury, K., & Pagulayan, R. (2005). A survey method for assessing perceptions of a
game: The consumer Playtest in game design. Game Studies: The International Journal of
Computer Game Research, 5. Retrieved February 7, 2008, from http://www.gamestudies.org/0501/davis_steury_pagulayan/
Ehn, P. (1988). The Work Oriented Design of Computer Artifacts. Stockholm:
Arbetlivescentrum.
Entertainment Software Association (2007). Essential Facts about the Computer and Video Game Industry. http://www.theesa.com/archives/ESA-EF%202007%20F.pdf
GoldenEye 007 (1997). [Computer Software]. Redmond, WA: Nintendo Co., Ltd.
Hunicke, R., LeBlanc, M., & Zubek, R. (2001). MDA: A formal approach to game design and game research. Workshop at the AAAI (American Association for Artificial Intelligence) 2001 Conference, North Falmouth, MA. Retrieved February 2, 2007, from http://www.cs.northwestern.edu/~hunicke/pubs/MDA.pdf
Karaoke Revolution (2004). [Computer Software]. Tokyo, Japan: Konami.
Kim, J.H., Gunn, D.V., Schuh, E., Phillips, B., Pagulayan, R.J., & Wixon, D. (2008, April).
Tracking Real-Time User Experience (TRUE): A comprehensive instrumentation solution
for complex systems. Proceedings of the SIGCHI conference on Human factors in computing
systems, Florence, Italy.
Nielsen, J. (1993). Usability engineering. San Francisco, CA: Morgan Kaufmann.
Olson, J., & Moran, T. (1995). Mapping the method muddle: Guidance in using methods
for user interface design. In M. Rudisill, C. Lewis, P. Polson, & T. McKay (Eds), Human
Computer Interface Design: Success Cases Emerging Methods and Real World Context (pp.
269–303), San Francisco: Morgan Kaufman.
Pagulayan, R., Keeker, K., Fuller, T., Wixon, D., & Romero, R. (2007). User-centered design
in games. In J. Jacko, & A. Sears (Eds), Handbook for Human-Computer Interaction in
Interactive Systems: Fundamentals, Evolving Technologies and Emerging Applications (2nd
ed., pp. 741–760), Mahwah, NJ: CRC Press.
Pagulayan, R.J., Steury, K.R., Fulton, B., & Romero, R.L. (2003). Designing for fun: User-
testing case studies. In M. Blythe, K. Overbeeke, A. Monk, & P. Wright (Eds.), Funology:
From Usability to Enjoyment (pp. 137–150), New York: Springer.
Rock Band (2007). [Computer Software]. Redwood City, CA: Electronic Arts.
Romero, R. (2008, February). Tracking attitudes and behaviors to improve games: Successful
instrumentation. Presentation at the annual meeting of the Game Developers Conference,
San Francisco, CA.
Scene It? Lights, Camera, Action (2007). [Computer Software]. Redmond, WA: Microsoft.
Schuh, E., Gunn, D.V., Phillips, B., Pagulayan, R.J., Kim, J.H., & Wixon, D. (2008). TRUE
instrumentation: Tracking real time user experience in games. In K. Isbister, & N. Schaffer,
(Eds), Game Usability: Advice from the Experts for Advancing the Player Experience (pp.
235–263), San Francisco: Morgan Kaufmann.
Shadowrun (2007). [Computer Software]. Redmond, WA: Microsoft.
SingStar (2004). [Computer Software]. Foster City, CA: Sony Computer Entertainment.
StarCraft (1998). [Computer Software]. Irvine, CA: Blizzard Entertainment.
Thompson, C. (2007). Halo 3: How Microsoft Labs Invented a New Science of Play. Wired,
15, 140–147. September.
Wilson, T. (2007). Top 20 Publishers. Game Developer Magazine, 6–16. October.
CHAPTER FIVE
Let the Game Tester Do the Talking: Think Aloud and Interviewing to Learn About the Game Experience
Henriette (Jettie) C.M. Hoonhout is a senior scientist
at Philips Research in Eindhoven, the Netherlands.
Her research focus is on user interaction technologies
(including applications in electronic games and toys)
and user-centered research methodologies. Games and
toys have her interest because of their captivating and
motivating power, as a potential model for interaction
design of other consumer electronics applications, and
as an inspiration for modeling enjoyment. She graduated
with a degree in cognitive psychology at the University of Utrecht (the Netherlands),
and after graduation, started working in the experimental psychology department
of that university. She was involved in various contract research projects in the field
of process control (including work on simulation tools to train operators of chemi-
cal plants), HCI, and design of instructions. Next, she worked at the University of
Maastricht, in the psychology department. Her work involved, among other things,
developing parts of the psychology curriculum, in particular for the cognitive ergo-
nomics program.
5.1 Introduction
When conducting a usability test, independent of the product (game) or service that is
being tested, the researcher ideally would like to be able to look “into the head” of the
participants in the test: what are they thinking, what is their reasoning in selecting par-
ticular options and ignoring others, what attracts their attention and what does not, and how do
they interpret the different labels, colors, icons, and other elements being presented
to them? Current technology, unfortunately, does not (yet) provide us with
a detailed record of the cognitive processes taking place in participants' brains
while they interact with a device, a record that would allow one to answer such questions.
However, there are ways to catch at least a glimpse of what might be going on
inside the head of a participant. Asking participants to think aloud while working
through the tasks presented to them in a test situation is one approach often used
in usability testing. The reports of such verbalizations are called verbal protocols.
Alternatively, the researcher could also interview participants about their experiences
during the test, usually after having completed the test tasks. The basic connection
between the two approaches is that in both cases the participants are asked to
verbalize their experiences, providing annotations on their interaction with a device.
Typical issues that the researcher may want to address in a test of a game are:
● Does the game pose an interesting and adequate challenge to the intended target
group of players (Malone, 1982; Prensky, 2002; Fontijn and Hoonhout, 2007a)?
Not too difficult, nor too easy? And is the challenge to be found in the game con-
tent, rather than in the game controls? The latter would almost certainly indicate
a usability issue.
● Will it stay challenging until the end of one test session? And will it remain chal-
lenging after playing it for a long time, say for several sessions?
● Finding out whether an application is fun, or not, is only part of the answer;
more important is how the different elements in the application contributed to
the experience (Malone, 1982; Fontijn and Hoonhout, 2007a).
● How easily and effortlessly can a player learn how to “work” with the applica-
tion? Can the player easily grasp what is expected from him or her? How much
support is needed?
● In case of multiplayer games, how does the (social) interaction develop? Which
elements in the application support social interaction, and which interfere with it?
● Is use of the game controls easy and not hindering game play?
Basically any question that has to do with usability aspects of the game inter-
face can be addressed in verbal protocol analysis. However, verbal protocol analysis
seems to be much less suitable to address the level of enjoyability of the game,
to investigate the potential engaging power of a game: Having to think aloud is
“killing” the experience, or at least changing it. Thus, participants might indicate
what they like about the game, and what potentially might thrill them, but they
will not be able to have the full experience and talk about it at the same time. This
means that any test of a game will require a multi-method approach, with methods
addressing different aspects of the game experience.
Think-aloud and verbal protocols (re)gained widespread attention after a publi-
cation by Ericsson and Simon (1984). They clearly stated in their framework, in
which they make a distinction between various levels of verbalizations, that any
inferences that the participant makes about their own cognitive processes or opin-
ions should not be considered for analysis, because these would not represent reli-
able data. Instead, only verbalizations that refer to what the participant is attending
to, and in what order, should be considered.
Nisbett and Wilson (1977) had earlier collected data suggesting that
humans are not very good at reporting the deep underlying factors that influence
their decisions. They concluded that humans might be aware of the outcomes of
their cognitive processes (for example, "I find this game boring"), but might not be
aware of how they came to this judgment. With respect to verbal protocols, Ericsson
and Simon stated that this implies protocols will mainly contain observed facts and
the results of decisions, which the researcher can then interpret in order to come
to an idea of the possible underlying cognitive processes.
Think-aloud as applied in the usability evaluation context has in most cases
diverged from Ericsson and Simon's notions of how to conduct such studies; Nisbett
and Wilson's recommendations are also only rarely heeded (Boren and Ramey, 2000).
Originally, however, verbal protocols were applied in experimental cognitive psychol-
ogy studies to study, for example, problem-solving tasks, in order to learn about the
underlying cognitive processes. In evaluating systems and interactions, other issues
are addressed (Boren and Ramey, 2000), ranging from aspects that Ericsson and col-
leagues would consider reliable verbalizations (what is the participant attending to, in
what order is the participant using the device and conducting the task), to aspects that
are not in the scope of Ericsson’s framework, but that are highly important for product
evaluation, such as appraisals and opinions about interface elements. And even though
verbal protocols might be less suitable for asking participants to describe in detail the
cognitive processes that they employ during game interaction (in other words, asking
them to make inferences about their own cognition), in a usability evaluation context
adopting a less strict approach to think-aloud studies than advocated by Ericsson and
Simon does result in usable and useful data about important product aspects.
5.2 Application of Think-Aloud

In a think-aloud test, participants work through a set of tasks and are asked to
verbalize their thought process while they proceed.
be asked to think aloud during task performance, usually called concurrent think-
aloud. However, since there have been concerns about the possibility that thinking
aloud while working on the task might influence task performance, change it, or may
distract users from proper task performance, some researchers advocate the use of
retrospective think-aloud. Typically, participants carry out the tasks and their interac-
tion with the game while their behavior is recorded on videotape; then,
in the second part of the session, the participants watch this video recording and
try to verbalize the thoughts they had during the interaction. How easy or difficult
this will be for the participants will depend, for example, on the length of the ini-
tial session or on the speed of the interaction. Both approaches
have benefits and drawbacks. Studies have indicated that both approaches produce
comparable usability evaluation results in terms of the number and relevance of problems found.
Concurrent think-aloud, however, seems to lead more often to the "detection" of problems that
can be observed as well, with the verbalizations underlining or explaining these
problems, while retrospective think-aloud more often reveals
problems that are not observable, thus addressing issues that can only be detected
through the verbalizations of the participants (Van den Haak, de Jong, and Schellens,
2004). With regard to testing games, retrospective think-aloud seems to be the pre-
ferred choice, in order to preserve the experience during game interaction, and col-
lect feedback on that game experience that cannot be inferred from observations
alone; however, researchers may want to opt for concurrent think-aloud when usa-
bility of the game interface is of primary concern, because this is likely to point out
more detailed aspects in the interface that hinder or help the player compared to the
number of aspects and details that participants will be able to recall in retrospective
think-aloud. Of course, one should take into account that retrospective think-aloud
will result in a substantial increase in test session duration.
In all cases, it is important to think carefully about which participants to select
for the test; the most important criterion is that they should be representative of the
target group of the game being developed. But it can also be helpful to recruit
people who are relatively at ease with thinking aloud while performing a task. For
the first tests of a game, it can be very welcome to invite participants who have had
some experience taking part in such usability tests.
Verbal protocols are an appropriate tool for letting participants describe in which
order items are considered, how they approach the interface, and their deliberations
while using the interface ("What is the next step I should take?
What would that button do?"). Verbal protocols potentially provide a rich source
of data, and can offer very useful insights into the cognitive processes that guide
the interaction of participants. And although it is a time-consuming activity, it is
relatively easy to learn how to conduct. Also, no specialist devices are necessary—a
recording device (preferably video recording, in order to capture the device and the
setting in which the tasks are performed as well), and spreadsheet software for the
data analysis would already be sufficient. Software packages are available that could
further support the analysis of the protocols.
5.3 Limitations of Think-Aloud
Thinking aloud is for most people an uncommon behavior, and it might make the
participants feel awkward. To overcome this awkwardness, the researcher could con-
sider letting people work together, although whether this is at all possible depends
on the type of game. In tests with children, this is a commonly
used approach (Als, Jensen, and Skov, 2005). Whether or not such an approach will
work also depends on the composition of the team—if the participants differ greatly
in experience and skill related to the application that is being tested, verbalizations
might not at all result in an account that is useful for evaluating the interface. The
verbalizations might quickly become more like instructions that a teacher might give
to a pupil than a joint exploration of the device. Similar problems might arise if the
participants differ greatly in verbal skills, or if one participant is much more domi-
nant in social situations than the other(s). However, research has also indicated that
pairs of children who are acquainted perform significantly better in detecting prob-
lems, both in amount and severity of the usability issues (Als et al., 2005).
If the researcher is interested in, for example, the social interactions
during game play, then the categories should reflect aspects related to social beha-
vior, such as communication, negotiation, turn taking, etc. In order to allow consis-
tency in the analysis, it is important to provide sufficiently detailed explanations with
the categories, so that the analyst can check during the analysis whether the raw
material is still being processed according to the same rules (and hence ensure
intra-rater reliability). It is good practice to do a
pilot analysis, and revise/extend categories as found necessary.
The process starts with dividing the raw material into chunks, followed by assign-
ing the chunks to categories; then, in a later stage, the meaning of the different
categories is analyzed. The researcher could also look at sequences of chunks that
repeatedly occur.
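As a toy illustration of this analysis step, assuming chunks have already been coded into hypothetical categories, category frequencies and recurring sequences can be counted like this:

```python
from collections import Counter

# Hypothetical coded protocol: each chunk has been assigned one category.
coded_chunks = ["orient", "goal", "action", "confusion", "action",
                "confusion", "action", "success"]

# Frequency of each category across the protocol.
print(Counter(coded_chunks))

# Sequences (bigrams) of categories that repeatedly occur, for example
# a recurring confusion -> action pattern.
bigrams = Counter(zip(coded_chunks, coded_chunks[1:]))
print(bigrams.most_common(3))
```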
The best approach (but also the most resource-intensive) is to have more than one
researcher independently analyze the raw material, and then compare
analyses. If the analysis process has been well prepared, and the different raters
have been well-briefed and trained, inter-rater reliability should be sufficiently high
(see for example Landis and Koch, 1977, for a discussion on inter-rater reliability
scores). The next step is then to analyze the structure of the processed results
(for example, frequency of occurrences etc.), possibly linked to data collected via
other means.
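Cohen's kappa is a common choice for the kind of inter-rater reliability scores that Landis and Koch (1977) discuss, since it corrects raw agreement for chance. A self-contained sketch for two raters coding the same chunks:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same chunks. As a rule
    of thumb, Landis and Koch (1977) read values above roughly 0.6 as
    substantial agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["action", "confusion", "action", "goal", "action", "confusion"]
b = ["action", "confusion", "goal", "goal", "action", "action"]
print(round(cohens_kappa(a, b), 2))  # 0.48
```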
An excellent handbook on how to design and conduct verbal protocol studies, with
many examples, is published by van Someren et al. (1994). Although this handbook
is approaching verbal protocol studies from an experimental cognitive psychology
viewpoint, and not from the viewpoint of usability studies, it contains much practical
advice, and ample examples on how to process and analyze protocols.
competitive behavior. Other aspects of the interaction with the toys could also
be studied based on the utterances, in combination with the observation data.
In another study (Fontijn and Hoonhout, 2007b), we again did not specifically
ask the participants in the test of a game to think aloud continuously, but still
recorded all comments they uttered. In this study too, the participants tested
and played the games in pairs. Their comments during the gameplay were used to
collect feedback on potential unclear elements in the game interface, issues with the
game controls, and more generally feedback on the enjoyability of the game. And
again, these data were combined with data collected in a closing interview, ques-
tionnaire data, and an analysis of observed behavior.
Independent of the approach adopted around verbalizations—the more for-
mal approach as advocated by Ericsson and colleagues, the modified approach as
often seen in usability testing (Boren and Ramey, 2000), or an informal approach
as described above—it is very prudent to combine any of these approaches with
other techniques, such as questionnaires, observations, logging of game interactions
and interviewing. Several of these techniques are described elsewhere in this book.
But because of the link with verbal protocol analysis approaches, especially when
it comes to analyzing the raw data, interviewing will be briefly discussed in the
remainder of this chapter.
5.6 Interviewing
One of the most natural things to do after a usability test is ask participants how
they feel about the experience, which is basically the key issue researchers want to
address and is, in fact, the start of any closing interview. Conducting an interview is
a flexible means of gathering information about the experience the participants just
had, about their opinion regarding the application, about previous experiences and
how this one compares to earlier ones, about their perceptions, attitudes, thoughts,
ideas, etc. It is also an opportunity to complement data collected via other means,
for example, observations collected during a usability test, verbal protocol, logging
of system use. Interviews allow the researcher to collect a potentially rich set of
qualitative data regarding opinions and attitudes.
The one-to-one character of an interview session, enabling direct and interactive
contact with a participant, results in both benefits and risks. Interviews collect data
on the participant’s individual concerns, and let them voice their ideas, opinions, and
issues. Any mistakes and misunderstandings in the question-answer flow can be more
easily detected and corrected. Participants can be asked to further elaborate on answers
that are not completely clear to the interviewer, or that are interesting enough to war-
rant more detailed treatment. An interview usually provides ample opportunity to
address many facets of a topic and discuss them in some depth (although a researcher
might be tempted to bring up too many topics, resulting in an overly long interview
and a confusing mix of topics). However, the success of an interview ses-
sion largely depends on the skills and experience and preparation of the interviewer.
If the objectives of the interview are not clearly worked out, the ques-
tions being formulated may lack coherence, and the overall structure of the inter-
view is bound to miss direction, potentially resulting in data that turns out to be
useless after all. In all cases, it is useful to conduct a pilot trial of the test and the
interview, with two to three participants, to see how “workable” the procedures are.
The interview should be conducted in a comfortable, quiet location, preferably
with the game interface within reach, in case the participant wants to illustrate a
comment. In addition to recording the interview (audio, but preferably video, to cap-
ture non-verbal behavior as well), it is good practice to take notes too, which can later be used in the analysis of the material. The recordings can then be processed as described earlier for verbal protocols.
as described earlier for verbal protocols.
Processing and analyzing interview data can be cumbersome and is time-
consuming. Basically, the same process as described under analyzing verbal proto-
cols could be adopted, that is, transcribing, dividing into chunks, categorizing, and
then analyzing.
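To make the transcribe-chunk-categorize-analyze sequence concrete, here is a minimal sketch in Python. The coding scheme and the example chunks are hypothetical: in real protocol analysis, chunks are coded by hand against a scheme derived from the research questions (ideally by two coders, so that inter-rater agreement can be checked), and keyword matching is at best a first-pass aid.

    from collections import Counter

    # Hypothetical coding scheme: category -> cue phrases an analyst
    # might associate with it. A real scheme comes from the research
    # questions, and hand coding overrides any automatic match.
    CODING_SCHEME = {
        "controls": ["button", "stick", "press"],
        "goals": ["objective", "where", "supposed to"],
        "feedback": ["nothing happened", "no sound", "didn't notice"],
    }

    def code_chunk(chunk):
        """Assign a transcript chunk to the first matching category."""
        lowered = chunk.lower()
        for category, cues in CODING_SCHEME.items():
            if any(cue in lowered for cue in cues):
                return category
        return "other"

    chunks = [
        "I keep pressing the wrong button for jump.",
        "Nothing happened when I picked that up.",
        "Where am I supposed to go now?",
    ]

    counts = Counter(code_chunk(c) for c in chunks)
    print(counts)  # one chunk each for controls, feedback, and goals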
Quite often, data collected in an interview appear to be in conflict with data obtained via other means. For example, participants might indicate in the interview that they thought the interface was easy to work with, whereas the observation data clearly indicate otherwise. One should be careful not to confront participants with such inconsistencies, but rather try to find out whether, for example, experience with other similar devices plays a role, or whether they are so pleased with certain aspects of the device, such as its aesthetics, that they give a much friendlier overall rating than one would expect on the basis of the observed usability issues alone.
The interested reader who wants to learn in more depth how to design and conduct interview studies is referred to Oppenheim (2000) and Wilson and Corlett (2005).
5.7 Discussion and Conclusion
Verbal protocols, in contrast, will provide more information about the details of the interaction in a particular context.
Conducting a think-aloud study generally requires that a prototype is available, or at least some form of mockup that allows the participant to get an idea of the flow of interaction. However, this also means that any feedback collected comes quite late in the development process, which in most cases means that fundamental changes can no longer be made. Again, this underlines the importance of adopting a multi-method approach, in combination with continuous feedback sessions throughout the development cycle. Even more importantly, the choice of an evaluation method or tool should be guided by the question(s) one wants to address.
5.8 References
Als, B.S., Jensen, J.J., & Skov, M.B. (2005). Comparison of think-aloud and constructive inter-
action in usability testing with children. In: Proceedings of the Conference on Interaction
Design and Children, Boulder, Colorado. ACM Library, pp. 9–16.
Bainbridge, L., & Sanderson, P. (2005). Verbal protocol analysis. In J.R. Wilson, & E.N. Corlett
(Eds), Evaluation of Human Work. London: Taylor and Francis, pp. 159–184.
Boren, M.T., & Ramsey, J. (2000). Thinking aloud: reconciling theory and practice. IEEE Transactions on Professional Communication, 43(3), 261–278.
Ericsson, K.A., & Simon, H.A. (1984). Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press. (A revised edition was published in 1993.)
Ericsson, K.A. (2006). Protocol analysis and expert thought: Concurrent verbalizations
of thinking during experts’ performance on representative tasks. In K.A. Ericsson,
N. Charness, R.R. Hoffman, & P.J. Feltovich (Eds), The Cambridge Handbook of Expertise
and Expert Performance. Cambridge: Cambridge University Press, pp. 223–241.
Fontijn, W.F.J., & Hoonhout, H.C.M. (2007a). Functional Fun with Tangible User Interfaces.
In Proceedings of Digitel 2007; 1st IEEE Int. Workshop Digital Game & Intelligent Toy
Enhanced Learning, 2007, Jhongli, Taiwan, pp. 119–123.
Fontijn, W., & Hoonhout, J. (2007b). Real balls, virtual targets: on the benefits of hitting a
wall. In Proceedings of PerGames, 11–12 June 2007, Salzburg, Austria, pp. 135–142.
Hoonhout, H.C.M., & Stienstra, M.A. (2003). Which factors in a consumer device make
the user smile? In D. de Waard, K. Brookhuis, S. Sommer, & W. Verwey (Eds), Human
Factors in the Age of Virtual Reality. Maastricht, the Netherlands: Shaker Publications,
pp. 341–355.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data.
Biometrics 33, 159–174.
Malone, T.W. (1982). Heuristics for designing enjoyable user interfaces: Lessons from compu-
ter games, In Nichols, Jean A. and Schneider, Michael L. (eds.) Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems. March 15–17, 1982, Gaithersburg,
Maryland, United States, pp. 63–68.
Malone, T.W., & Lepper, M.R. (1987). Making learning fun: A taxonomy of intrinsic motiva-
tions for learning. In R.E. Snow, & M.J. Farr (Eds), Aptitude, learning and instruction.
Hillsdale, NJ: Erlbaum. pp. 223–253.
76
5.8 REFERENCES
Nisbett, R.E., & Wilson, T.D. (1977). Telling more than we know: verbal reports on mental processes. Psychological Review, 84(3), 231–241.
Oppenheim, A.N. (2000). Questionnaire Design, Interviewing, and Attitude Measurement
(2nd edition) London: Continuum.
Prensky, M. (2002). The motivation of gameplay. On the Horizon, 10(1).
Stienstra, M. (2003). Is every kid having fun? A gender approach to interactive toy design,
Enschede, the Netherlands: Twente University Press.
Van den Haak, M.J., de Jong, M.D.T., & Schellens, P.J. (2004). Employing think-aloud proto-
cols and constructive interaction to test the usability of online library catalogues: a meth-
odological comparison, Interacting with computers, 16, 1153–1170.
Van Someren, M.W., Barnard, Y.F., & Sandberg, J.A.C. (1994). The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes. London: Academic Press. Also available via: http://staff.science.uva.nl/~maarten/Think-aloud-method.pdf (last accessed 15 April 2008).
Wilson, J.R., & Corlett, E.N. (Eds) (2005). Evaluation of Human Work. London: Taylor and Francis.
CHAPTER SIX
Heuristic Evaluation of Games
Noah Schaffer is an M.S./Ph.D. student at Rensselaer
Polytechnic Institute, where he’s focusing on usability and
interaction design in games. Noah became a Certified
Usability Analyst (CUA) in 2004. He’s also completed
internships doing usability evaluation and institution-
alization at game companies, including Mobile2Win in
India and SNDA in China.
6.1 Introduction
Does your team have the time and money required to do a formal usability study to
find your usability problems? It’s typical for game design teams to have tight budg-
ets and ultra-tight timelines. There are other methods of usability testing that can be
quite effective. These “discount methods” exist to find usability problems quickly
and cheaply. They sacrifice some degree of thoroughness and statistical certainty for
improvements in speed and cost.
Wouldn’t it be great if usability evaluators could have a list of types of usability
problems to guide their evaluation? A list of guidelines like this would allow for
rapid, inexpensive usability evaluation. Such a tool would help with expert evalua-
tion, which is what we call the process of an expert doing a review of an interface
to find usability problems. Guidelines like these could even give useful results when
used by novice evaluators. This guidelines-based method has been around for about
twenty years (Nielsen & Molich 1990), and it’s called heuristic evaluation.
The word “heuristic” means shortcut. To avoid confusion, I’ll begin by noting
how some other fields use the word, to help identify the difference when we use
the word in usability. In the field of psychology, heuristics refers to shortcuts people
use to solve complex problems with incomplete information (Kahneman, Tversky, &
Slovic, 1982). Though the field of psychology acknowledges that heuristics are
adaptive and useful most of the time, the field's focus ends up being on the ways in which heuristics cause problems, mistakes, or biases. This can lead to a perception of heuristics as negative, which is entirely different from how we see heuristics in usability; I stress the distinction with psychology specifically because that negative connotation does not carry over. The word “heuristics” is also used in computer science, where it refers to methods computers use to make a best guess at a solution, sacrificing some accuracy to gain speed. That use of the word is closer to ours in usability: in all cases it means some kind of shortcut.
In usability, heuristics are tools we explicitly learn to use for usability evaluation.
Usability heuristics are shortcuts to finding usability problems quickly and cheaply.
Nielsen’s 10 Heuristics
Visibility of system status
Match between system and the real world
User control and freedom
Consistency and standards
Error prevention
Recognition rather than recall
Flexibility and efficiency of use
Aesthetic and minimalist design
Help users recognize, diagnose, and recover from errors
Help and documentation
6.2 Understanding Heuristics
The Heuristic Evaluation for Playability (HEP) proposed by Desurvire, Caplan, and Toth (2004) groups its heuristics into four categories: Game Play, Game Story, Mechanics, and Usability. The HEP is longer and more specific, which is promising. However, like Federoff's (2002) heuristics,
many of the HEP heuristics focus heavily on issues of game design. Additionally,
most of the usability heuristics are targeted at the learnability of the game. This
leaves room for improvement in terms of guidelines for the user interface, intuitive-
ness, and sticking points.
Recently, a list of heuristics was designed for games for mobile phones at Nokia
(Korhonen and Koivisto, 2006). The list they began with was only eleven items
long, so it naturally suffered from some lack of specificity. However, they had bet-
ter results when they expanded the list to twenty-nine items under three categories:
game usability, mobility, and game play. These heuristics for mobile phone games
are particularly interesting when compared to the white paper which is discussed
later in this chapter (Schaffer, 2007), because both lists came from work on games
for mobile phones. Games for mobile phones have special limitations, such as the
need to allow short-duration play, small awkward control inputs, and small low-
resolution screens. The Korhonen and Koivisto heuristics (2006) indeed address
some of these kinds of issues with heuristics such as “interruptions are handled
reasonably.” For more comparison of these lists of heuristics, see the position paper
from the ACE 2007 conference workshop (Schaffer and Isbister, 2007).
As part of an internship at Mobile2Win in Mumbai, India in 2006, I made a list
of usability guidelines. The guidelines were made by identifying usability problems
that frequently came up. These guidelines evolved into a set of usability heuris-
tics which are written with the intention of allowing novice evaluators to conduct
usability analysis of games. These heuristics have been released as a white paper
(Schaffer, 2007) to promote the development of usability evaluation tools for dig-
ital games. The white paper includes a list of twenty-nine specific heuristics. These
heuristics are more specific, including suggestions such as “use natural control
mappings” and “don’t make it easy for players to get stuck or lost” (Schaffer, 2007).
In addition to the descriptions of the heuristics, every heuristic is accompanied by an example.
All these lists of heuristics are thorough and useful, though certainly none are
perfect. As mentioned before, some lists include heuristics that aren’t centered
on the subject of usability so much as on game design. Some lists include rather
broad heuristics like “create a great storyline” (Federoff, 2002) and “a good game
should be easy to learn but hard to master” (Desurvire, Caplan, & Toth, 2004).
Some lists include heuristics that are hard to predict until the game’s production
is almost completely finished, like “one reward of playing should be the acquisi-
tion of skill” (Federoff, 2002). The good news is that these lists of game heuristics
haven’t been confined to the traditional rule of using just ten heuristics. This leaves
room for greater specificity. Even if some of these heuristics miss the target a little
bit, they’re still valuable and useful for usability evaluation. Table 6.1 is a more
detailed comparison of various lists of heuristics. (Editors’ note: There is also a list
of heuristics to be used in expert evaluations included in the next chapter in this
book.)
6.3 Implementation
6.3.1 When To Use Heuristics: Test Early, Test Often
Test early. Heuristics are especially valuable very early in the design process,
because they can be used without a functioning game. Problems found early are
cheaper and easier to fix. However, user testing with sketches or screenshots tends to give results that aren't very useful: with early sketches, very little of the actual game experience is there for a user to try out. Heuristic evaluation, by contrast, can
be used very early to help find problems and avoid wasting any time with coding
things that will later need to be changed for usability reasons.
Test often. Usability evaluation methods miss some problems. Fixes to usability
problems can create new problems. And there are always other updates happening
to elements of games throughout the design process. For all these reasons, it’s criti-
cal to understand that one usability test is not enough. Several tests should happen
throughout the design process, and heuristic evaluation is just one of the types of
evaluation that can be done. This is why usability professionals often repeat that
“usability is iterative.”
Both heuristic evaluation and expert evaluation can be used earlier than user testing. User testing has higher validity, but it requires a relatively functional game. Use heuristics for the earlier tests to scan for as many problems as possible,
especially the more severe problems. User testing should be used for later evalua-
tion, both to find severe problems that were missed or newly created and also to
help find and polish subtler or more minor problems.
Understand the value of heuristics, but also the limitations. Heuristics are a dis-
count usability method and will not find all usability problems. A game can easily
have 200 usability problems, and heuristic evaluation with 5 evaluators will typi-
cally find about 150 (75 percent) of them (Cockton and Woolrych, 2001).
Although these lists adapt usability heuristics to games, the method for applying them is fundamentally the same. While there have been some criticisms of heuristic evaluation (Cockton & Woolrych, 2002; Connell & Hammond, 1999), the methods for implementing the heuristics haven't changed since Nielsen's book (Nielsen, 1993; Usability Professionals' Association, 2007).
Three to five evaluators are given the game for independent evaluation.
Evaluation is done separately, using the heuristics. Evaluators will take each heuris-
tic one at a time and look for violations of that heuristic in the design, like a check-
list. So at the end of the first step of evaluation, each evaluator will have a separate
list of usability problems. Each problem will have a severity rating and a note about
which heuristic it relates to. Evaluators do not collaborate or coordinate during this
first phase of evaluation.
The next step in heuristic evaluation is for the evaluators to combine their separate lists into one rough master list. Problems that at least two evaluators agree on stay on the list. Some problems may be identified by only one evaluator but stay on the list because the other evaluators agree that those problems are present.
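As a sketch of this merge step, the fragment below (in Python, with invented problem entries) keeps problems reported by two or more evaluators and flags singletons for the group to confirm or drop. In practice the merge is a discussion among evaluators, not a script; this only shows the bookkeeping.

    from collections import defaultdict

    # Each evaluator's list: (problem title, severity 1-4, violated heuristic).
    evaluator_lists = [
        [("Pause menu hides the HUD", 3, "Visibility of system status"),
         ("FIRE button varies per level", 4, "Consistency and standards")],
        [("FIRE button varies per level", 4, "Consistency and standards")],
        [("Pause menu hides the HUD", 2, "Visibility of system status"),
         ("No undo for dropping an item", 3, "User control and freedom")],
    ]

    reports = defaultdict(list)
    for problems in evaluator_lists:
        for title, severity, heuristic in problems:
            reports[(title, heuristic)].append(severity)

    master, to_discuss = [], []
    for (title, heuristic), severities in reports.items():
        entry = (title, heuristic, max(severities))  # keep the worst rating
        (master if len(severities) >= 2 else to_discuss).append(entry)

    master.sort(key=lambda e: e[2], reverse=True)  # most severe first
    print("Master list:", master)
    print("Needs group discussion:", to_discuss)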
The final step is to ready the report. This will include basic editing of the list of
problems, organizing the problems in order of severity, giving some recommenda-
tions for fixes for each problem, and perhaps adding screenshots for illustration. Note
that the heuristics that relate to each problem remain listed with each problem, for
the purpose of helping with understanding and lending credibility to the recommen-
dations. The usability experts specialize in identifying the problems, not the solutions
to those problems. So the design team may have better fixes to the problem than
what the usability experts suggest. A viable discount alternative to a written report
is a presented report. It’s up to your team to decide which is most appropriate for
your circumstances. With a verbal report, the team lead would show each problem
with screenshots and talk through the problems and severity. This helps speed up
the process of applying findings. (A few small suggestions about presenting results: if
there are plenty of severe problems, consider skipping the less severe ones. You don’t
want to overwhelm your audience or make them feel attacked. Also, be sure to open
your presentation by praising the work the team has done. Don’t underestimate the
value of encouragement, especially in helping with accepting constructive criticism.)
Evaluation is somewhat different depending on whether you use expert evalua-
tors or novice evaluators. Nielsen differentiates between types of experts as single
experts and double experts. Single experts have expertise in either the subject field
(games) or in usability. Double experts have expertise in both fields (Nielsen, 1993).
Each novice evaluator will find approximately 22 percent of usability problems. Double experts will each find about 60 percent (Nielsen, 1993). What kind of evaluators you choose will depend on cost and avail-
ability, but it’s important to understand the difference.
Expert evaluators will tend towards using heuristics to categorize the usability
problems they find, rather than to find usability problems in the first place (Cockton &
Woolrych, 2002, Connell & Hammond, 1999). Even when this is the case, heuristics
are useful as a lexicon that can be used for communication of usability problems.
Additionally, heuristics give some credibility to the results evaluators find. Generally, at least to some extent, heuristics will also broaden the scope of expert evaluation by guiding experts through a more thorough and far-reaching search.
It’s important to know how to find expert evaluators. Since the criteria you’re
interested in is expertise, some things to look for are degrees in usability-related
fields like HCI and Human Factors, certificates like the Certified Usability Analyst
(CUA) certificate offered by Human Factors International, and experience doing
usability work in industry. Experience or knowledge specific to games is valuable
as well, since usability for a factory is fairly different than for a video or computer
game. For a large game development firm, hiring expert evaluators may be feasible.
To find such experts, contact the Computer-Human Interaction (CHI) association
or Usability Professionals Association (UPA). Remember the difference that Nielsen
talks about between single experts and double experts (Nielsen, 1993). However, for
a smaller firm it may make more sense to hire a consultant company to do the eval-
uation. Expect an expensive hourly rate from a consultant, which is the downside
to not hiring someone on a permanent basis.
Alternatively, novice evaluators can be used. The methodology is basically the same, except that an expert should take over, or at least be involved, after the first phase, for example to compile the master list. Recall that novice evaluators find considerably fewer usability problems (Nielsen, 1993), so consider using more evaluators if you're relying on novices.
Don’t confuse heuristics with standards. Standards give designers rules to fol-
low, which speeds design time and reduces emergence of usability problems. With
standards, you have a highly specific set of rules that the designers in your com-
pany follow to maintain consistency in order to improve intuitiveness. Those rules
are set up with usability in mind as well, so that by following them, designers can
help to reduce usability problems. In contrast, heuristics are guidelines for evalua-
tion. They’re not really geared for design. It’s okay for designers to see heuristics in
order to help them understand what to watch out for, but heuristics and standards
are fundamentally very different.
6.4 Conclusion
Heuristics are a useful tool for analysis of usability in games. Just as with tradi-
tional heuristic evaluation of other interfaces, heuristic evaluation of games is a
“quick and dirty” approach. In the world of digital game design, speed is absolutely
critical. For instance, the lifecycle of the games made by Mobile2Win is sixteen to
twenty-five days. With such fast-paced production, the speed of tools like these heu-
ristics is extremely valuable.
For a complete usability solution, usability heuristics are just one tool among many; other tools, such as expert analysis and especially user testing, are also important. Usability should account for 8 to 12 percent of any design project's budget (Nielsen & Gilutz, 2003), and games are no exception.
Usability in games is still a relatively young field, so it’s not surprising that
there’s still room for improvement of the available heuristics for games. There is,
presently, no definitive established list of heuristics for games. Instead there are sev-
eral strong available lists. The lists that are available are good, but there is still some
room for such lists to evolve. For instance, I’ve just published a study (2007) that
tested the effect of quantity of animation on enjoyment, which ended up indicat-
ing that quantity of animation should not be included in a list of heuristics. Such
empirical studies help to evolve a list of usability heuristics that’s relevant to player
enjoyment. Perhaps the evolution of heuristics will result in a longer, more specific
list. Perhaps such evolution will result in a short list with more general heuristics.
Or perhaps we’ll end up with continued different lists competing, and being used
on somewhat different types of games. It’s an exciting new field, but regardless of
which list you use, heuristics are a powerful tool at your disposal.
6.5 Acknowledgements
Thanks first and foremost to Katherine Isbister for all her guidance and assistance.
Thanks to Mobile2Win for their cooperation and collaboration. Thanks to James
Watt, Mike Lynch, John Sherry, and Steve Swink of Flashbang Studios for all their
help on the animation quantity project.
6.6 References
Cockton, G., & Woolrych, A. (2001). Understanding inspection methods. In A. Blandford, J. Vanderdonckt, & P.D. Gray (Eds), People and Computers XV. Springer-Verlag, pp. 171–192.
Cockton, G., & Woolrych, A. (2002). Business: Sale must end: should discount methods be cleared off HCI's shelves? Interactions, 9(5), 13–18.
Connell, I.W., & Hammond, N.V. (1999). Comparing Usability Evaluation Principles with
Heuristics. Interact’99, Proceedings of the 7th IFIP TC.13 international conference on
Human-Computer Interaction, Edinburgh, August–September 1999, pp. 621–636. Amsterdam: IOS Press.
Desurvire, H., Caplan, M., & Toth, J.A. (2004). Using Heuristics to Evaluate the Playability
of Games. Conference on Human Factors in Computing Systems. New York: ACM Press.
Vienna, Austria. pp. 1509–1512.
Federoff, M. A. (2002). Heuristics and Usability Guidelines for the Creation and Evaluation
of Fun in Video Games. Masters Thesis at Indiana University. http://melissafederoff.com/
heuristics_usability_games.pdf
Gerhardt-Powals, J. (1996). Cognitive engineering principles for enhancing human-computer
performance. International Journal of Human-Computer Interaction, 8(2), 189–211.
Kahneman, D., Tversky, A., & Slovic, P. (1982). Judgement under Uncertainty: Heuristics &
Biases. Cambridge University Press.
Kamper, R.J. (2002). Extending the Usability of Heuristics for Design and Evaluation: Lead,
Follow, and Get Out of the Way. International Journal of Human Computer Interaction,
14(3–4), 447–462.
Korhonen, H. & Koivisto, E. (September 2006). Mobile entertainment: Playability heuristics
for mobile games. Proceedings of the 8th conference on Human-computer interaction with
mobile devices and services MobileHCI ’06. ACM Press.
Laitinen, S. (2006). Do usability expert evaluation and test provide novel and useful data for
game development? Journal of Usability Studies, 1(2).
Nielsen, J. (1993). Usability Engineering. San Francisco: Morgan Kaufmann.
Nielsen, J., & Gilutz, S. (Jan 2003). Usability Return on Investment. Private report. Nielsen Norman Group.
Nielsen, J., Molich, R. (1990). Heuristic evaluation of user interfaces. Proc. ACM CHI’90
(Seattle, WA, 1–5 April), 249–256.
Schaffer, N.M. (2007). Heuristics for Usability in Games. White paper. Available online at http://friendlymedia.sbrl.rpi.edu/lab-papers.html
Schaffer, N.M. (2007). Animation Quantity in Computer Games. Gamasutra masters thesis section. http://www.gamasutra.com
Schaffer, N.M., & Isbister, K. (2007). Heuristics for Usability Evaluation of Electronic Games. ACE 2007 conference workshop. http://ace2007.org/program/evaluating_games_ws.html
Shneiderman, B. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction (3rd ed.). Menlo Park, CA: Addison Wesley.
Usability Professionals' Association. (2007). Usability Body of Knowledge Website. http://www.usabilitybok.org/methods/p275?sectionhow-to
CHAPTER SEVEN
Usability and Playability Expert Evaluation
7.1 Introduction
Given the previous chapter in this book, it is useful to note here that the terms expert evaluation and heuristic evaluation are often used interchangeably. The term expert evaluation is used when we want to highlight that the evaluators' experience and other sources of information, such as design guidelines, also play an important role in the evaluation. Expert evaluation doesn't always involve heuristics, although the method I'll recommend in this chapter includes heuristics.
In an expert evaluation, a group of evaluators review the game. They look for potential usability and gameplay problems that may hinder playing the game. After the review, the evaluators create a report in which they present the findings, discuss the reasons behind them, and suggest how the problems can be addressed.
The goal of the evaluation is to help the game developers ensure that the user interface is easy to use and that there are no challenges in the gameplay that the developers did not intend it to have. These goals are important for several reasons. For one, poor usability may scare players off, even before they get to play the game. For example, if learning how to play the game is difficult, the players may choose to play another game, or they may decide not to play the game at all.
Another reason is that playing games is supposed to be fun. Even the smallest glitch or hiccup in the game's user interface can turn an otherwise good game into a rather annoying experience. For example, if managing the inventory in a role-playing game is not fluent, or restarting a race in a driving game is a long and tedious process, the players are not likely to enjoy playing the games as much as they could. The same can happen if the gameplay is not well designed: having to complete the same task too many times, or not getting any meaningful and interesting rewards for achievements, will make playing the game more boring than fun.
There are also other reasons why good usability and a polished gameplay are
important. One of them is that modern games are relatively complex. Even the sim-
plest games tend to contain many features that the players are required to master
in order to enjoy the game to its full potential. Making complex games both easy to
learn and effortless to play requires careful design and hard work.
Usability expert evaluation is an efficient and flexible method for achieving these goals. This chapter describes how an expert evaluation is typically done and what kinds of results one can expect from it. It also discusses the key strengths and weaknesses of the method and how it compares with other user-centered design methods that are commonly used in game design and development. Before moving on to these topics, the key aspects that are reviewed in a typical expert evaluation are discussed first.
7.2 What Is Being Evaluated
7.2.1 Game Usability
In a typical expert evaluation, the evaluators review the game's user interface: the menus, the displays, the controls, and any other user interface elements that the player uses before, during, and after playing the game. The evaluators review the user interface using usability heuristics
and their knowledge of good design practices. Typical usability problems found in
the games include, for example, menus that are cumbersome to use, displays whose
meanings are not clear and controls that are difficult to learn.
The goal of evaluating the game usability is to make sure that the user interface
is easy to learn, fluent to use, and that it supports the interactions that are typical
for the game under evaluation. If these goals are met, the players can focus on play-
ing and enjoying the game that the developers have designed. Otherwise, the players may end up struggling with the user interface rather than with the challenges the game intends for them.
7.2.2 Gameplay
A poor user interface may ruin a game that is otherwise good, but even a perfect user interface will not save a game that is simply not fun to play. Because of this, it is not
enough to evaluate just the user interface of the game. The gameplay also needs to
be evaluated as part of the expert evaluation.
When reviewing the gameplay, the evaluators ignore the user interface and focus
on the game itself. They review the game’s mechanics and study the interactions
that occur within the game. The goal is to find and remove the challenges that
are not intended by the game developers to be in the game and to make sure that
the gameplay is as fluent and fun as possible. Typical gameplay problems include boring and repetitive tasks, unclear next objectives for the player, and punishments for failure that cannot be considered fair.
The gameplay is evaluated in a manner similar to a game’s user interface. The
main difference is that gameplay heuristics are used instead of the usability heuris-
tics. These heuristics are discussed later on in this chapter.
7.3.1 Experts
In a typical expert evaluation, two or three evaluators review the game. The reason
why it is common to have more than one evaluator is that different evaluators tend
to find different problems. Increasing the number of evaluators will increase the
proportion of the problems found in the evaluation, without considerably increasing
the calendar time needed for the evaluation. Having more than one evaluator will
also improve the quality of the report. The different evaluators will bring in differ-
ent points of view, and being able to discuss both the problems and the solutions to
them will make the work easier for the evaluators.
On the other hand, experience has led me to believe that having more than three
experts seldom brings any benefits. The number of new problems found does not
rise considerably by having more than three evaluators, and the project will become
more expensive and challenging to coordinate. It is recommended to have only two or three evaluators unless there are special reasons for having more specialists participate in the process. More might be wanted, for example, if several people have special experience or knowledge about the game that you want included in the expert evaluation.
The ideal evaluators are double experts, knowledgeable about both usability and gameplay, and they best understand the expectations of the game developers. The double expertise is also beneficial when reporting the findings and thinking about
the ways in which the problems can be addressed. The background in usability will
help to understand and explain the reasons behind the problems and the exper-
tise in gaming will help to come up with good and realistic workarounds for the
problems.
Unfortunately, it is often the case that there are not enough double experts avail-
able to make a full team. If this is the case, then it is often considered acceptable
to have evaluators who are experts in only one area participate in the evaluation
(see, for example, Laitinen, 2006). To ensure the quality of work, the person who
leads the evaluation and compiles the report should, if possible, be a double expert.
For more information about how the number and expertise of the evaluators affect
the results of the expert evaluation, see for example Nielsen’s (1993) book Usability
Engineering.
7.4 When to Evaluate
If the evaluation is done early, when only design documentation is available, it can be difficult for the evaluators to pick out the most important parts of the game. This can be especially problematic if the gameplay contains a lot of novel features or the evaluators are inexperienced in evaluating the playability. Luckily, evaluating the game usability is easier. Spotting
usability problems, for example, from the menus and displays is often relatively
straightforward for people with some experience in usability.
These challenges can be tackled to some extent by complementing the evaluation with a study of possible earlier versions of the game and by benchmarking the game against other similar games. These help the evaluators better understand the game designers' intentions and pick up problems that are not obvious from the documentation.
Problems that cannot be fixed in the given time can also be used as background material when designing a possible sequel for the game.
7.5 Process
The expert evaluation consists of several steps. First, the study is planned and the
work is organized, then the evaluators review the game and discuss the findings.
After that, the report is written and the results are presented to the game develop-
ers. These steps will be discussed in detail next.
One way to reduce the time required for evaluating the game is to provide short-
cuts, so the evaluator can try out different aspects of the game without playing the
game all the way through. This, of course, has the downside that the experience is
not as realistic as it would be if the evaluators had the time to play the game as it is
intended to be played. This may hinder finding problems that become obvious only
over time, such as boring and repetitive tasks in the gameplay.
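One lightweight way to provide such shortcuts is an evaluator-only build setting that unlocks navigation aids such as level select. The sketch below is illustrative; the flag name and the progress model are hypothetical.

    import os

    # Hypothetical flag: set EVALUATION_BUILD=1 for evaluator builds only,
    # and make sure it can never ship enabled in a release build.
    EVALUATION_BUILD = os.environ.get("EVALUATION_BUILD") == "1"

    def selectable_levels(unlocked, total):
        """Evaluators may jump to any level; players see only unlocked ones."""
        if EVALUATION_BUILD:
            return list(range(1, total + 1))
        return list(range(1, unlocked + 1))

    print(selectable_levels(unlocked=2, total=12))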
Rating: Important
Description: Sometimes the player cannot pick up an item because there is no room in the inventory. If this happens, the player is not given any feedback. This is problematic, as the player may not know why they cannot pick up the item. It is likely that the player will figure it out eventually, but the confusion and extra effort required are likely to cause frustration.
Solution: Give the user proper feedback in every situation where the user interacts with the environment. If the item cannot be picked up, inform the user of this with a sound and/or textual feedback.

Rating: Important
Description: If the player does not pay attention to the dialogue, a change of the mission objective may go unnoticed.
Solution: Provide the users with a clear notification about a new mission objective. It may not be enough to just inform the players about the new objective in the in-game dialogue.

Rating: Important
Description: Movement speed can be considered slow when the player is required to walk over long distances to reach objectives. On the other hand, the movement speed seems good when fighting enemies or moving short distances.
A critical problem is one that will prevent the player from playing the game or will affect the experience so negatively that it risks players abandoning the game. A poorly designed map display that is important for playing the game is a good example of a potentially critical problem.
The difference between the moderate and important problems is not as clear-cut as the one between the minor and critical ones. Despite this, it is often a good idea to split the intermediate problems into these two categories, because the categories help the developers prioritize their work when fixing the problems. A separate category for unclassified problems can also come in handy if the game is still at a very early stage of development; issues that cannot be evaluated before the game is more complete can be classified into this category.
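A small shared definition can keep the rating scale consistent across reports and make sorting by severity trivial. The sketch below is one possible encoding; the class and field names are invented, and the scale simply mirrors the categories described above.

    from dataclasses import dataclass
    from enum import IntEnum

    class Severity(IntEnum):
        UNCLASSIFIED = 0  # cannot be judged until the game is more complete
        MINOR = 1
        MODERATE = 2
        IMPORTANT = 3
        CRITICAL = 4      # blocks play or risks players abandoning the game

    @dataclass
    class Finding:
        title: str
        severity: Severity
        description: str = ""
        solution: str = ""

    report = [
        Finding("No feedback when the inventory is full", Severity.IMPORTANT,
                solution="Play a sound and/or show a short text message."),
        Finding("Map display is hard to read", Severity.CRITICAL,
                solution="Redesign the map layout and re-test it."),
    ]
    report.sort(key=lambda f: f.severity, reverse=True)  # most severe first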
The written description of the problem is an opportunity to explain the prob-
lem and the reasons behind it in detail. These can be useful if the problem is so
complex that it cannot be described in the title only. Describing the reason behind
the problem can also help the developers understand the problem better and avoid
similar problems in the future. Screenshots can also be used to describe the prob-
lems. Screenshots make it easier to describe the problem and also often speed up
the reading of the report and make it visually more appealing.
The suggested solution should give a concrete example of how the problem can be addressed. Most often this can be done in a few sentences, but sometimes there may be a need for an illustrative picture that describes, for example, the new layout of the screen or another aspect of the game that needs to be developed further. If a large number of illustrations are required, writing the report may take longer than usual.
The number of problems found in a typical expert evaluation varies from twenty to fifty; the complexity of the game and the material used affect the number of issues found. When reporting the problems, keep in mind that the quality of the findings is more important than their number.
In addition to listing the problems found in the evaluation, it is a good prac-
tice to provide a summary of the key findings. Listing, for example, the three most important areas to improve at the beginning of the report will help to
set a context for reading the rest of the report. It is also recommended that the key
strengths of the game are briefly discussed at the beginning of the report. Listing
these will remind the developers that they have done good work and they will also
know to avoid changing these aspects.
7.6 Heuristics
For a more in-depth discussion of the merits of heuristics, and an outline of various
prior approaches to using heuristics with games, see the previous chapter in this
book. In this chapter, I divide the heuristics I use into two classes: usability heuris-
tics and gameplay heuristics. Samples are provided for each.
Consistency
The user interface should be consistent both within the game and between games. Consistency is important because it facilitates learning how to play the game, reduces the number of unnecessary errors, and makes using the user interface more fluent.
Consistency within the game means that there are no unnecessary exceptions in
how similar functions are implemented. For example, the menus should function in
a similar way throughout the game.
Consistency between games refers to following the common conventions and
standards specific to the platform and the game type. Controls are a good example
of the importance of this. If the controls used in the game are unconventional, the
player must spend some time learning the controls before being able to start enjoy-
ing the game. From the player’s point of view, this can be especially frustrating, if
there is no obvious reason for breaking the convention. First-person shooter games
provide many examples of this. Different buttons are used for the very same actions
across different games in the genre.
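On the implementation side, one simple safeguard for consistency within the game is to define each control binding exactly once and have every menu and level look it up from that single table. A minimal sketch follows; the actions and button names are placeholders.

    # Single source of truth for control mappings; because every screen and
    # level reads from this table, an action cannot silently differ between
    # one part of the game and another.
    CONTROLS = {
        "fire": "RIGHT_TRIGGER",
        "jump": "A",
        "pause": "START",
    }

    def button_for(action):
        return CONTROLS[action]  # an unmapped action fails loudly (KeyError)

    assert button_for("fire") == "RIGHT_TRIGGER"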
Provide feedback
The game should provide immediate, adequate, and easy-to-understand feedback
after each action taken within the game. The action can take place either while playing the game or while using the menus before or after play. The action can be, for example, a single press of a button, a complicated input sequence like a combo in a fighting game, or the character interacting with the environment within the game world. Walking over an item to pick it up is a common example of the player interacting with the game environment.
Feedback is important so that the player knows the action has been registered, and it also supports understanding the consequences of the action. Being sure that an action has been registered helps to reduce unnecessary uncertainty, and good feedback aids in learning how to play the game.
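As a sketch, a pickup handler can be structured so that every outcome, success or failure, produces feedback; this mirrors the inventory example from the sample report earlier in the chapter. The sound and message hooks are placeholders for whatever the engine provides.

    def play_sound(name):
        print(f"[sound: {name}]")    # placeholder for the engine's audio call

    def show_message(text):
        print(text)                  # placeholder for an on-screen message

    def try_pick_up(item, inventory, capacity=10):
        """Attempt a pickup; give immediate feedback on success and failure."""
        if len(inventory) >= capacity:
            play_sound("inventory_full")
            show_message(f"Inventory full: cannot pick up the {item}.")
            return False
        inventory.append(item)
        play_sound("item_pickup")
        show_message(f"Picked up the {item}.")
        return True

    try_pick_up("health potion", inventory=["sword"] * 10)  # full -> feedback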
Avoid errors
The user interface should be designed so that it prevents the player from making
mistakes that are not part of the gameplay. It is especially important to prevent the
user from making irreversible errors that may seriously affect playing the game. This
can be achieved, for example, by designing the user interface so that it facilitates
making correct choices and does not provide opportunities for making mistakes.
Limiting the available options, providing help, and automating actions where possible are other ways to reduce the number of errors the players make.
If an error occurs, provide an easy-to-understand error message that informs the
player about the consequences of the error and what the player can do to recover
from the error.
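For irreversible actions, the two halves of this heuristic, prevention and a clear message, can be combined in a confirmation step. A sketch with an invented save-overwrite flow:

    def overwrite_save(slot, confirm, write_save):
        """Ask before destroying data, stating the consequence in plain words."""
        prompt = (f"Slot {slot} already contains a saved game. Overwrite it? "
                  f"The old save cannot be recovered.")
        if not confirm(prompt):
            return False              # player backed out; nothing is lost
        write_save(slot)
        return True

    # Example wiring with stand-in callbacks:
    overwrite_save(
        slot=1,
        confirm=lambda prompt: print(prompt) or False,  # player declines
        write_save=lambda slot: print(f"[saved to slot {slot}]"),
    )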
Provide help
Help and documentation should be provided within the game. Novice players need
to be aided in learning how to play the game and experts may want to have more
information about the details of the game. Providing help within the game is impor-
tant, because players often do not read the manuals, or the manuals may not be
available.
If a tutorial is used to provide the help, care should be taken that it is entertain-
ing and it does not slow down the experienced players who do not need extensive
support to start playing the game.
Gameplay heuristics help the evaluators find challenges that were not intended by the game developers. The gameplay heuristics created by Korhonen and Koivisto (2006) are presented in this chapter, as their
list of heuristics is relatively compact and covers the key aspects of the gameplay
well.
The game should not have a single dominant strategy. If one strategy overshadows the other possible strategies, it is likely that the players will use it, and the gameplay will never reach its full potential.
The pace of the game should also be right. If the player needs time for thinking, the player should be given adequate time for it. The intense and relaxed phases of the game should be paced correctly as well: intensive phases that last too long will exhaust the player, and overly long relaxed phases run the risk of boring the player.
If the game mechanics and the game world have counterparts in the real world, these should be consistent too. For example, a small fence should not be used to limit the area that the player can explore in an otherwise realistic game. Consistency between the game and the real world helps the players understand the meaning of objects and actions and use them correctly.
Experience helps the evaluators understand what the heuristic rules mean in practice and helps them find issues that are not covered directly by the heuristic rules. The experience will also help in coming up with good and feasible suggestions on how the problems can be solved.
The experience gained when observing and analyzing playability tests is especially useful when conducting an expert evaluation. This is because playability tests give valuable information on how players play games and what they think
of the different design practices. This information can be highly valuable when
reviewing the game for usability and playability problems and thinking about pos-
sible ways that problems could be solved.
7.7 Summary
The expert evaluation is a flexible and efficient method for finding and address-
ing the usability and gameplay problems that may hinder playing the game. The
evaluation can be done at different points in the development process, and it can be focused on the issues that are of most interest to the game developers at that point in time. The method is efficient because the number of people needed for
conducting the evaluation is small and the results that consist of both the problems
and the recommendations on how to tackle them are delivered quickly. For the same reasons, expert evaluation is also a very cost-efficient way of improving the quality of the game.
Conducting an expert evaluation or evaluations will also encourage the develop-
ment of the game in an iterative manner. Scheduling time for the gathering of feed-
back and refining the game based on it will help to deliver a better game. Employing
usability and playability experts will also bring in new expertise that may be beneficial if there are no user interface designers on the team already, or if the user interface of the game is complex.
When compared with user-centered development methods, the expert evaluation
has the benefit that no players are needed in the evaluation process. This is benefi-
cial when evaluating early prototypes of the game. It is easier for experts to ignore missing features and bugs than it is for the gamers who represent the target audience of the game. Another benefit of not having players involved in the evaluation is that the risk of information leaks is minimal.
The fact that the players from the target group of the game are not involved in
the evaluation process is also the main weakness of the method. The experts’ views
on the usability and playability of the game do not represent what the players think
about the game. If feedback is desired on the players’ opinions about the game,
a playability test should be conducted (see Chapter IIA for more information).
A playability test will also help to make sure that there are no major usability or
playability problems in the game that have gone unnoticed in the expert evalua-
tion. This may happen because knowing how players really play the game, and exactly which issues cause them difficulties, can be challenging even for the most experienced evaluators. Luckily, expert evaluation and playability testing
complement rather than exclude each other. They are suitable for use at different parts of game development, and they represent different points of view that are both useful to the developers.
The expert evaluation should not be confused with the traditional quality
assurance testing. The goal of the expert evaluation is not to find bugs, but to
participate in the user interface and game design in a creative and collaborative
manner. The evaluators who conduct the expert evaluation take an active part
in the design process and provide the developers with feedback that they can use
to improve the user interface and game designs. The result of this work is not
fewer bugs in the game, but a better game that will be more fun to play than it
otherwise would have been.
7.8 References
Bartle, R. (1996). Hearts, clubs, diamonds, spades: Players who suit MUDs. Retrieved May 28, 2008, from the World Wide Web: http://www.mud.co.uk/Richard/hcds.htm
Koivisto, E.M.I., & Korhonen, H. (2006). Mobile Game Playability Heuristics, Forum Nokia. Available at www.forum.nokia.com/info/sw.nokia.com/id/5ed5c7a3-73f3-48ab-8e1e-631286fd26bf/Mobile_Game_Playability_Heuristics_v1_0_en.pdf.html
CHAPTER EIGHT
Interview with Eric Schaffer, Ph.D., CEO of Human Factors International
Interviewer:
Noah Schaffer
I can understand the usefulness of interface design standards for a phone company
or a bank. But how can you say standards are useful for game designers? Game
designers need to have flexibility to make things interesting and challenging.
Flexibility is certainly important. Game designers need flexibility in crafting story
line, and in optimizing the level of challenge and excitement in a game to create a
sense of flow and immersion. But players shouldn’t have to get creative in trying to
figure out which button the designer decided to make the FIRE button in this level
of the game, or where the designer decided to show the player’s health meter on the
heads-up display.
The main application of standards for games is to the “container” of the game—
the navigational and operational methods included in the heads-up display and
in the control mappings of actions to input devices such as keyboard or control-
ler. This part of the game is successful if it disappears and lets the user feel she is
directly interacting in the virtual environment. This container will not disappear if it
is inconsistent. So the most critical standards will refer to things like the shell menu,
the HUD, and interactive controls.
Can you describe in more detail what a usability standard is, for those new to the term?
First, when people talk about standards, they can mean three different things.
There are design principles. This is fatherly advice about interface design. Badly
done principles sound like “Write Clear Error Messages” (I’ve run across this bril-
liant “standard” a lot). Good ones read “Use short words, use short sentences, and
write in the active voice.” I was one of many that helped Sid Smith write the famous
MITRE guidelines. This was several hundred pages of this type of advice, and I’m
not sure if anyone ever read it all. It is just hard to integrate a large body of these
principles. This is why I would suggest using training instead. You can find such rules today at www.usability.gov, for example. But you are better off taking a course.
There are design standards. These are very specific. If they are successful, the
user sees a screen and thinks “I’ve seen one like this before” and then later finds
“YES! It works just the way I expected.” This might SEEM easy, but it takes some
very hard work to make sure that that experience happens.
Finally, there are methodological standards. These are systematic processes of
user-centered design. These are all very valuable guides to help the usability profes-
sional work efficiently and avoid forgetting key steps. But they do not inform the
design at all.
So for game designers, which is the most important of these types of standards?
You need all of them! You need a methodology for systematically and reliably cre-
ating effective game user interfaces. You should also have access to the research-
based principles and models for designing human-computer interactions. This
means knowing the rules and methods for navigation design, information architec-
ture, and detailed design. But the game designer needs more. On top of the princi-
ples that will make a design that the user can use, we must also apply the principles
that ensure that the user will want to use the design. This is the hottest area of research in the user-experience field.
But the main focus of this discussion is the design standard. I’m a strong advo-
cate of the benefits of including standards in the design process, and that includes
game interface design.
Developing a standard is an investment; expect up to twelve weeks of effort to create standards for productivity software and websites. But it pays off fast. The Royal Bank of Canada reported that their standard cut
their overall system development cost by over 10 percent. So it’s a good business
decision.
If you have a very small operation, you can sacrifice the detailed documentation
and just make a set of examples that everyone can follow. This means a lot more
effort to communicate what the rules really are, but it is certainly better than nothing. A small operation can put together good examples in a couple of weeks.
Tell me more about the process you use for making standards.
As with all usability work, management buy-in is critical. You’re about to pull some
key people away from other work they could be doing, and whoever controls the
money needs to be fully supporting you. But the next level of getting buy-in is also
critical. You don’t just sit in a room and draw up the standards yourself, because no
one will use them.
Instead, the creation of the standards is collaborative. You make the standards
by committee. And the people on that committee have to be carefully chosen. The
committee will be about eight opinion leaders. So these are the people on the design
team who everyone else in that team comes to when they’re uncertain or have a
question. This is absolutely critical, because these are the people who will go out to the teams and tell them to follow the standards. And they'll push for it because they personally had a hand in the creation, so they own it. Maybe by using a committee
the standards have more problems and mistakes. But even if the standards end up
being only 70 percent correct, that’s still far better than a perfect set of standards
that no one uses.
So once you have your committee, you’re going to sit and design the heck out
of each type of screen. On one particular interface, maybe it makes sense for one
person on the design team to work on a screen for an hour or so. But because these
designs will be used again and again, we have a bigger group spend more time.
Maybe the whole committee spends a full day on that same screen. So after you’ve
spent some weeks making the standards, you test the standards out by using them
to make a game. Once you have that game, you do usability evaluation methods
and find mistakes. And, of course, you generalize those mistakes to the standards as
much as possible so that you can edit them. This leads to an improved, edited set of standards, which gets turned into a document that is distributed throughout the company. Generally we follow this with one to three showcase interfaces made using
the standards. But that’s not the end.
At this point, everyone has the standards and you have opinion leaders pushing
to use them. But you really need some education so that people really understand
the standards. You’ll need to make a revolving class, maybe a day or so, designed to
teach the standards.
Then, even after all those other steps to be sure the standards get used, sometimes some people won't use them. Then comes . . . the big E . . .
Euthanasia?
Close. Enforcement. We hope that no heads actually have to roll for violating stand-
ards, but sometimes it comes to that. Standards are not gentle suggestions. They’re
law. And when laws are broken, there must be law enforcement.
Are there exceptions? Such as a design firm that focuses on just one narrow category
of interface? Some game studios are like this, developing in one genre exclusively.
As I said before, in a small operation, it may make sense to have a few basic screen
examples for people to follow, if only to maintain some consistency within the few
titles that company is producing. But this approach is wildly inefficient and imprac-
tical with a larger game design operation with many titles. I hope I’ve made the
case that it’s worthwhile to take the time to develop standards.
Thanks very much for your time and attention today. I think many people in the
games industry will appreciate what you’ve had to say.
CHAPTER NINE
Master Metrics: The Science Behind the Art of Game Design
Chris Swain is an assistant professor at the University of
Southern California School of Cinematic Arts, Interactive
Media Division. He is a game designer, and co-author
of the textbook Game Design Workshop. He co-directs
the EA Game Innovation Lab at USC. His game design
research specializes in matters related to original system
design and new kinds of play. The lab seeks to change
conventional wisdom about what games are and can be.
9.1 Overview
This chapter describes eight metrics-based game design techniques that can be used
to help make better play experiences. The techniques have been culled from some
of the leading game designers in the world via a series of personal interviews. All
techniques are presented as “theory for practitioners”—meaning they are intended
to be practical and hands-on for working game designers.
9.2 Background
Game design is the art of crafting player experiences. Creating good player experiences
involves an iterative process wherein developers: (a) make game prototypes, (b) watch
people play them, and then (c) revise the work. Developers repeat this process mak-
ing tweaks and additions to the prototypes in each successive iteration. Historically,
designers have relied almost entirely on creative judgment to decide how to tweak
their games to make them play better for gamers. Creative tinkering and trial and
error has been (and still is) the norm in game development for making tweaks
between tests. Recently, however, more sophisticated techniques have bubbled up from
the design community. These techniques have come about largely out of necessity to
deal with the dramatic increases in complexity in our medium. This complexity has
emerged because of rapid increases in processor power, media storage capacity, broad-
band connectivity, game complexity, production costs, team sizes, and other factors.
To create the list of techniques, many developers agreed to be interviewed and
provided hard data from their work. That access is significant because, in today's
developer culture, design techniques are generally not codified in writing but rather
passed along verbally and refined through constant collaboration among individuals
in a company. In cases where techniques are codified, the information is generally
kept internal to a company for competitive reasons.
Interestingly, the techniques detailed below all share the fact that they involve a
tangible metric, that is, something that can be measured. Sometimes the metric
measures playtester behavior in some way. Sometimes the metric measures an
element internal to the game itself. Sometimes the metric measures neither of these
things. The fact that they all involve measurement, and that they come from a cross
section of our community, is important because it indicates that our art form is
figuring out how to embrace the scientific method.
It is an exciting time to be designing games because there are so many rich
uncharted areas to explore. Games can and will have increasingly meaningful
impact on entertainment, education, journalism, the arts, and many other fields in
the coming years. A creative embrace of the scientific method and metrics-based
techniques will help take us to these new places.
An Important Note
Metrics-based design techniques are tools to assist the creative process. Period.
I don’t believe you can or should put a formula on game design.
(Editors’ note: Several of the techniques introduced in this chapter are described
in greater detail in other chapters of this book. Where a method is mentioned that
is included later in the book, we’ve indicated which chapter to turn to for more
information.)
9.4 Feature Design: Listen to Metacritic but Don't Be a Slave to Metacritic
● Can you analyze games by Metacritic score and derive guidelines about what
feature sets or other qualities tend to work and which tend to fail?
● As a creative person would you want to do that?
6. Engaging story/characters
7. Quality interactive world/artificial intelligence
8. Responsive camera
[Figure: Rich Gold's four-quadrant chart (Source: Rich Gold, 2001). Rows: inward focus (Artist, Scientist) and outward focus (Designer, Engineer); columns: work that moves minds (Artist, Designer) and work that moves molecules (Scientist, Engineer).]
The chart is relevant to this chapter because we as game developers walk in all
four quadrants. Notice that the Artist-Scientist row describes inward focused ven-
tures—meaning these are people who explore their passion in a very personal way.
Think about Pablo Picasso. It would be inconceivable for him to take his art to a
focus group for feedback.
Conversely, notice that the Designer-Engineer row describes outward focused
endeavors—meaning these people explore their passion with users and customers
in mind. Although a game designer can technically be in the Artist-Scientist row,
the fact that we are so reliant on feedback from users to build our games argues
that game design is an outward-focused venture. Thus in nearly all cases, even for
very artistic games, game designers operate in the Designer-Engineer row.
This outward focus on players helps validate the idea of using Metacritic informa-
tion (as well as the rest of the techniques below) when creating games.
9.5 Take-Aways
1. There are things to learn from the lists above; however, it's important not to be a
slave to Metacritic data. In fact, following it too literally would violate the first point
on the list: "undifferentiated from similar titles."
2. There’s a “know the rules so you can break them” effect. Hopefully, the analysis
in this technique will provide some insights into what has proven to be func-
tional so you can springboard off it.
9.6 Feature Design: Morphological Analysis Is Analytical Creativity in Action
Here are two examples of how this process has been adapted by game
developers.
Example 1
Character Design for Jak and Daxter
When designing the hero for a new title, Dan Arey and the team at Naughty
Dog, Inc. broke down potential character parameters into six categories and
then brainstormed creative ideas under each. Here is what they came up with:
Once the team had developed the lists, they highlighted ideas they found
interesting for their hero. The team’s highlights can be seen above in italics.
This process was the beginning of the character design for Naughty Dog’s
original character Jak. When the brainstorm began, they did not have a concept
for a hero with an integral sidekick. However, through the Morphological
Analysis process, the idea for a sidekick emerged. It is worth noting here
because the process generated ideas that had impact far beyond the original
intent of a character design brainstorm, for example, gameplay in which a
sidekick can serve as a weapon, comic relief, and more. This was the genesis of
the Naughty Dog character Daxter.
Variations on this process have been used by designers across time including
Leonardo da Vinci. Da Vinci created notebooks of hundreds of facial features
ranging from beautiful to grotesque—for example, mouths, chins, noses,
brows, eyes, etc.—which he would reference and use in different combina-
tions to create interesting characters in his paintings. (Michalko, 1991)
Example 2
Reverse Deconstruction Brainstorming
David Perry (founder of Shiny Entertainment and lead designer on dozens of
games, including the Earthworm Jim and The Matrix titles) uses a variation
on the above, which he calls "reverse deconstruction brainstorming".
(Perry, 2006)
His process works like this:
● Step 1—Choose the area you want to innovate in.
● Step 2—Deconstruct the area from macro to micro by generating ever more
granular lists of elements. (Note: keep and update these lists for career-long
use.)
● Step 3—Scan the lists and combine elements for inspiration whenever you want
to innovate in the selected area.
In David’s experience this process consistently enables teams to generate
dozens of fresh ideas. A key to making it work is developing the granular lists.
David shares his on his website at www.dpfiles.com.
Let’s try it. For Step 1, the area we want to innovate in is: “a weapon never
before seen in a video game”. For Step 2, let’s reference one of David’s granular
lists. Here is his list entitled "Ways to Die" (Perry, 2007):

Direct Causes of Death (in alphabetical order): Capture and Slow Death, Chemical Weapons, Conversion/Transformation, Critical Hit, Crushing Death, Dehydration, Deletion, Disappearance, Disease (bacterial, viral, plague), Disembowelment, Drowning (in any liquid), Elemental Causes/Natural Disasters, Execution/Assassination, Exhaustion, Explosion, Freezing to Death, Friendly Fire, G-Forces, Gravity, Grinding Death, Impact, Impalement, Imploding, Internal Invader, Laughing to Death, Life Force Removed, Liquefaction, Magic & Supernatural Causes, Mechanical Failure or Malfunction, Medical Failure, Melting or Vaporization, Metaphysical Revelation, Fooled You!

Indirect Causes (in alphabetical order): Boredom, Broken Heart, Deafness, Hallucination, Insanity, Loyalty Death, Neglect, Sleep, Smell, Stupidity (Darwin Awards), Taste (indirect death), Touch.
On his website, hyperlinked under each of the categories above, David has written
more granular descriptions. For instance, here are the granular descriptions under
“Animal Attack”.
Animal Attack
Defined as: a deadly attack by any non-sentient creature.
● Attacked by killer bees (or other stinging insects)
● Eaten alive by army ants or by piranhas or sharks
● Bitten by a deadly spider
9.7 Take-Aways
1. Morphological Analysis is a tool which can generate a ton of ideas very quickly.
2. It provides an analytical approach to creativity.
3. Most importantly, it is a repeatable tool that does not depend on muse or inspiration.
9.8 Mechanics Design: Quantify Types of Emotions Evoked (Offer Three or More)
Developers can craft their games to evoke three or more different types of emotions
by including a targeted mix of game mechanics and types of choices. This is
because game mechanics tend to evoke specific emotions. Nicole Lazzaro maps
different game elements to the emotions as follows:
1. Fiero
Fiero is an Italian word that means roughly “personal triumph”. Mechanics that
involve mastery tend to evoke Fiero. Other game elements that evoke this emo-
tion include: goals, challenge, obstacles, strategy, power-ups, puzzles, score, lev-
els, and monsters.
2. Curiosity
Curiosity implies imagination, surprise, wonder, and awe. Game elements that
tend to evoke this category of emotion include: iconic situations, exploration,
experimentation, fooling around, role-playing, ambiguity, details, fantasy, and
uniqueness.
3. Relaxation/Excitement
Game elements that tend to evoke this category of emotion include: repetition,
rhythm, completion, collection, meditation, working out, simulation, and study.
4. Amusement
Nicole says that choices involving other people intensify emotions and social bonds.
Game elements that tend to evoke amusement include: cooperation, person-to-
person competition, communication, performance, spectacle, characters, and
personalization.
Given the framework above let’s look at some case studies of commercially suc-
cessful games and show the emotions they evoke. The ratings in each category are
judgments by the author.
9.9 Take-Aways
1. To reiterate: the take-away here is that games that evoke three or more of the
emotions: Fiero, Curiosity, Relaxation/Excitement, and/or Amusement tend to
do better in the marketplace.
2. Nicole Lazzaro believes this is because those titles offer more options for the
player to feel.
For more information about Nicole Lazzaro's research on player experience see
her website at http://www.xeodesign.com. (Editors' note: see also Chapter 20.)
9.11 Take-Aways
1. Custom instrumentation software can be written to measure how play testers are
playing a game.
2. Keeping user testers independent from developers helps ensure unbiased analy-
sis of play test data.
9.12 Level Design: Craft a Balanced Mix of Activities Using "Time Spent" Reports
Mass Effect—Noveria Time Spent Report (average minutes)

                            Version 1    Version 2
Viewing Cinematics          4            5
Engaged in Combat           22           32
Engaged in Conversation     28           20
Viewing Maps/Journals       19           8
Walking                     124          57
Driving Vehicles            11           10
Total                       207          209
To be clear: the minutes in the report refer to the average number of minutes
playtesters spent while playing the level. Notice in Version 1 that the play testers
were in Combat for 22 minutes and Conversation for 28 minutes. BioWare develop-
ers looked at this and decided that this time spent breakdown did not feel right for
Noveria because it was supposed to create an exciting and action-filled experience
in the Mass Effect story. They decided they needed to increase the number of minutes
spent in Combat (i.e., more action) and decrease the number of minutes spent
in Conversation (i.e., less talking). To achieve this, they tweaked the level by adding
more combat obstacles and editing out some conversation nodes. See the results
in Version 2: Combat increased to 32 minutes and Conversation decreased to 20
minutes. The developers made a creative judgment that Version 2 felt right and the
game shipped with this version of the level.
An important point here: the time spent metric provided analytical data that the
developers used to make creative judgments. Some people wonder why, if game
tweaks always come down to a creative judgment, developers need analytics at all.
In other words, why don’t they scrap the analytics and just observe testers playing?
Why not just make creative judgments based on how playtesters react? BioWare’s
Iain Stevens-Guille provides two reasons why analytic data is valuable:
whole level) or a bug in the game (which would skew them to think negative
things about the whole level).
2. There’s a difference between designer perception of what happened (to a play-
tester) and the actual numbers. Designer perceptions are frequently skewed
because designers become infatuated with their designs and lose objectivity. For
example, a designer might be in love with a particular scene and dialog exchange
that he has crafted. He may not be able to accurately perceive that it is the dia-
log he has written that is making the experience drag.
Iain says that numbers from a time spent report provide objective information to
help a team assess what is happening in a level. Numbers are particularly useful in
a collaborative environment when different team members have different percep-
tions of the problem.
9.13 Take-Away
“One measurement is worth fifty expert opinions.”
Howard Sutherland
9.14 Level Design: Track Engagement with Bio-Sensors to Quantify Player Experience

This technique comes from the San Francisco-based technology firm EmSense, Inc.
EmSense has created cutting-edge biosensor technology that measures a player's
physical response to game experiences. They have a headset that measures brain-
waves and other sensors that measure temperature, breath, heart rate, physical
motion, and eye movement.
They take the data that come in from these sensors and derive biometric
responses. Here are three of the many things EmSense tracks:
The rationale for using this technology versus just asking players what they think
is that this bio-sensor data is precise (and can map to specific places in a game
level) whereas humans often can’t explain their thinking and behavior in words.
Humans tend to focus on a bad or good event and that event skews their percep-
tion of the whole play experience. This is similar to what we learned in the BioWare
example above.
EmSense works with many different game companies, including large publish-
ers like THQ, testing game levels. In addition, EmSense has tested the top fifty
console games in the market on its own in order to build a database of comparative
media. And it is continuously adding data from more games to the database.
Here is a description of the physiological data generated by a boss battle in
Zelda: Twilight Princess. This pattern is typical for battle sequences.
9.15 Take-Aways
1. Designers can use biosensor data from their games to track player engagement.
2. Designers can tune the length of time between high and low engagement events
to create a satisfying flow experience for players.
9.16 Control Design: Simplify Controls Through Measured Complexity Models
Tetris: CD = 1.5. Calculated as: 1 for movement on one dimension, plus .5 for the embedded action rotate.

Half-Life: CD = 7. Calculated as: 2 for movement in 2 dimensions (left-right, in-out), 2 for view in 2 dimensions (left-right, up-down), .5 for shooting, .5 for jumping, .5 for ducking, .5 for weapon change, and 1 for swimming. Note that there are minor additional control options that have been subjectively ignored.
Notice from the examples that calculating CD involves some degree of subjectivity.
This should not skew the metric as long as the subjective rules are applied
consistently when a developer calculates CD for different games in a genre.
Here is a graph that shows the maximum Control Dimensionality for controllers
on consoles going back to the Atari 2600. The Atari 2600 has a joystick (2) and one
button (.25). A modern controller like the Xbox 360 has two joysticks (4) plus
ten buttons (not counting start, back). The only thing the graph is really showing is
how controller complexity has steadily increased over time. Designers can choose to
layer on more complexity by doubling or tripling up on buttons, or they can use less.
[Figure: Max Controller Dimensionality by console, plotted from 0.00 to 10.00 and rising steadily from the Atari 2600 (joystick) through the NES, SNES, Genesis, N64, PS1 (original and DualShock), Dreamcast, PS2, GameCube, Xbox, PS3, and Xbox 360.]
Finally, here is a table that lists Activision game titles with their Control
Dimensionality ratings. The important thing to examine is the far right column. It
shows the CD differential for each title from the competing titles in the same genre.
The number is expressed in standard deviations. Notice that the first ten titles on
the list are labeled "high-risk" because their CD is higher in relation to competing
titles in the same genre. The bottom two titles are labeled "low-risk" because their
CD is lower in relation to competing titles in the same genre. Activision seeks to
keep the CD for their titles low in relation to the market because buyers prefer
simpler controls.
[Table: Activision titles sorted by differential from competing products (in Control Dimensionality). Columns: Risk Factor with Control Scheme; Title; Control Dimensionality; Genre; Differential from Competing Products in Genre (in Standard Deviations). The table body is not reproduced here.]
9.17 Take-Aways
1. Control Dimensionality is a metric that helps developers understand the com-
plexity of their control schemes in relation to competing titles.
2. In general, developers should strive for the simplest control scheme possible.
9.19 Take-Aways
1. Embrace rapid early prototyping, playtesting, and revision throughout
development.
2. Apply metrics to player experience to help make informed creative judgments.
Special Thanks
To my USC colleague Dan Arey, who conceived the concept of studying
metrics-based design and conducted multiple developer interviews for our joint
talk on the topic at the 2008 Game Developers Conference.
9.20 References
Bateman, C., & Boon, R. (2006). 21st Century Game Design. Hingham, MA: Charles River Media, Inc.
EmSense, Inc. (2008). Corporate presentation.
Fullerton, T., Swain, C., & Hoffman, S. (2004). Game Design Workshop: Designing, Prototyping, and Playtesting Games. San Francisco, CA: CMP Books.
Lazzaro, N. (2007). The 4 Most Important Emotions of Game Design. Presentation at the 2007 Game Developers Conference.
Metacritic.com (2008). "About Metascores," http://www.metacritic.com/about/scoring.shtml
Michalko, M. (1991). Thinkertoys. Berkeley, CA: Ten Speed Press.
Perry, D. (2006). David Perry's Game Designer's Reference Guide, http://www.dpfiles.com/dpfileswiki/index.php?title=DAVID_PERRY%27S_GAME_DESIGN_REFERENCE_GUIDE
CHAPTER TEN

The Strange Case of the Casual Gamer
Nick Fortugno is a co-founder and President of Rebel
Monkey, a NYC-based casual game studio. Before Rebel
Monkey, Fortugno was the director of game design at
gameLab, where he was a designer, writer, and project
manager on dozens of commercial and serious games,
and served as lead designer on the downloadable block-
buster Diner Dash and the award-winning serious game
Ayiti: The Cost of Life. Nick teaches game design and
interactive narrative design at Parsons The New School
of Design, and has participated in the construction of the school's game design
curriculum. Nick is also a co-founder of the Come Out and Play street games festi-
val hosted in New York City and Amsterdam.
Casual games present a new frontier for the game designer. The success of casual
games across a variety of platforms—from the standard PC-downloadable format
to the infiltration of casual games into hardcore consoles such as XBox 360 to the
incorporation of casual game thinking into new console design (in the case of the
Wii)—made it clear that a well-made casual game could likely be played by peo-
ple of all ages, all genders, and all levels of game play experience. The opportunity
to design a game that could be played by anyone, another Tetris, is an incredibly
appealing one, but also one fraught with design challenges, even for designers flu-
ent in making games for more restricted audiences.
Of course, the term “casual games” is one used in different ways by a variety of
people from every side of the game industry, so clarification is in order to determine
exactly what we are discussing. There are parts of the game world that limit casual
games merely to games available online, while other commentators will point to
short RPGs and shooters. What makes something a casual game? Is it a question of
medium, of play length, of content, or market reality? All of these things are factors
in the design of a casual game, but when thinking about design, which is the most
critical?
In my own design practice, I always begin by thinking about the intended con-
sumer of the design. The intended (or emergent, as we will see shortly) audience
brings with it a host of expectations and experiences which inform what they
consider intuitive, challenging, and fun. So questions of physical interface, core
mechanics, and overall interactive design stem first from who the player is and what
they desire from a game.
But the primacy of audience in the design process makes casual games a unique
challenge in the game industry, because rather than designing a game for an audience
reared on a particular set of game experiences, you must design a game for every-
one. The important thing about designing for everyone is that you are not designing
a game for a particular class of dedicated gamer. I will argue that dedicated gamers
have a set of metaskills that transcend the particular games that they enjoy and in
which they excel. Gamers are something with which game designers are quite familiar.
The critical thing to consider in the casual game is that significant portions of
the audience, and the majority of that audience in the billion-dollar downloadable
game market, are non-gamers. The demographics of this non-gaming segment of
the downloadable market are women in their forties and fifties; that segment makes
up somewhere from 50 to 70 percent of the purchasing audience of downloadable
games. Given the typical consumer of games since the inception of digital games
(young and male), it is an accurate assessment that many casual game players have
little or no experience with digital games and that, games played with children
aside, they have had limited exposure to games at all as adults.
While PC downloadable games make up a large portion of what we think of as
casual games, we see similar audience segments (although not in as large numbers)
in other platforms where casual games appear. A significant number of mobile game
users are women focused exclusively on casual games. Microsoft's strategy for the
development of the Live Arcade system specifically anticipates that the primary gamer
of the house buys the Xbox 360 console, but that non-game-playing family members
will also use the console for casual game play through the Arcade downloadable
service. And perhaps most strikingly of all, the Nintendo Wii has become a gaming
device for retirement centers, and people in their 60s, 70s and 80s, perhaps the least
digitally savvy age demographic, are active participants in Wii Bowling tournaments
and Wii Golf games. Across multiple platforms, casual games have been a gateway
for non-gamers to engage in digital play.
The question then for usability in casual games is what design considerations and
constraints arise given the relative lack of game experience of a significant portion
of that audience. Considering the expectations and play history of that portion of the
audience, there are a number of important parameters to the design of casual games:
● Casual game players do not approach games with the same skill sets as hardcore
gamers, and thus have different levels of self-motivated exploration and patience
for failure.
● The interfaces of casual games are less informed by prior games than by other
digital and real-world experiences.
● The information display and feedback that casual games provide require an
extremely high level of clarity.
Each of these points is a key component of successful casual game design, and
each is explored in more detail below.
10.1 Hardcore Gamers Are from Mars…
More complex games offer more challenging and deeper learning environments, but
the basic learning principle exists regardless of complexity. Nonetheless, there are two
significant ways in which hardcore games differ from casual games in terms of learning,
and this set of differences informs the difference between these two player groups, and
thus the differences in usability that must be considered in their design processes.
First, the primary teaching technique of hardcore games is trial and error disci-
plined by failure. Taking the Mario example a second time, the way a player learns
how to defeat a Hammer Brother is by trying to do it. Most likely, the player fails the
first few times to figure out how to avoid the hammers, and Mario dies. The player
is expected to try the attack multiple times, and fail multiple times, before figuring
out the proper approach. This pattern is repeated for every enemy and every board,
which means in a typical Mario game, Mario can die many, many times. Mario is
not unique in this regard, and Mario is nowhere near the most complex or unfor-
giving game one could play. This means that a typical hardcore gamer has played
through a number of tough games, and has become accustomed to harsh penalty as
a component of game learning.
This conditioning to failure as a part of the learning process leads to the second
significant difference: the hardcore tolerance for frustration. Since failure is such an
integral part of so many digital games, these games both select for and train gamers
who accept and enjoy the challenge of the threat of loss as a result of experimentation.
This breeds a kind of player who is interested in exploration and experimentation,
who is patient through periods of confusion or difficulty, and who is often willing to
struggle with a clumsy control scheme or unclear interface in a play experience. In
fact, part of the fun is that very struggle.
We can sum this up with a series of observations about what a history of play
creates in a hardcore and core player base:
● Gamers are regularly tasked with learning complex, unforgiving systems that
require exploration and repeated experimentation for victory.
● Playing many of these games exposes players to a number of gaming conventions
in terms of physical interfaces and information displays.
● In addition to this specific game knowledge, players with this background have
been trained to experiment with control schemes and explore the geographies
and world laws of these spaces. This is understood to be a core part of the expe-
rience of playing a game.
● Alongside this expectation of experimentation is an expectation of failure. Players
with a good deal of game experience tend to be more patient with failure and
more tolerant of frustration and struggle.
● While this kind of player conditioning allows for an increase in the amount of
complexity and difficulty a game can have, it also can allow a game to get away
with significantly less intuitive and sometimes even broken interactive elements.
Players may tolerate a certain degree of faulty interactive design as part of the
challenge of the experience.
10.2 Casual Gamers Are from Venus
internet perspective, the idea that a game would present frustrating obstacles and
confusing interfaces would not be acceptable as a play experience. Instead, the very
struggle that a more hardcore gamer found to be enjoyable would instead be seen
as exasperating, off-putting, or simply too difficult. Frequently dying, particularly in
the early part of the game, would be a flaw in the game experience rather than an
incentive to try harder.
This different set of expectations would lead to a different set of design criteria.
The games would be significantly less complex and less punishing. But more
important for the question of usability, the control schemes, feedback structures,
and information architecture would all be designed to create a more intuitive sys-
tem. The game would still be a learning environment, but the learning methodol-
ogy would be less open-ended and more guided. The combination of these factors
would lead to shorter, simpler games that were more easily accessible to people
who had not played digital games before—in other words, casual gamers.
As the audience for casual games continued to grow and more casual games
were developed, a set of parameters around casual game design and usability in
casual games began to emerge. These usability issues cluster around a few different
topics: game control schemes, user feedback, and game instruction.
10.3 Game Control Schemes: Familiar Interaction
they would gravitate to games that used the control scheme they used for that
web browsing. By doing so, casual games avoided the learning curve that controllers
in other games regularly impose: just figuring out what one of a multitude of
buttons does.
That said, the use of the mouse in casual games takes a variety of forms. In
games such as Bejeweled and Collapse, the mouse is used in a fashion typical of
general internet use; the left mouse button interacts with the object the mouse is
over, by grabbing or deleting it. In other games such as Ricochet: Lost Worlds or Feeding
Frenzy, an in-game object is mapped to the mouse’s movement, so that the paddle
and the fish (respectively) move in a similar way to the cursor. Click management
style games including Diner Dash and Cake Mania use the player’s left mouse but-
ton click to determine the position an avatar will move to. Still other games, such as
Lemonade Tycoon or Fairy Godmother Tycoon, are about menu navigation and the
mouse is used in an entirely typical way, selecting an object from a list. These are a
handful of the kinds of interactivity seen in PC downloadable games, but almost all of
these games rely exclusively on left-mouse-button play.
It is interesting to note the dominance of the left mouse button as the sole con-
trol even when that style of interactivity is less efficient than other methods. For
example, Turtle Odyssey is a game of the platformer genre—the player
controls an avatar that needs to navigate a two-dimensional space by jumping over
holes and defeating enemies (often by jumping on them). Turtle Odyssey is fun-
damentally identical to Mario games and the literally hundreds of platformers that
have appeared since, almost all of which relied on a directional pad control for
movement. The directional pad is a good control for these kinds of games, because
precise position of the avatar is often important, and a directional pad can, in skilled
hands, provide a player with the ability to make small tapping movements for fine
control. Nonetheless, despite the obvious utility of arrow keys to this kind of play,
Turtle Odyssey also includes a method of playing the game using only the mouse.
Even though the use of the mouse in this game is awkward and players would have
more fun with the game if they were forced to use the arrow keys, the developers
of the game are attempting to broaden the game’s appeal by sticking to the physical
interface that its player base understands.
Of course, there are a handful of successful games in the downloadable
market that deviate from the left mouse button formula. The Super Granny brand
is another set of downloadable platformers that force players to use the arrow keys
to play. There are also a few successful games that use the right mouse button. The
genre of inlay games, in which players solve tangram-like puzzles by using pieces
that appear on a conveyor belt, has a now-standardized approach to interaction:
left-mouse button to pick up and place a shape, right mouse button to rotate the
shape. However, even successful games that attempt to use the right mouse button
sometimes fall into the same interactive trap. A classic example is the puzzle game
Zuma. In Zuma, the player can use the right mouse button to switch the current
ball with the next ball in the queue. But despite the fact that the game tutorial, help
screens, and game tips explicitly detail this functionality, many Zuma players never
use, or in some cases discover, this feature. So even elegant controls that offer
players real advantages can be ignored by casual gamers if they deviate from the
control schemes that players understand.
to reach an entirely new market of non-digital game players that were versed in the
real-world versions of these sports.
We can even point to games on hardcore systems that have had non-gamer
crossover, and these show a similar move to a more intuitive physical interface. Take the
recent success of Guitar Hero and Rock Band. Both of these rhythm-action games
are in many ways extremely similar to earlier Harmonix games Frequency and
Amplitude. All four games involve songs that are mapped into a series of colored
spots that are then organized into tracks that scroll towards the player. The player
must hit the button corresponding to the colored spot when the spot reaches a line
on the screen. Despite this nearly identical play, Frequency and Amplitude were
only played by hardcore gamers, whereas Guitar Hero and Rock Band enjoyed wide-
spread success with both core and casual gamers. The primary difference between
these games is that Frequency and Amplitude used the PS2 controller, while the
more successful games mapped the buttons on to a fake guitar and drum kit. The
move to a more intuitive physical interface made identical gameplay accessible
to a whole new audience.
The point here is that beyond the simple lack of experience with the traditions
of game interactivity, non-gamers lack familiarity with even the most basic control
schemes that have become standard in game play. In the absence of this experi-
ence, non-gamers rely instead on the interactivities which they find intuitive: con-
trol methodologies from internet use and analogs to real-world actions. One key to
successful casual game design is remembering the experiences of the casual game
audience, and creating interactive systems that will be intuitive to that audience.
10.4 User Feedback: A Different Sense of Failure
are the use of micro-rewards as encouragement and the emphasis on clear in-game
feedback.
saved only for longer-term victories such as level completion or the defeat of a par-
ticularly powerful enemy, and in some more immersive games (such as Bioshock or
Half-Life) even those moments are denied a powerful moment of reward feedback
in order to preserve narrative consistency. Failure, on the other hand, has dramatic
consequences. The game abruptly stops with an ominous or mocking tone while
a new screen boldly declares "Game Over!", with the occasional accompaniment of
dripping blood or a camera fade to black.
This is clearly a generalization about hardcore games; there are games that
are hardcore that offer more frequent rewards or less punishing failure. But many
hardcore games do follow this model, and in doing so, they teach players a lesson
about achievement. Short-term achievement is something in which one should not
put much stock; reward is earned only for major and sustained victories. Failure is
noted and punished. This structure further conditions players in a hardcore mind-
set, where the game is perceived as a struggle to be completed in which rewards
must be earned.
Casual games, as exemplified by Peggle, take the opposite approach. Every minor
success is rewarded. Each time a brick is destroyed, a gem is eliminated, a customer
is served, or an object is found, the player earns points. The points are always dis-
played on screen along with reward sounds and graphics of the points being added
to the total or of the game object exploding from the screen. In a game such as
Diner Dash, points are given for every single action the player does, from seating a
customer to collecting their dirty plates, and in that sense every action is a positive
one. Casual games still have level-achievement awards with their own correspond-
ing effects, but the moment-to-moment play is filled with micro-rewards that regu-
larly congratulate the player on good (or, in some cases, lucky) play. On the other
hand, failure at the level of an individual action is often de-emphasized. Failure
sounds and effects are more minimal, and often losing a game is nothing more than
rapidly restarting the level.
A similar philosophy is at work in the greater reward structures of the game.
For example, the vast majority of casual games ask the player to move through a
series of separate levels to win. In games of this sort, the levels are almost always
short (approximately 3 to 6 minutes long) and displayed on a central map screen
as a kind of checklist. As the players complete levels, they are regularly brought
back to the level screen to see the list of accumulated stars. In many games, nota-
bly the click-management genre of games such as Diner Dash, each level also gives
the player an additional trophy for level completion: a new powerup, an additional
character, or even a purely visual improvement such as better wallpaper for the res-
taurant or a new outfit for the character. So just as there are an increasing number
of rewarded moments in an individual level, the short duration of levels, the use of
the level map as a literal progress checklist, and the trophies unlocked at every
level extend the focus on achievement to greater chunks of play.
By using these methods, casual games are attempting to diminish the sense of the
game as a frustrating struggle toward eventual victory. Instead, by spreading rewards
throughout the game experience, the game’s design changes the moment-to-moment
feel from one of struggle to one of regular achievement. The game is not a long
slog through obstacles; it is a series of short-term moves that the player regularly
achieves. And since the player is regularly achieving, the feeling is less one of irrita-
tion at not achieving goals than one of confidence at constant achievement.
Nonetheless, there is a final caveat. For all of the focus on achievement, the fun
of games still arises from challenge. If the game becomes too easy or the achieve-
ment too unrelated to what players actually did, players will become bored and dis-
enchanted. A good example of this problem can be seen in the original Arcadia
downloadable game. The game has an innovative mechanic in which the player
plays four simple mini-games at the same time. The player’s score is the product of
the scores of the four games. Of course, playing four games at the same time has a
steep learning curve, so the easy mode of the game was designed to be extremely
simple. Given that the four scores were multiplied, this meant that the player’s score
in the easy mode could be astronomical, reaching the quadrillions, on even a decent
game. While achieving scores this high was a thrill for the first group of players,
eventually everyone realized how easy it was to achieve to get a ridiculous score
and became embittered, claiming that the game was no fun because it was too easy.
So as much as achievement is important to casual gamers, it is equally important to
retain some degree of challenge and struggle to keep the game interesting.
in the level, we re-wrote the copy to emphasize the time limit and had a tutorial
bubble point out the timer. There was no change in player feedback. We then went
through several different designs of the timer. Still, players did not notice the timer.
We finally changed the timer to a clock and introduced text that popped up over
the screen to warn the player at two different points that time was running down
and that the level was almost over. It wasn’t until we took the drastic measure of
making the timer literal and effectively shouting warnings at the player that players
began to consistently recognize a core game element.
This experience taught me a number of lessons about casual game design. The
most important of these lessons was that hardcore gamers have training in handling
rapidly changing complex information that casual gamers do not. This means that in
the chaos of a reasonably complex game, a casual gamer will not necessarily remem-
ber information that is immediately evident, even if that information is essential to
successful play. As a result, casual games tend to have interfaces that dramatically
display the critical game elements. Huge time meters and giant flashing point dis-
plays are often the order of the day. The flipside of this strategy is to move as much
of the game information to the main play area as possible. Thought balloons of char-
acter desires, glowing squares to mark next moves, and (as in the case of Plantasia)
warnings and announcements flashed over the main game stage make it impossible
for the player to ignore what is going on. Casual games use these dramatic tech-
niques to ensure that the player does not miss a vital piece of information.
A similar design philosophy applies to the general user experience of a casual
game. For example, very few successful casual games allow players to navigate their
way through different game screens at their own whim. In games in which players
do need to make decisions on different screens, the game usually controls the
movement between screens, determining which part of the game the player will
experience at a given time. A good example of this design can be seen in sim games
such as Insaniqarium or Fairy Godmother Tycoon. In both of these games, the player
uses a set of resources (pets and recipe ingredients) to navigate a level of play. The
screens in which the players choose their inventory are complex, containing many
different possible combinations. The games control this complexity by forcing the
player to make the decisions about the resources and then the use of the resources in
two distinct steps. Every turn, players first choose the resources they want, and then
they deploy the resources. Because players cannot go back and forth between these
screens at will during play (as they could in a more core sim such as SimCity),
the confusion of too much information is drastically reduced.
It is also important to point out that detailed instructions do not necessarily
make complexity more manageable. One way in which casual and hardcore gamers
are similar is that neither reads help text with any regularity. Assume immediately
that the help screen in a casual game is for a very small minority of the player base,
and that even in that small group, most are only looking at the help page for the
art. Tutorials are a much better solution, as they force the information in front of
the player, but if the tutorial is something that the player reads and clicks through,
there’s a very good chance that the player will click through it with only a skim, if
they bother to read it at all. The best tutorials for casual gamers are ones that lock
the game screen until the player performs the necessary action, but players will still
get frustrated if they are stuck in a limited tutorial for too long. It is always prefer-
able in a casual game to have something simple that requires less explanation than
to try for something complex and hope that text will make it clear.
The basic point is to remember that these are casual games. Casual game players
do not expect to keep track of hundreds of minute variables in their minds at once,
and they focus most of their play attention on the most prominent game elements. This
means that casual games need to bring whatever information they need to convey
to that main stage and either de-emphasize, compartmentalize, or eliminate complexity
that cannot be displayed there.
10.5 The Rise of Hardcore Casual Games
game. Hidden object games ask the player to look at a complicated, cluttered pic-
ture and find a series of particular objects. In the original downloadable version of
these games, I Spy, there was nothing more to the game than the above mechanic.
However, as more hidden object games appeared, additional elements began to be
added to the mix. Games such as Mystery Case Files and Mysteryville introduced
narratives to the mechanic, where the object finding became a vehicle for a mystery
story. These games also introduced other kinds of visual logic puzzle mechanics as
interstitial games, such as tile sliding puzzles or picture matching games. But the
most interesting evolution can be seen in the later games such as Azada, Dream
Chronicles, or Hidden Secrets: The Nightmare. In these games, a level will start with
a hidden object mechanic component where the player needs to find a handful of
objects that either change on mouse-over or that appear on a list. However, once
players find the objects, they can combine them to make new objects, or use them
on other objects in the environment to access new spaces. This play mechanic is
then combined with a series of simple logic puzzles, and success in these game
activities slowly unlocks a narrative. These games are more like Myst than I Spy;
amazingly, the hidden object genre is evolving into the old adventure game from
years ago! The fact that these games are among the highest-selling games in the
downloadable market means the audience has grown to the point where a fairly
complex (and perhaps even slightly hardcore) game genre is an acceptable choice.
At the time of this writing, these are all recent changes; only time will tell how far
down the adventure game road casual audiences are willing to go.
In addition to this evolution, the casual game space is also bearing more complex
game styles as successes. The game Build-a-lot is a more sophisticated sim than the
downloadable market has seen. In Build-a-lot, the player must build a set of houses
using seed money, and then use the rent from those properties to enhance and
expand their real-estate empire. The interface of the game resembles a more core
PC sim, with several tabs for different kinds of resources, and unlike the sims men-
tioned previously, Build-a-lot forces players to make purchases during the course of
their building play, so the player must weigh both the acquisition and deployment
of resources at the same time. This marks a significant increase in sophistication
from previous casual sims. Other games such as Tradewinds and Chocolatier have
brought a trading game mechanic to casual players. In both of these games, players
must pick up and trade different resources across a world map. Players must keep
track of several simultaneous quests, the ports where different commodities can be
found, and the markets where those goods can be sold for profit. The games require
players to retain information across several screens, each of which can have mul-
tiple displays. All three of these games require a greater tolerance for complexity
and information management than previous casual games, and thus demonstrate a
changing face of the casual game audience.
As the market is only seven years old, it’s hard to predict exactly how far cas-
ual games will evolve in complexity towards the levels of core and even hardcore
games. The arrival of casual games on platforms such as the Wii, Nintendo DS, and
XBox 360 certainly points to a bridge between these categories. And the success of
more sophisticated games, both as the evolution of early games and the arrival of
more core genres to the casual space, indicates that some of the "non-gamer" audience
that makes up the casual game market is becoming a type of gamer after all.
10.6 Conclusions
Casual games are at an exciting moment in their history. Casual games are present on
every game platform, and even in the hardcore console space, casual games rank
among the most successful products on sale. The audience for casual games is
growing each year, and the fact that these games are played by all kinds of gam-
ers, from the hardest core to the least experienced, means that a good casual game
could actually be played by people of all ages. The reach of casual games is by far
the widest of any part of the game market.
But as we have seen, the fact that casual games appeal to all kinds of players,
particularly non-gamers, creates a set of particular concerns and constraints that
influence the design of these games. Play cannot rely on the time-honored conventions
established in the rest of the game canon. This doesn't simply mean that the
interactive conventions of hardcore games cannot be taken for granted; there are
also a series of tolerances and expectations that hardcore gamers share, but casual
gamers lack. This means that casual game design must first consider the least expe-
rienced players, and create systems and mechanics that are universally accepted.
Mechanics must be intuitive. Interfaces must be clear. Achievement is prized over
struggle. Above all, designers must create games that challenge players without con-
fusing or frustrating them, by using the devices, styles, and interactive languages
in which the players are fluent to come up with new manifestations of play and fun.
Of course, casual games will continue to evolve, and the more games these players
try, the more refined their gaming skills will become and the more complex and
challenging systems they will demand. It's impossible to predict exactly what this
new breed of gamer will desire. Will today’s casual gamers gravitate towards the
core and hardcore genres that have already been established, or will they create a
whole new kind of “hardcore casual” experience? The evolution is inevitable, but
it’s up to today’s designers and players to determine what games we’ll be playing in
the next few years.
(Editors' note: See the interview in Chapter 11 for some thoughts from a casual
games company about the merits of user testing and their particular strategies for
when and how to do tests.)
CHAPTER ELEVEN

Interviews about User Testing Practices at PlayFirst®
PlayFirst
PlayFirst is the leading publisher focused exclusively on casual games. The compa-
ny’s game portfolio includes the Diner Dash® series, Wedding Dash®, Chocolatier®,
and Dream Chronicles®.
We asked PlayFirst to give us perspectives from multiple team members about user
testing and how it works at their company, to provide readers with a multi-faceted
understanding of how one casual games company uses some of the methods
described in this book.
PlayFirst, Chocolatier, Diner Dash, Dream Chronicles, The Nightshift Code, Wedding Dash, and all
related titles, logos, and characters are trademarks of PlayFirst, Inc. PlayFirst, Diner Dash, Dream
Chronicles, and Wedding Dash are registered in the U.S.
Aaron Norstad, Senior Producer
Do you conduct usability studies of your games during production generally? What
kinds of studies? What about post-production? What kinds of studies do you use there?
Each project uses the following means to handle customer feedback about our
games.
● Game Panel Testing—local users register to participate in short game play
sessions at our offices, in order to give input, evaluation, and comments on
in-progress games. These are one-on-one tests carried out between the game leads
and the end user.
● Third-Party Usability Testing—focus group participants attend a 30- to 60-minute
usability session moderated by an objective professional usability company. A
written analysis and discussion on player response and reaction is carried out.
● FirstPeek Testing Program—registered users of this program are sent a near-complete
version of each PlayFirst game; they evaluate it and provide feedback
through a survey response and measured game play metrics. Summary reports
of each set of feedback are compiled and used to make final decisions prior to a
game's release to market.
● Website Forum—as each game launches, discussion typically ensues among
the active members of the PlayFirst website community, where game play reac-
tions and feedback on specific game features are shared. Game teams moni-
tor these forums for trends in customer feedback that would require attention.
Other means to solicit post-launch feedback come through our customer sup-
port department (reactive—typically used to respond to a technical problem) and
our marketing department (proactive—typically used to run surveys and solicit
response to a game feature).
What sorts of issues and opportunities have these sorts of studies allowed you to
uncover?
Depending upon the stage in the product’s development cycle, we can uncover a
wide variety of issues and/or opportunities. For example, during a usability study in
the earlier stages of development we might determine that a feature we designed for
the game needs to be improved, revised, or even removed. At a later stage in devel-
opment, for example during our beta testing, the focus of the feedback is typically
centered on how the game increases in difficulty from level to level and we’ll make
final adjustments based upon the trends shown by a large sample size of users.
● An example of the first type of feedback comes from our hit product Diner Dash:
the noisy customers that were planned for the sequel, Diner Dash 2, as
another type of customer that Flo has to serve in her restaurant. During the study, play-
ers reacted negatively to cell-phone yakking customers and did not know what
to do with them; yet players had the opposite reaction to noisy families. Even
though the two customer types worked essentially the same way, players loved
the families and immediately knew how to respond to them in the game. As a
direct result of XEODesign’s* study, PlayFirst swapped the order of these two
customer types so that the families appear right from the start to get players
excited about this new challenge in the game. XEODesign's observations of these
subtle moment-to-moment player reactions to player actions are easily lost in a
post-play focus group or survey. Diner Dash’s one million paying customers is
further proof that emotion works well in casual games.
Would you recommend doing this to others in your genre space? Why so (or why
not)?
Usability is one of the most important parts of the game design and development
process for our company. PlayFirst has placed a great deal of emphasis on integrat-
ing best practices that focus on getting consumer feedback on our games. We would
fully endorse a well-rounded program that solicits input in a number of different
formats from users at various stages of a project’s life cycle. The time and money
spent to listen to the people that buy our products is always well spent.
*Chapter 20 in this book outlines a theory of fun types by the CEO of XEODesign, Nicole Lazzaro.
Would you recommend doing this to other developers? Why so (or why not)?
Usability is invaluable. It’s not a question of why or why not, but rather when. The
time, money, and resources spent on usability will easily pay off once your game
ships, provided you take the time to address design changes. PlayFirst has been able
to maximize the results and the effectiveness of usability by doing it early, in alpha.
In alpha, we have a fairly solid and somewhat polished first hour of game play and
thus are able to go into usability and allow users to sit down and play through the
first five to ten levels, and even jump forward to test advanced features.
The great thing, and perhaps the obvious advantage, of usability is dedicating
a full day to completely focus on usability issues. Informal play testing is great, and
the point isn’t that six or seven people performing a usability study will have all the
answers and uncover all of the problems, but the focused time of having the entire
design team together watching a professional third-party team test and evaluate a
game is incredibly eye-opening.
Anything else you’d like to add about game usability and player testing in general?
Continuing from the point above: as a publisher with a history of making great
games, we find it second nature to push a developer to make changes with the goal
of improving the quality and ultimately increasing the sales potential of a game.
Conversely, it can be really difficult for designers to respond to design changes,
especially when they are so passionate about a project. Usability then provides the
opportunity for everyone to hear unbiased feedback about what is working,
what needs to be tweaked, and what is just simply broken.
ANGEL INOKON, PRODUCER
mini-game, and simplifying the interface. The result: Wedding Dash is a mega-hit.
Wedding Dash’s success demonstrates the importance of listening to our players and
working until it is right.
Would you recommend doing this to other developers? Why so (or why not)?
Usability helped take the blinders off. It’s easy for developers to get so close to a
game that they lose the player’s perspective. Developers will also get an intimate peek
into the minds of their audience. They will meet the core casual gamer who plays
15 hours a week and the newbie who fails the tutorial. At the right point in devel-
opment, it can be just the shock of reality you need to make your game a market
success.
Anything else you’d like to add about game usability and player testing in general?
A word to the wise: usability can be lethal to a project if not used properly. Enjoy
responsibly by following a few of my usability do’s:
● Do take time to really watch players and understand the context of the
comments.
● Do build time in your schedule to address changes.
● Do pick the changes that make sense for the vision of your game. Not every
problem will be solved.
● Do understand a test is just a snapshot. Your game is constantly changing, so
don’t wait too long to get it evaluated. Test it around alpha when the game is
functional.
CHAPTER TWELVE
Interview with Roppyaku Tsurumi, Roppyaku Design
Interviewer: Kenji Ono, game journalist, IGDA, Japan
which is synonymous among Japanese players for games that are “difficult to play.”
Exceptions to this rule are the highly rated Crash Bandicoot, Spyro the Dragon, Jak and
Daxter, and Ratchet & Clank series released by SCE, all of which score well on usability
and are often mistakenly regarded as games developed in Japan. The origi-
nal games themselves are excellent, and they were localized especially well for the
Japanese market. Compared to many games that are localized during the final stages
of development or after a product has already been released, these four series were
developed from the planning stage for release in the world market in close coopera-
tion with individuals in charge of localization from various regions. For the PS3 game
Ratchet & Clank Future (SCE, 2007), for example, data for fifteen languages was put
on one Blu-ray Disc so that the game could be released to the world simultaneously.
Roppyaku Tsurumi was involved with the production of all four of these series
and supervised their localization as the producer for the Japanese language editions.
He was involved not only with the translation of data from English to Japanese, but
game details as well. As a localization specialist, Mr. Tsurumi was asked a number
of questions concerning the relationship between game localization and usability.
12.1 THE RELATIONSHIP BETWEEN LOCALIZATION AND USABILITY
Is that because the original expressions and representations do not convey their
intended meaning in different markets?
Exactly. They simply don’t work. I’ve been localizing games designed in Europe
and the United States for the Japanese market for twelve years now—I probably
should have mentioned that earlier (laughs)—and during this time I produced both
the Crash Bandicoot and Ratchet & Clank series, which garnered franchise recogni-
tion in Japan. I believe that our localization efforts contributed to this recognition.
Meanwhile, when I am playing other games I sometimes wonder why certain things
weren’t done during the localization stage.
At the time, that was a rare exception among Western games for its high usability.
Even I was convinced for a long time that it was developed by a Japanese studio.
That’s because it was made after much discussion with the general producer, Mark
Cerny, and by taking into account the minutest details so that players would think it
was a Japanese game.
Japanese players have the impression that usability is the result of the Japanese game
industry’s accumulated know-how.
That’s true, but there’s more to it than that. What is understood as usability has
expanded overseas and in Japan as the two markets creatively influence one
another. But overseas makers will stop paying attention to the Japanese market if
it shrinks, which may result in a breakdown of emergent properties. So when look-
ing at games produced for different regions, it is important to keep in mind what
works, to identify those things that might be universally acceptable, and to continue
to make improvements.
The importance of the Japanese market may have diminished from the perspective of
overseas publishers since around 2003, toward the end of the PS2’s run.
I agree. It’s generally believed that if overseas producers ask producers in Japan
about details concerning the vital points of the Japanese market, and if Japanese
producers provide the relevant information, better products will be made. But if
games are made without a consensus on what these vital points are, they’ll simply
arrive with orders to localize them for the Japanese market, which is extremely dif-
ficult. The result is usually a game lacking usability.
When a game is localized, some parts can be changed while others are not, right?
That’s right. Also, each console tends to have games that are received better by
certain demographics than others. Titles sold for the Japanese PlayStation3, for
example, tend to include elements favored by male users in their twenties and
thirties. Such games usually have sexually suggestive elements or appealing mate-
rial that makes them hot topics on the internet. Sophisticated male users won’t
dive into a game with a simple keyword like “adventure” in its title. But this sort
of thing can’t be changed with localization. It’s a game marketing problem, not a
usability problem.
So when I say that games should be localized after taking usability into consid-
eration, I mean those modifications that must be made to pure game contents, not
those tied to marketing, in order for a game to be accepted in Japan to the highest
degree. That’s why I am confident that users who are not convinced by marketing
elements such as packaging and advertisements will enjoy Ratchet & Clank Future
if they give it a try.
So the packaging and advertisements are not directly tied to pure game content?
The potential that a game might have for a certain target audience changes for each
region. This is true for things such as subject material, but the game value doesn’t
change much. Anyone can have fun dodging enemy bullets, shooting their gun, and
taking out the enemy. That’s why the first-person shooters that don’t sell well in Japan
are so much fun if you just give them a try. GoldenEye 007 (Nintendo, 1997) for the
Nintendo 64 was a hit in Japan, and the PC game Battlefield 1942 (EA, 2002) enjoyed
great reviews among gamers. So the game value is really not that different.
The interest factor and “enjoyability” are not the only things associated with
gēmusei (literally, game properties or “game-ness”), but how we deal with them
depends on the language and transition movies, that is, the visual and verbal elements.
We also consider tactile sensations. How to deal with such things is the
key to localization.
12.2 CAMERA ALGORITHM ADJUSTMENTS UNIQUE TO JAPAN
In that sense, the pleasure derived simply from playing a game is not very different.
However, other factors vary greatly from market to market.
Games are more complicated now, so the importance of these “other factors” is
increasing.
That’s right. What we call gēmusei in Japanese is the reward offered for a play-
er’s discernment during game play. It’s the fundamental pleasure inherent to a
game. The structure doesn’t change, and the point is whether or not a reward is
recognized as a reward.
Why is that?
I have no idea. But when I was watching people playing the game, I noticed some
who were not very good at certain operations and others who didn’t understand what
was going on. All of this depends on the individual, of course, but what causes such
difficulties is a mystery to me. All I really know is that, in general, Japanese players
are not very good at racing and flying games.
I for one felt that the space battle level in Ratchet & Clank Future was especially
difficult.
That’s just the sort of thing I’m talking about. The English version of Ratchet &
Clank Future has a challenging difficulty setting. It’s possible to clear the game, but
you need to work at it. The difficulty setting wasn’t changed for the Japanese ver-
sion. The adjustments made to the difficulty setting for Jak and Daxter 2 were not
good enough, and the racing parts were too difficult for Japanese players. I’ve heard
a lot of stories about players having an awful time trying to finish that game.
It sounds like this is something that requires further investigation. There must be
some reason for this handicap.
It’s impossible to make generalizations, because there are too many potential places
that should be fixed to make a game better. That’s why adjustments are made only
when they are needed. For example, many people lose track of the objectives in a
three-dimensional environment, including myself. This example might be too spe-
cific, but many people have trouble flying in reverse (pressing the “down”
key makes the vehicle pitch upward). This can be changed in the options settings, but some
people still can’t get used to it.
A major reason for this is that a lot of Japanese people tend to get motion sickness.
Now that’s an interesting topic. There are a number of reasons for motion sickness,
including lateral camera movements and how a camera follows a character. If the
camera moves according to the laws of inertia, the number of people who experience
motion sickness increases. People also feel sick when playing a game if the camera
moves too fast or too slow. People won’t get sick if the camera doesn’t follow char-
acter movements too fast, but a player will start to feel queasy if the camera doesn’t
move in a preferred manner. If the camera follows a character’s movements too
quickly, for example, the player will get dizzy and start to feel sick. These are the
two cases associated with character/camera movements, and the only way to solve
this problem is to find the right balance between these two extremes.
A number of years ago it was necessary to make the default camera speed for
Japanese versions of foreign games obviously slower, because Japanese players
were that prone to feeling dizzy. The settings aren’t very different anymore, but the
camera speed for Japanese games is still slightly slower.
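For readers who want to experiment, a follow camera is commonly implemented by exponentially smoothing the camera toward its target each frame, and the single speed constant in such code is exactly the kind of parameter described here as being tuned slower for Japanese releases. The sketch below is a generic, hypothetical Python illustration, not SCE’s or Insomniac’s actual camera code.

    import math

    def update_camera(cam_pos, target_pos, follow_speed, dt):
        """Move the camera a frame-rate-independent fraction of the way
        toward the target. A lower follow_speed gives a lazier camera."""
        # Exponential smoothing: close a fixed fraction of the remaining
        # gap each frame. Too high and the camera whips around (dizziness);
        # too low and it lags the character (queasiness). Tuning sits
        # between these two extremes, and may differ by region.
        alpha = 1.0 - math.exp(-follow_speed * dt)
        return cam_pos + (target_pos - cam_pos) * alpha

A Japanese build might, for example, ship with a smaller follow_speed default than the North American build, exactly as described above.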
when the image of the ground wavered as the camera continued to spin around little
by little, the resulting nausea was almost too much to endure. I am sure that people
who are prone to motion sickness will understand the sensation I am talking about.
So, camera movement became very important with the advent of three-
dimensional games. It’s fascinating that something that went unnoticed in two-
dimensional games suddenly became so important for three-dimensional games
and also differs from region to region.
That’s true. The first time this became an issue for us was back in 1998 when we
were developing Spyro the Dragon 1. The problem was how to make a camera for
the Japanese market that didn’t cause motion sickness. That’s why the Japanese
release was delayed until 1999, while the overseas version was already out in
1998. Also, people only started to feel sick when their character kept running, so
levels were designed in the Japanese editions to force characters to stop moving
from time to time.
We started placing hints throughout the game, like writing clues on bulletin
boards, at key locations. But bringing your character to a halt in the middle of a
game can be quite difficult. We later realized that placing road signs in the game to
prevent motion sickness actually resulted in a loss of interest in game play.
Since the camera provides the player with information during a game, would you
agree that it is fundamental to usability?
Yes. In the end, it’s all about how to present the player with visual information. To
be successful with this we need to make adjustments to games while keeping age,
player, and regional differences in mind. That’s really the first thing we are con-
fronted with during the localization process.

12.3 TEXTUAL ELEMENTS AND THE INDISPENSABILITY OF USABILITY
When I was playing Ratchet & Clank Future, I was so happy when I saw the large on-
screen text, and glad that it didn’t interfere with game play even on a small monitor. That
was such a bold innovation. But now that we are in the age of High Definition tech-
nology, it is getting difficult to read the on-screen text in more and more games.
You’re right. That’s so true.
Isn’t there an industry standard stating that text shouldn’t be smaller than a
certain size?
There are no industry standards. It’s all a matter of trial and error.
On-screen text started to get smaller after the release of the Xbox 360, right?
Yes. And games are now designed to be played on a TV connected with a High-
Definition Multimedia Interface (HDMI). The text in such games would be illegi-
ble on a Standard Definition (SD) TV if its screen layout were used. By the way, a
single Blu-ray Disc for Ratchet & Clank Future has data for 15 languages on it:
American English, British English, German, French, Italian, Korean, Spanish,
Portuguese, Dutch, Danish, Swedish, Finnish, Norwegian, Chinese, and Japanese.
When you change the language display setting on the PS3 console, the language
displayed in the game will change to match it.
That’s just what you would expect for PS3: worldwide implementation. But the time
and screen space required to convey the same message in Japanese and English is
different, right?
The spoken Japanese language conveys far less information than English in the
same amount of time. That’s why the movies for the transitional event scenes are so
difficult to localize. We need to choose words so that the length of the dialogue and
the movement of a character’s mouth match as much as possible while retaining
the same meaning. Scenes that are meant to be humorous, for example, continue to
be the most challenging. There are a lot of movies in the Ratchet series that include
jokes. But the Japanese and English languages are so different that it’s impossible to
include the same information. So we reduce the spoken parts or use different words
to create lines that convey the same meaning.
Sometimes lines are added to the Japanese edition of a game even if there are
no spoken parts in the English version. So when Captain Qwark spins around in the
Japanese edition, he says things like “kuru-rin” [something like “spin-spin ho!” in
English] or “yeah.” If we don’t do this, the humor will feel too flat.
What about the text? English requires so many more characters than Japanese to con-
vey the same meaning. Does this work OK with the same screen design?
Sure. Once things are decided for the English version, we can do whatever we want
to make the Japanese language work. Even though English requires more charac-
ters, alphabet letters are proportional (they have a variable width) while Japanese
fonts have a fixed width. Ultimately, however, the necessary space for text doesn’t
change that much in the Japanese and English editions of a game.
In Japan, the length of a message will change if Chinese characters are used. I was
surprised to learn that Ratchet & Clank Future was the first time they were used in the
series’ Japanese editions.
Good point! There is a limit to how many characters can be used so that a player
can catch the meaning of a message at a glance. It’s sometimes too cumbersome
to extrapolate meaning instantaneously if only kana characters (phonetic charac-
ters with no intrinsic meaning) are used. The Ratchet & Clank series only utilized
hiragana and katakana [the two phonetic kana alphabets indigenous to Japan],
but Chinese characters were integrated into the text for the first time with Ratchet
& Clank Future. It’s much more similar to the way English is displayed and read
on a screen this way. We tried to get around this problem in other ways before
finally settling on Chinese characters.
But doesn’t the use of Chinese characters limit the target age group, since the text
might be too difficult for younger children to read?
This actually wasn’t a problem. But it probably would have been best to provide
a phonetic reading guide for the Chinese characters. Take the word “weapon,” for
example. Younger children might not have learned the Chinese characters for the
word “weapon” at school yet, but they’re able to figure out the meaning from the
game’s context. We thought it might work if we used common words that would be
easier for children to recognize.
You can use Chinese characters for the user interface in a game, but I imagine this
wouldn’t work for messages that are integral to storytelling in RPG games.
Right. That’s exactly what I was trying to say.
well, but this really isn’t a problem since the characters are also speaking their lines.
I don’t think this would have worked at all if the characters didn’t talk. Writing
intermixed with Chinese characters and kana is probably too difficult for younger
primary school children, but it’s not a problem for kids in higher grades. Students
learn 1,006 Chinese characters at primary school in Japan, but there was a time
when we were unsure how to use characters they hadn’t studied at school yet in
games meant for children. We now know that this really isn’t a problem.
I understand that Japanese game developers are debating how to design user
interfaces that include text for Japanese games that are planned for release overseas.
Since English requires more characters to say the same thing as Japanese, a text
box that is sized to convey something in Japanese might not be able to contain the
English equivalent. As a result, there are many cases in which the Japanese char-
acters are intentionally made smaller to ensure enough space for the English text
when developing the Japanese edition of a game. But when this is displayed on an
SD television, the Japanese is difficult to read. Some Japanese players feel that it’s
unreasonable that there is such small text in a text box with plenty of extra space.
One way to solve this would be to change the design of the user interface for the over-
seas versions. The design of the message windows for Dragon Quest VIII: Journey of
the Cursed King (Square Enix Co., Ltd., PS2, 2004), for example, was modified for
the overseas version, right?
That’s not a simple task. Also, we didn’t do that for Ratchet & Clank Future since
we had to master it for a simultaneous world-wide release. But we had already
taken things such as the user interface into consideration, so it wasn’t necessary to
refashion them later on.
Different game designers are in charge of the field, battle, and item
management scenes in RPGs, so the menu designs are inconsistent, which in
some cases leads to reduced usability. Recently some teams have allocated a position
for a game designer who specializes in user interfaces.
Whether or not there is a position on a team for that doesn’t really matter as long as
there is a step in the development process for looking into this sort of thing. We started
doing this for the Ratchet & Clank series with the first game. Insomniac Games actually
have data managers and programmers that specialize in localization. They wouldn’t
be able to make all those titles for more than ten different regions if they didn’t. Such
localization specialists are, of course, integrated into the development pipeline.
I’m surprised that there is a specialist team in the development studio, because nor-
mally the overseas version of a game is not made until after the domestic version is
finished. And sometimes a publisher will just buy the rights to a game and do the
localization themselves.
That’s not enough to work with in our case. We’ve been doing localization for more
than 10 years now, and know what we are doing.
12.4 HOW THE RIGHT OR WRONG NAMES CAN GREATLY INFLUENCE USABILITY
level to the English “Rip ya a new one” since it included a play on the words “bee”
(hachi) and “eight” (hachi).
So sounds can viscerally convey meaning. I’m thinking specifically of the Dragon
Quest series. The basic healing spell for this game is “hoimi” and the stronger heal-
ing spell is “behoimi,” which helps to suggest a sense of group unity.
Exactly. That’s just the sort of thing we need to change for Japanese editions.
Especially when a player gets a weapon and it’s indicated by a movie or message.
If a player doesn’t know what kind of weapon it is, the sense of achievement will
be lessened. And that’s not what the designer wants to happen. The player’s motive
should not be misdirected during a game, which is why we create Japanese name
categories for things.
But we won’t change a name if the English word conveys the same meaning and
image to the Japanese player. There are two ways that people receive images from
sounds; one is universal, and the other is region specific. Choosing a name that will
evoke nostalgic emotions when it is heard or seen is extremely important in Japan,
such as the names of weapons in anime programs that kids grow up watching.
Kind of like the fire spell names gira and mera for the Dragon Quest series? (In
Japanese, gira and mera are onomatopoetic for, respectively, reflecting light and
burning flames.)
That very sort of thing. If we directly translate the English proper nouns, the impres-
sion of the game will change. In story- and character-driven games, players are con-
stantly mentally mapping information such as the relationships between character
and place names while proceeding through the game. As a player progresses through
a game, they develop the desire to fill in those blank areas of their mental map. And
it’s because players have this sort of desire that they are able to recognize hints that
will help them complete their mental map when they appear. Because the mental
maps of Japanese- and English-speaking players differ, the same hints will not be
recognized by a Japanese player when they play the English edition of a game.
Is this sort of technique commonly used in Europe and the United States as well?
Yes.
So, because I’m Japanese, I would have trouble playing the English edition of a game
because I wouldn’t be able to recognize hints when they appear?
That’s right. The experience is totally different. You wouldn’t be able to compre-
hend a game based on the Christian culture, for example, unless you were rooted
in that culture yourself. You wouldn’t find the subject matter stimulating at all, and
you wouldn’t be able to recognize important game items even if they were indicated
to you as such.
To put it another way, what we do is modify English words into proper nouns
that evoke subliminal images rooted in language and culture that are easy for play-
ers to internalize and incorporate into their mental maps. Hey, I think I stated that
quite nicely.
In that sense, not explaining everything is an essential element for enhancing usability.
Exactly. A player might understand a provided explanation, but they would never
pay it any attention. It’s necessary to alter names so that they are understood viscer-
ally. This isn’t just important for the game-player dichotomy, but for information
exchange amongst players as well.
12.5 USABILITY AND THE PLAYSTATION 3
In the Ratchet & Clank series, the color design work was enhanced so that things
appeared very realistic, such as explosion effects, for example. But the color satura-
tion of the entire screen was diminished as a result, and some Japanese players felt
that this made it difficult to look at the screen. Does this sort of thing have a major
effect on the usability of a game?
Definitely.
I don’t think this is limited to the Ratchet series, however, because many Japanese
players tend to feel that the screen is too dark in Western games. How was this issue
addressed for the PS3 Ratchet & Clank Future game?
I thought things looked a lot better once the PS3 came out. And since the screen
resolution has increased so much with High Definition, it is now possible to see the
smallest details. HD can display things far in the distance, and the number of colors
and color resolution has increased. It’s therefore now possible to design the screen
so that explosions stand out really well against a realistic background. I think the
PS3 Ratchet combines color richness and effects to a degree far higher than the PS2
Ratchet. But this is outside of my particular field of specialization.
I see.
You can notice the difference with an SD television as well, but the visuals look way
better in High Definition.
Is there any difference between the two systems in terms of sound quality?
The Ratchet series has employed the 5.1-channel surround system since the
PS2 in order to emphasize the sound. Since Ratchet & Clank Future is for the PS3,
it includes a dramatically greater amount of sound information, which is why the
sound effects are so much more enjoyable for players to hear. If you’ve ever heard
the sound effect that is played when Ratchet is collecting bolts, then you know what
I mean. Sound helps to motivate players as well. I didn’t notice this when I played
the game using standard television speakers, but it was quite clear when I used 5.1
surround. I was totally enraptured by the sound.
Better hardware can make such things possible, which is extremely beneficial
for usability. Don’t you think the color for Ratchet & Clank Future is richer than the
original PS2 versions?
The backgrounds are indeed fantastic. The amount of information looks immense.
And I was glad that the text and icons were big enough to make the user interface
easy to read even with an SD television.
The text has to be large. The artists at Insomniac Games produce really high quality
work. It’s very impressive.
So when you’re making a single master disk, you take into account the prospective
ratio of players that have HD and SD televisions?
We can’t, really, because we’re in the middle of a transition period. That’s why
games are made to work with both. But when we’re making a game for HDTV, we
do keep in mind that there is a size difference between American and Japanese
living rooms. Don’t you think the size of a living room greatly affects game play?
I bought Wii Fit (Nintendo, Wii, 2007), and unfortunately the TV is too close so I
have to play it with the TV down low and at an angle to where I’m standing.
Localizers take player environments into consideration and make requests based on
this information to the developing team. How the main developing team of a project
accepts those requests for localization and comes up with a project plan is very
important. In that sense, Ratchet & Clank Future was well thought-out from the start.
It gets to the point where the main project team can begin to predict what sort of
requests they will get and start working on those issues before they’re even asked.
What sort of things have you found disappointing when playing an overseas game
localized for Japan?
Often the names. But there are some that sound great in the original English, of course.
Other than that I would say the learning curve. Every game has things a devel-
oper hopes players will do first with their game. They should first have players run
through the tutorial at the lowest difficulty setting to learn any difficult maneuvers.
After that players can advance to an intermediate level and learn to be more crea-
tive before finally moving on to the advanced operations. The number of challenges
built into a game determines the fun factor, after all.
Players are made to proceed through the stages appropriate for learning all the
game mechanics so that they’ll get the most out of a game. It’s not just a matter of
the difficulty level, but rather helping players understand things that might not be
clear initially. There are various steps and processes set up in a game to convey this
sort of information to the player. It is naturally more fun for players to get better and
play higher levels, so the learning process is fun for them. A game wouldn’t be fun
if players were suddenly supplied with all sorts of information without having to go
through certain learning stages to get it.
Because the information needed to proceed through a game is forced onto the players?
Yes. And there are many examples of this. So a potential field for localization is to
develop and incorporate into a game steps that will help Japanese players grasp dif-
ficult information in the game. If this isn’t done properly, it becomes unclear what a
player is supposed to do.
So there is a right and a wrong way to provide players with information. It sounds
difficult.
Still, giving players too much information is better than giving them none at all. It’s just awful
when a player doesn’t have access to enough information. So in that sense, forc-
ing information on a player is still better, because as they get used to a game they’ll
start to realize its significance. If a player doesn’t experience this often enough over
the course of a game, then the game itself is no good.
More and more Western games are providing a greater number of game-play hints.
For example, it’s not uncommon for possible button operations to be displayed at
the top of the screen on some levels, as is the case with Gears of War and Assassin’s
Creed. Many players appreciate this, but there are some games that don’t display
the appropriate button operations on the screen when they should, which can make
things very confusing.
In such cases, the game is not adjusted well enough. It’s something I want to be
very careful about as a producer myself, because it’s an extremely important factor
for a game’s usability. An ideal game is one that provides a player with information
without them realizing it.
Games involve a repetition of input and output, through which a player begins to
establish a mental model that tells them that if they perform a certain action, a
certain result will follow. When it comes to what sort of hints a player must be
provided with for this process to be successful, game play becomes muddled if
hints suddenly appear for reasons unconnected to what has come before, or are
not provided at all.
Yes, that’s exactly right. The Ratchet series, for example, has a Help Desk voice.
The voice actor who recorded the voice data changed between the first and second
Ratchet games, and we got a lot of questions wanting to know why. Just
changing the voice actor resulted in subtle changes in the way players extracted
hints from the game. Players wouldn’t miss important information when it was pro-
vided to them by the same person with the same tone of voice throughout a game. This
is very important.
It sounds so meticulous.
And it’s better if a really good voice actor is hired to do the recordings. We hap-
pened to employ Atsuko Tanaka, the voice actor for the character Motoko
Kusanagi in the anime Ghost in the Shell, for Japanese editions of games, and I am
always just amazed by how good she sounds when we’re working on the record-
ings. When you hear her voice in a game, the important information just sticks in
your brain.
I feel that games have become so complicated to operate now that it is impossible to
provide enough hints on the screen during a gaming session. I think that voice and
controller vibrations will become more necessary than visual information for guiding
players through a game as time goes on.
We began using voice- and text-based tutorials and help functions with the first
Ratchet game. Text messages were also necessary, because the voice functions were
inaccessible to people with hearing difficulties. But information provided by
voice carries much more than the textual equivalent. For button opera-
tions and icons, however, visual information is easier to understand. It’s very dif-
ficult to know what works best and cannot be categorically described. That’s where
the fine tuning of a game comes into play.
Both voice and text messages are used for major rules, but if we find a place dur-
ing the focus group testing process that is difficult to understand, we’ll add more
voice guidance specifically for that location. Unfortunately, this can result in a lot
more work if there are many places like that. Ultimately, the end result will differ
depending on the time and cost spent on tuning, but regardless this is a very impor-
tant step in a game’s development.
By tuning do you mean quality assurance measures prior to debugging (game testing)?
That’s right.
Quality assurance is a part of the development process in Japan and is not a section
in and of itself, but I’ve heard that development and quality assurance divisions are
independent overseas.
That all depends on how a producer chooses to divide up their team’s responsibili-
ties. SCE (Sony Computer Entertainment, Inc.) has a section that performs quality
assurance, which both debugs software and offers other suggestions. The size of
the team, volume of work, allotted time, and at what stage the quality assurance
reports will impact game development all depend on the producer and the game
in question. A game with a lower target age group or concerns that an overseas
game might be difficult for Japanese players will affect the focus of quality assur-
ance reports. It’s all determined on a case-by-case basis.
For example, there are no regional differences if you’re making a soccer game, but
cultural differences become apparent with story comprehension for RPG and adventure
games. Even in such cases, however, a developer might insist on some changes or con-
duct rigorous focus testing. It’s ultimately up to the producer to decide what is worth
spending budget allocations on, which can affect a game at the fundamental level.
But since Ratchet & Clank Future was developed overseas, the Japanese quality assur-
ance was performed when the game was already completed to a certain degree, right?
That’s right. But Ratchet & Clank Future was a sequel title, so it was comparatively
easy to make. It’s also the first in the Ratchet series made specifically for the PS3,
so there were a lot of basic conventions from the previous installments that we had
to consider and figure out how to present to the players. But most of the basic parts
were already figured out.
It sounds like the amount of required work is totally different for games with an
established franchise and original titles.
That’s right.
In that sense, was the first Ratchet game the most difficult?
The first Ratchet game was a lot of work. I thought I was going to work myself to
death at one point. I was practically living in the smoking room at SCE. I’d work
in the smoking room, go to my desk to take a thirty-minute nap, and then start the
whole process all over again. When I had to fly to the United States on business, I’d
calculate the time left until departure and go home just to do laundry before rushing
off to the airport. It was the first game in the Ratchet series, so there were all sorts
of things I had to make first-time decisions about. I even had to design the Japanese
font for the game all by myself, one pixel at a time.
Wow! But I thought developers normally got licenses to use existing commercial fonts.
That’s the case now, which is why we’ve used commercial fonts for the latter
installments in the Ratchet series. But for the first game we didn’t know what kind
of font would display on the screen well and simultaneously convey a sense of the
Ratchet universe. So I had to make the font from scratch and determine on my own
if it would be easy to read. It’s important to know how to localize a game from the
very first stage of the developing process, such as designing the game’s structure.
The first Ratchet game was very challenging work.
When I’m working on an original title, there are all sorts of things I need to
investigate, such as camera algorithms, how hints are revealed, and making sure
a character doesn’t stray off course when they are supposed to travel along a
predetermined route. I can talk about matters related to usability in this way now,
but that’s only because I’ve learned so much since the first Ratchet.
I’m sure there will be more games to come in the Ratchet series for the PS3, and I
look forward to them and other original titles from you in the future.
I’ll do my best.
CHAPTER THIRTEEN
Using Biometric Measurement to Help Develop Emotionally Compelling Games
13.1 Introduction
To be entertaining and enjoyable, videogames need to evoke some heightened level
of emotional experience during play (Keeker et al., 2004). With this heightened
emotional experience comes the experience of being immersed in the game envi-
ronment so that the player’s attention is fully on the game and he/she is not easily
distracted from gaming. This immersion and enjoyment isn’t just a function of posi-
tive emotion. In fact, a common emotional experience in role-playing, action and
many other types of games is the build-up of tension and negative emotion during
challenge that is followed by a positive emotional spike when the challenge is over-
come. Different game genres will have different goals for the player’s emotional
experience, and varying emotional profiles that include both positive and negative
emotion (Ravaja et al., 2004). Most, if not all, games will involve shifting the player
from one emotional state to another, and that experience is what makes playing
the game so compelling. Whether the game is a casual game to pass the time on
the way to work, or an intense RPG that consumes hours of a player’s time, if the
game does not grab the player’s attention, shift their emotion, and lift them out of
their ordinary routine, the game will not be successful. Maximizing the emotional
power of a game during its development cycle requires ongoing feedback about the
emotional experience of players as they encounter the various features of the game.
In order to better understand how to assess the player’s emotional experience, we
must first briefly look at the nature of emotion and the issues related to emotion
measurement.
13.3 THE MEASUREMENT OF EMOTION
FIGURE 13.1 Emotions plotted by intensity/arousal against negative versus positive emotion: fear, anger, excitement, awe, anxiety, amusement, surprise, attraction, disgust, want/desire, dislike, despair, loving, annoyance, like, warmth, interest, shame, sadness, hope, and boredom.
During interactive tasks the corrugator muscle EMG also has been found to pro-
vide a sensitive index of the degree of exerted mental effort (Waterink & Van Boxtel,
1994), and to increase with the perception of goal obstacles (Pope & Smith, 1994).
The corrugator EMG can also measure the more negative emotional responses of the
computer user, reflecting their tension and frustration during usage (Hazlett, 2003).
During video gameplay, therefore, increases in the corrugator EMG reflect levels of
tension, negative emotion, and effort. For the gameplayer, perhaps the best overall
label for what the corrugator EMG reflects is tension.
13.4 MEASURING THE PLAYER’S EMOTIONAL EXPERIENCE WITH BIOMETRICS
FIGURE 13.2 Mean corrugator and zygomaticus EMG response (in microvolts) to negative and positive game events.

Figure 13.2 shows these results by events and muscles. One can see that the negative and posi-
tive events had a different pattern of emotions. The negative events results were
particularly striking, but the positive event results, though not as large, were still
significantly different. This study demonstrated that positive and negative emotion
can be measured in real time during video gameplay. Now that we see that this
method can measure the player’s emotional experience, how can it be applied to
developing more compelling and entertaining games?
having bursts of negative emotion and increased tension when something bad
happens like being run off the road and bursts of positive emotion when the player
passes another car or wins the race. For driving games, there is a fairly ever-present
mildly elevated tension level associated with concentration on driving, and bursts of
positive and negative emotion throughout the race related to game events.
A genre with a different emotional profile is the action role-playing game (RPG).
We will examine play with Fable (Lionhead Studios), a successful and critically
acclaimed game when it came out in 2004. This game consists of “the hero” devel-
oping skills and acquiring possessions such as weapons, and overcoming challenges
that lead him through a portal to another place while he is hunting the “Jack of
Blades.” Figure 13.3 shows the emotional readout from facial EMG of a fifteen-year-
old boy playing Fable through three complete challenge cycles, just over 6 minutes,
and leaves off with him in the fourth cycle. The positive zygomatic response is in
blue, and the negative/tension response of the corrugator is in orange. One can see the pattern
of increasing tension during the challenge as he engages in combat, and then a posi-
tive spike when the challenge is overcome and the hero moves on to the next place
and the next challenge. This increasing EMG gradient is common in sustained tasks
and is associated with increasing levels of tension and effort (Malmo & Malmo, 2000).
FIGURE 13.3 Fable play: positive (zygomatic) and tension (corrugator) EMG in microvolts across three challenge cycles.
Only the third challenge is missing the consistent tension build-up where we would
expect to find it. If we were going to analyze this data to help with game design, then
as the developer we would be on the look out for patterns like this amongst players.
Do most players lack the tension build up at that particular place in the game, or was
this lack something idiosyncratic for this particular player? Using the immediate read-
out of EMG data to help frame post-game questions is a particular strength
of EMG, and will help the developer zero in on difficulties or emotional dead spots
in the game that would be overlooked with traditional data collection alone.
One method of describing the emotional profile of a game is to count the number
of seconds that the two emotional traces are at least one standard deviation above
their mean. Since people vary on the absolute value of their EMG in microvolts,
the standard deviation gives us a way to compare between players and games. In
Figure 13.3, one standard deviation above the positive emotion mean is represented
by the green line. This gives us a way of noting at a glance at which places in the
gameplay heightened positive emotion occurred. The same can be done for lev-
els of tension. The percentage of time the EMG is elevated one standard deviation
above the mean is related to the skewness of the data series, or how many and how
lengthy are the spikes in the player’s emotional record. In this example of Fable
the positive level is elevated for 5.5 percent of the series, and the tension level is
elevated for 17 percent. The ratio of positive elevated moments to tension is 5.5/17,
or 0.32. At this point in game research there is no database that we can compare
this ratio to and find out if this is a favorable positive to tension ratio for an action
RPG. Again, we are looking at one player in this graph, and it will be important to
calculate this emotion ratio based on a selection of players. Though there are no norms
yet developed, this calculation does represent an opportunity to quantify the shifts
in emotional experience that underlie enjoyment and immersion, and would give a
quantitative score to help compare and evaluate games of the same genre.
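As a concrete illustration of this calculation, here is a minimal sketch in Python (the helper name emotion_profile and the input series are hypothetical, not from this chapter): it computes the percentage of time each smoothed EMG trace spends at least one standard deviation above its own mean, and the positive-to-tension ratio just described.

    import numpy as np

    def emotion_profile(positive, tension):
        """Percentage of time each trace is elevated, plus their ratio."""
        def pct_elevated(series):
            s = np.asarray(series, dtype=float)
            # Fraction of samples more than one standard deviation
            # above the series' own mean, expressed as a percentage.
            return 100.0 * np.mean(s > s.mean() + s.std())

        pos_pct = pct_elevated(positive)   # e.g., ~5.5 for the Fable session
        ten_pct = pct_elevated(tension)    # e.g., ~17 for the Fable session
        return pos_pct, ten_pct, pos_pct / ten_pct  # 5.5/17 is about 0.32

For the Fable session above this would yield roughly 5.5 and 17 percent and a ratio of about 0.32; for the Mario Party session discussed next, 18.6 and 14 percent and a ratio of about 1.33.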
The virtual board game or party game Mario Party (Nintendo) has a decidedly
different emotional profile than Fable. Mario Party offers the player a series of short
challenges called “mini-games” that are fun without too much work. Figure 13.4
shows a profile of a nine-year-old boy playing Mario Party 6 on GameCube for 6½
minutes. In comparing this emotional response profile to the Fable profile, we first
of all see a much greater duration of elevated positive emotion. Positive emotion
is elevated above one standard deviation 18.6 percent of the time, and tension is
elevated 14 percent of the time, resulting in a positive/tension ratio of 18.6/14,
which yields an emotional profile number of 1.33. This number is well above the
emotional profile ratio of Fable and well in the positive range (1.00 would be the
value when positive and negative elevations are balanced).
Mario Party is designed for multiple players, and that is one of its fun elements.
But for this test we control the social interaction and have the player play solo. He
is playing Mario, while the console is controlling the three other players. For the first
mini-game the player first rolls the dice and then hops to the proper space based
on the dice roll. That ends at around second 80, and then there begins a mini-game
called “Take a Breather.” All four players hold their breath and go underwater, and
FIGURE 13.4 Mario Party play: positive and tension EMG in microvolts over six and a half minutes.
Mario stays under the longest and thus wins the game at 105 seconds. From the
graph, you can see the dramatic increase in positive emotion that occurs when the
game is won. The interesting thing that this data shows us though is that in contrast
to playing Fable winning challenges isn’t the only way that the player experiences
increases in positive emotion. We can see this pattern in the next thing that occurs
for Mario. He begins a long dice roll and walk at around 160 seconds, which lasts for
about 50 seconds to second 210. During the roll he becomes giant size and he gets
doubles, but at the end of the roll he lands on a space that loses him several points.
In looking at the graph one can see that the player experiences elevated positive
emotion all through Mario’s turn until the end when he loses points and then the
positive emotion goes down.
The next Mario Party mini-game illustrates heightened positive emotion during the
mini-game. Beginning at second 250 the player engages in a mini-game called “Candle
Light Flight.” Three times in a row, the players chase each other around in the dark, and
there is very little feedback to the player on how he is doing. Only at the end of the
three mini-games do the players find out who won. The three positive emotional spikes
at seconds 260, 300, and 315 correspond exactly to the playing of the three games. The
sharp increase in positive emotion occurs while the player is using his controller to
elude the other players. Figure 13.4 tells us that this is the fun element for the player.
At the end of the three games the player actually finds out that his character Mario
lost. After that loss the player goes on to engage in another series of dice rolls and
moves, which as we can see, are very enjoyable. By looking at the positive emotion
trace we can also see which of the mini-games the player enjoys the most. For exam-
ple, this player finds the “Candle Light Flight” more enjoyable than “Take a Breather.”
So in summary, for a party game the player does experience pleasure when win-
ning, but in contrast to the action RPG most of the pleasure occurs during the play
no matter what the outcome is. The game designers recognize this, and have many
fun elements, like giant sizing, genies, etc., occurring during the play. The graph for
Mario Party also illustrates how tension levels stay low and spikes are minimal,
indicating that tension is not a predominant element in the emotional experience
of the player for this genre. These two profiles were used to illustrate how different
types of games will evoke different emotional experiences. The player of an action
RPG can tolerate and enjoy a much higher tension-to-positive-emotion ratio than the
player of a fun casual game like Mario Party.
Recent trends indicate that the growth in the gaming industry may lie in areas other
than the traditional fare of hardcore gamers. There has been a recent increase in popu-
larity of casual games and casual game platforms, serious games, and games targeted
for seniors and baby boomers. These new players appear to be seeking a different
gaming experience that involves different emotional profiles. In order to understand
and design for these emotional experiences the accurate assessment of the emotional
experience of these new gamers is vital. Since these new players do not fit the typical
hardcore gamer demographic, developers now more than ever can’t expect that their
gut reactions and interests will be similar to those of the intended player. Facial EMG can give
feedback to the developer on what features of the game enhance emotional experi-
ence, and what mini-games, scenes, places, characters, challenges, etc., work best.
Warning
There are many infamous disasters in advertising and marketing that have
occurred when executives thought that the intended consumer’s reaction to
an ad or product would be similar to what the executive’s reaction was. Don’t
make the same mistake in game development and assume you know what
the potential player’s preferences and experiences will be. Testing the player’s
experience and emotional responses is vital to successful game development.
13.5 PRACTICALITIES OF USING BIOMETRIC MEASURES
It takes training and expertise to conduct any type of user experience and usability
study. Physiologic measurement is not as daunting as it may first appear, and there
are equipment makers, consultants and written resources that one can turn to for
help. This section will give a brief overview of what is involved in emotional meas-
urement of games using facial EMG.
Physiologic testing of gameplayers is conducted preferably in a living room set-
ting, simulating a natural environment as much as possible. Two tiny micro-sen-
sors are placed over the zygomaticus and then the corrugator muscle (see Tassinary
et al., 1989 for more complete description). A common ground sensor can be
attached anywhere on the body. Convenient ground attachments are a wrist bracelet
or an ear lobe clip. The facial muscle sites are cleaned with alcohol and a conduc-
tive gel is used to make the connection between the skin and the sensor. Figure 13.5
shows a gamer at play with the EMG sensors attached. An experienced technician
can hook up a player and be ready to test in less than five minutes. These sensors
on the face may seem like they would interfere with the player’s experience.
However, experience shows that people soon forget them as they get involved in
the game, just like test subjects forget about the one-way mirror in a two-room
testing situation. The wires from the sensors connect to a bioamplifier for each
muscle that amplifies the signal, and these bioamplifiers connect to an analogue-to-
digital (A/D) converter (Psylab, made by Contact Precision Instruments, is one of the
better systems).

FIGURE 13.5 A gamer at play with the facial EMG sensors attached.

In order to filter out noise, the EMG signal is typically filtered to
only allow 30 Hz to 500 Hz to pass. Researchers also often use a 60-Hz notch filter to
block out AC line interference. The raw EMG signal is in the form of a bipolar sine
wave, and needs to be rectified so the absolute values reflect the magnitude of the
muscle contraction. Figure 13.6 shows both the raw bipolar signal, and the rectified
version of the same raw signal. The signal can then be averaged and smoothed
through a hardware device called an integrator, or it can be sent to the computer for
software processing and averaging. Figure 13.6 shows the steps in this processing
from raw EMG signal to the final smoothed signal that can then be related to game
events. The sampling rate of the EMG signal is usually on the order of 1,000 Hz, so
large data files are produced rather quickly (see Cacioppo et al., 1999 for an overview
of EMG methods).

FIGURE 13.6 Steps in processing the EMG signal: the raw bipolar signal, the rectified signal, and the final smoothed signal (microvolts over a 40-second span).

Averaging to 100 msec or even 1 second values produces a work-
able time series that can then be synchronized with the video of the gameplay.
Tip
At 1,000 data points collected for each second, testing for a few minutes quickly
produces large data files, and not only is disc space a concern but processing
time starts to rise. One can minimize the processing time by writing some
reusable code to rectify and aggregate the data that can be plugged in quickly.
Sometimes a better alternative is to have the data crunched with a hardware
device called an integrator and then sent to the computer. Sensitivity does seem
to be affected by this approach, though, and some researchers, including myself,
prefer to manage the larger data files for the sake of precision.
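As a starting point for that reusable code, here is a minimal sketch in Python of the processing chain described above: band-limiting, a 60-Hz notch, full-wave rectification, and averaging into 100-msec bins. It assumes a 1,000 Hz sampling rate and uses SciPy’s standard filter routines; the function name process_emg and its defaults are illustrative, not taken from any vendor’s software.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

    def process_emg(raw, fs=1000, bin_ms=100):
        # 30-Hz high-pass: at a 1,000 Hz sampling rate the 500-Hz upper
        # cutoff coincides with the Nyquist frequency, so a high-pass
        # realizes the 30-500 Hz band described in the text.
        sos = butter(4, 30, btype="highpass", fs=fs, output="sos")
        x = sosfiltfilt(sos, raw)

        # 60-Hz notch filter to block AC line interference.
        b, a = iirnotch(60, Q=30, fs=fs)
        x = filtfilt(b, a, x)

        # Full-wave rectification: absolute values reflect the
        # magnitude of the muscle contraction.
        x = np.abs(x)

        # Average into fixed-width bins (default 100 msec) to get a
        # workable time series for synchronizing with gameplay video.
        n = int(fs * bin_ms / 1000)
        usable = len(x) - (len(x) % n)
        return x[:usable].reshape(-1, n).mean(axis=1)

Each call turns a raw 1 kHz trace into ten values per second that can be dropped straight into the spreadsheet columns mentioned below.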
This EMG methodology is likely more quantitative than most readers are used
to, but with a little preliminary work you can set up a system that can be reused
with little extra effort. For example, an Excel spreadsheet can be built with the
formulas for mean, standard deviation, etc., already in cells; after the initial data
capture and averaging, the positive data series and the tension data series can be
dropped into two data columns of the worksheet. The summary values of interest
are then instantly calculated and available for use.
During the actual gameplay the EMG signals can be of use as well, without any
extensive data analysis. The tester can observe the gameplay while keeping an eye
on the EMG readout flowing across the screen. When a particular positive or tension
spike appears, the tester can note what was happening in the game, and after the
play session ask the player what might have been occurring for him or her at that
moment. The EMG readout can thus be quite valuable in directing the tester's
attention to significant reactions and events in the gameplay that would otherwise
have gone unnoticed.
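A simple way to approximate this on a just-recorded trace is to flag samples that rise well above the session mean. This is a minimal sketch; the two-standard-deviation cutoff is an assumption, not a published threshold.

```python
import numpy as np

def flag_spikes(signal, timestamps, k=2.0):
    # Flag moments where the smoothed EMG series (e.g., zygomaticus for
    # positive affect, corrugator or jaw for tension) exceeds
    # mean + k standard deviations. k=2.0 is an assumed cutoff.
    threshold = signal.mean() + k * signal.std()
    return [t for t, v in zip(timestamps, signal) if v > threshold]

# The returned timestamps tell the tester where to cue the gameplay video
# and what to ask the player about after the session.
```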
Warning
In any type of player experience testing the presence of others is a power-
ful influence. Other people are strong emotional elicitors, and social interac-
tion will overwhelm reactions to the game. The tester should always strictly
control this aspect of the testing situation, or risk invalidating the data and
learning little about the game.
Players are usually tested alone, as the presence of other people influences smil-
ing and the report of positive emotion. Basically, sharing experiences with others
amplifies these expressive responses.
Note
Biometric methods in general require fewer subjects than verbal methods to
be valid, as the error variance is usually lower. However, the more players one
can test, the more confident one can be in the results. The balance between
collecting enough data and the costs in time and money is always an issue.
Note
If you are testing reactions to a series of visual or auditory features such as
scenes, music, or characters, it is important to vary the presentation order
between players so that order effects do not contaminate the results (one
common counterbalancing scheme is sketched below).
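A balanced Latin square is one standard counterbalancing scheme. Below is a minimal sketch in Python; for an odd number of stimuli the usual remedy is to run each order together with its reverse.

```python
def balanced_latin_square(n):
    # First order follows the classic pattern 0, 1, n-1, 2, n-2, ...
    first, lo, hi = [0], 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # Each subsequent order shifts every stimulus index by one (mod n), so
    # with even n every stimulus appears once in each serial position and
    # follows every other stimulus exactly once across the n orders.
    return [[(s + i) % n for s in first] for i in range(n)]

# Example: assign participant p the order balanced_latin_square(4)[p % 4].
```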
FIGURE 13.7: Pre- and post-treatment heart rate responses of a trauma victim to pleasant, stressor, and trauma imagery scenes during virtual reality immersion.
three virtual environments, and the means are shown in Figure 13.7. As can be
seen, the changes for the trauma scene were much larger and more significant than
those for the other two environments. (Note: the assessment and treatment in this
case used only virtual reality immersion, with video appropriate to each of the three
environments. The addition of actively involved gaming to assessment and treatment
for this type of patient would be an advance, and is on the drawing board.)
This example of PTSD treatment illustrates that when the goal of a game is
beyond just entertainment, player experience assessment can provide another
function besides feedback to help game design. Biometric player assessment can
inform about the success of each player in achieving game goals. This is of course
important for the player, but also a log of cumulative player success can then pro-
vide performance and ROI data about the game itself. Information such as the
percentage of players who achieved game goals, the amount of time and effort
required to achieve them, and so on can be provided to interested parties such as
stakeholders and used for marketing and sales purposes. This information becomes
especially important with serious games because, unlike entertainment games, the
purchaser of a serious game may well not be the intended player, so personal play
experience is not a source of information for the purchaser.
13.7 Summary
In this chapter, I have tried to give the reader a brief introduction to the nature of
emotion, and how the player’s emotional engagement and reaction is the funda-
mental driving force of the gaming experience. Game development is enhanced by
feedback about the emotional experience of the player, and this information can
be used to arrive at an emotional profile of the game. The challenge is in measur-
ing the emotional experience of the player without interfering with the natural
gaming experience. The player's verbal report is not very informative about his or
her moment-to-moment emotional experience, so researchers have turned to physio-
logical measures. The best physiologic measure for tracking emotional valence is facial
EMG. These methods were described and shown to have validity for measuring
emotion during gaming. Even though the EMG methodology is more quantitative
than most approaches, there are ways to collect and analyze the data that mini-
mize the work.
One of the main goals of biometric assessment in game development is to pro-
vide information on the emotional profile of the game, and how the different ele-
ments of the game enhance or detract from the game’s approach to engaging the
player. In this chapter, biometric assessment of an action RPG and a party game
demonstrated how different game genres have different emotional profiles and
methods for emotionally engaging the player. In addition to entertainment games,
biometric assessment can be useful with serious games. We saw how serious games
have a somewhat different assessment need than entertainment games, and how
biometric assessment might be useful in providing feedback on the less objective
outcomes desired for some of these games. Applying game and virtual reality tech-
nology to help with the treatment and assessment of PTSD gave us an example
of how gaming with biometric measurement can achieve the game’s objectives of
providing the player with controlled immersion and active learning. The biometric
assessment also becomes useful for quantifying the effectiveness of the game.
13.8 References
Anttonen, J., & Surakka, V. (2005). Emotions and heart rate while sitting on a chair. In: Proc.
CHI 2005. ACM Press, pp. 491–499.
Cacioppo, J.T., Bush, L.K., & Tassinary, L.G. (1992). Microexpressive facial actions as a func-
tion of affective stimuli: Replication and extension. Psychological Science, 18, 515–526.
Cacioppo, J.T., Gardner, W., & Berntson, G. (1999). The affect system has parallel and inte-
grative processing components: Form follows function. Journal of Personality and Social
Psychology, 76, 839–855.
DeMaria, R. (2006). Games for health 2006: Addressing PTSD, psychotherapy & stroke rehabilitation
with games & game technologies. Serious Games Source. http://seriousgamessource.com/features/feature_052306.php
Ekman, P., & Friesen, W.V. (1978). Facial action coding system (FACS): A technique for the
measurement of facial actions. Palo Alto, CA: Consulting Psychologists Press.
Hazlett, R.L. (2006) Measuring Emotional Valence during Interactive Experiences: Boys at
Video Gameplay. Proceedings of CHI 2006 Conference on Human Factors in Computing
Systems, ACM Press, 1023–1028.
Hazlett, R.L. (2003) Measurement of user frustration: A biologic approach. Proceedings of CHI
2003 Conference on Human Factors in Computing Systems, ACM Press, 734–735.
Keeker, K., Pagulayan, R., Sykes, J. and Lazzaro, N. (2004). The untapped world of video-
games. In Proc. CHI 2004, ACM Press, 1610–1611.
Larsen, J.T., Norris, C.J., & Cacioppo, J.T. (2003). Effects of positive and negative affect
on electromyographic activity over zygomaticus major and corrugator supercilii.
Psychophysiology, 40, 776–785.
Malmo, R., & Malmo, H. (2000). On electromyographic (EMG) gradients and movement-
related brain activity. International Journal of Psychophysiology, 38, 143–207.
Mandryk, R. (2004) Objectively evaluating entertainment technology. Proceedings of CHI 2004
Conference on Human Factors in Computing Systems, ACM Press, 1057–1058.
Pope, L.K., & Smith, C.A. (1994). On the distinct meanings of smiles and frowns. Cognition
and Emotion, 8, 65–72.
Ravaja, N., Salminen, M., Holopainen, J., Saari, T., Laarni, J. and Järvinen, A. (2004)
Emotional response patterns and sense of presence during videogames: potential criterion
variables for game design Proceedings of the third Nordic conference on Human-computer
interaction, ACM Press, 339–347.
Tassinary, L.G., Cacioppo, J.T., & Geen, T.R. (1989). A psychometric study of surface elec-
trode placements for facial EMG recording: I. The brow and cheek muscle regions.
Psychophysiology, 26, 1–16.
Waterink, W., & Van Boxtel, A. (1994). Facial and jaw-elevator EMG activity in relation to
changes in performance level during a sustained information task. Biological Psychology,
37, 183–198.
Watson, D., Wiese, D., Vaidya, J., & Tellegen, A. (1999). The two general activation systems
of affect: Structural findings, evolutionary considerations, and psychobiological evidence.
Journal of Personality and Social Psychology, 76, 820–838.
CHAPTER FIFTEEN
Regan Mandryk is an assistant professor in the
department of computer science at the University of
Saskatchewan in Canada. Her research focuses on the
design, implementation, and evaluation of user-context
sensing technologies and on incorporating context into
the design of interaction techniques. She focuses on
the affective and cognitive aspects of user context, and
applies her methodologies in computer game environ-
ments. Having received her Ph.D. in computer sci-
ence from Simon Fraser University, her M.Sc. in kinesiology from the same, and
her B.Sc. in mathematics from the University of Winnipeg, Dr. Mandryk is uniquely
positioned to explore the mathematical modeling of user emotional state, based on
physiological signals, while users play computer games.
14.1 Introduction
Given the success of physiological metrics for evaluation in other domains such
as human factors, it is logical to assume that physiological signals would also be
good indicators of user experience with computer games. Physiological signals yield
large amounts of contextually-relevant data, provide an objective indicator of user
* (Editors’ warning: This chapter is not for the faint of heart—it contains far more scientific terminology
and details than other chapters in the book. For a simpler introduction to using some of these measures
in your research, see Hazlett’s chapter. We included this material for those who are interested in diving
even deeper, and are thus willing to pick up many new concepts and terms.)
experience without impacting the gameplay experience, and can be used to infer
underlying emotional states relevant to gameplay. It would be a boon to the game
evaluation community if there were a plug-and-play system where a user sits
down in front of a computer game, physiological sensors are attached, and a few
minutes later we know how much fun she is having and which parts of the game
are more fun than others. Unfortunately, using physiological signals is not this
straightforward, and there are complexities in data collection and analysis that cur-
rently prevent us from achieving this plug-and-play level of ease. But research in
this area is advancing, and the ease of a plug-and-play system is not far off.
This chapter will provide you with the necessary information to introduce physi-
ological measures into your player studies, and will point you to in-depth resources
available for further study.
FIGURE 14.1: Quadrant display including (a) the screen capture of the biometrics, (b) video of the participant's face, (c) video of the controller, and (d) a screen capture of the game. Audio of the participant's comments and audio from the game were included in the quadrant video.
The modeled emotions’ means were evaluated with test data, and exhibited the
same trends as the reported emotions for fun, boredom, and excitement, but modeled
emotions revealed statistically significant differences between the three play conditions,
while differences between reported emotions were not significant. The details of the
fuzzy logic model can be found in Mandryk and Atkins (2007), while the validation
of the model and its potential use in interactive systems can be found in Mandryk
et al. (2006). For details on our work and related literature in the area, see Mandryk
(2005). Figure 14.1 shows how we collected the data. Cameras captured a player’s
facial expressions and their use of the controller. All audio was captured with a
boundary microphone. The game output, the camera recordings, and the screen
containing the physiological data were synchronized into a single quadrant video
display, and recorded onto a hard disk. The audio from the boundary microphone,
and the audio from the game were integrated into the exported video file.
To conduct work in this area, we learned lessons the hard way: by trial and error.
By sharing our knowledge stemming from years of experimentation with physiologi-
cal sensors and games, I hope that you will be in a well-informed position to incor-
porate physiological sensors into your own work.
what skills and resources are available, and what kind of budget and schedule you
have for the evaluation process. For an in-depth discussion of how to decide on an
evaluation method, investigate the DECIDE framework (see Chapter 13 of Sharp et al.,
2007). (Editors' note: there is also a matrix in Chapter 21 of this book, which
compares methods.)
If you are considering using physiological measures as an evaluation metric, see
Figure 14.2 to understand how they relate to other approaches for evaluation. If your
goal is to understand the attitudes and preferences of gamers, subjective approaches
such as interviews or surveys are most appropriate. If your goal is to find usability
or playability problems, heuristic evaluation is likely the best choice. However, if
your goal is to gather quantitative data on a user’s experience while they are playing
a game, observational analysis or physiological metrics are best. Both approaches
are objective and quantitative, and can be measured throughout the play experience.
If your primary deciding factor is budget or schedule, discount approaches such as
heuristic evaluation and surveys are generally fast and inexpensive compared
to other evaluation methods. Collecting physiological measures in the past has
required expensive sensors; however, there are now cheap and robust sensors on
the market which can be added to your user study. Adding sensors only slightly
increases the time needed for a user study as individual baseline measures must be
gathered and participants may need to rest in between games or experimental con-
ditions to return to their resting baselines. Processing and analyzing physiological
data is not as time-consuming as rigorous observational analysis, but still adds an
extra step. Even so, the detailed information about user experience that physiologi-
cal measures provide is worth the additional time commitment.

FIGURE 14.2: Evaluation techniques arranged along axes of subjectivity-objectivity and qualitative-quantitative. Interviews and focus groups (qualitative) and surveys and questionnaires (quantitative) anchor the subjective end; think-aloud, cognitive walkthrough, and heuristic evaluation fall in between; observational analysis and physiological data occupy the objective, quantitative region.
This chapter should provide you with the necessary information to introduce phys-
iological measures into your user studies, but to perform rigorous scientific research in
this area requires a certain skill set that will take time to develop. Although measuring
a user’s galvanic skin response while they play a game is not a complex task, using
GSR data to make inferences about your game environment is not as straightforward.
Psychological Counterpart
● Arousal: Increases in psychological arousal are best measured by increases
in galvanic skin response (GSR), but can also be seen in increased respira-
tion, decreased blood volume pulse (BVP), and increased heart rate (HR).
● Mental effort: Depending on your setting, decreasing heart rate variability
(HRV) or greater pupil dilation can be used to measure increases in men-
tal effort. Increases in jaw clenching (through EMG sensors on the face)
or brow-raising (EMG of the forehead) may also be indicative of increased
mental effort. Increased respiration rate and a decrease in the variability of
respiration rate are also associated with mental effort.
● Positive versus negative emotions: The valence of an emotion (whether
it is positive or negative) can be measured through facial muscle analysis
(EMG) over the brow (frowning) and cheek (smiling). Some potential has
been shown in the use of heart rate, irregularity of respiration, and pupil
diameter as indicators of valence.
When choosing which physiological sensors to use, there are two questions
you must ask yourself: what do I want to know about the user's experience, and
will a given sensor intrusively impact the user's gaming experience or itself
be impacted by the play experience? The former question is straightforward, and this
section provides details on what aspects of user experience are measured by which
physiological sensors. The latter question is more difficult because the environments
of game playing and psychophysiological experimentation are in opposition to each
other. When playing a game, users should be unrestricted and immersed in the play
experience. In contrast, traditional experiments involving physiological measures
have taken place in tightly-controlled laboratory environments.
In this section, we describe a number of physiological sensors. Organized by ana-
tomical system, each subsection presents: the measure; its psychological counter-
part; other factors it is affected by; and devices used for measurement. In addition,
we discuss how each measure might impact a gaming experience, and whether the
act of playing a game will prohibit its use.
Psychological Counterpart
Galvanic skin response is a linear correlate to arousal (Lang, 1995) and reflects emo-
tional responses as well as cognitive activity (Boucsein, 1992). GSR has been used
Blood Pressure
Blood pressure indicates how much pressure is needed to push blood through the
system of arteries, veins, and capillaries. Although blood pressure is known to be
affected by age, diet, posture, and weight, it is also affected by the setting (clinical
vs. normal) and by highly stressful situations (Stern et al., 2001). Generally, BP is
collected using an arm cuff (sphygmomanometer) that is inflated and sub-
sequently deflated while readings are taken. As a result of cuff inflation and defla-
tion, blood pressure responses to stimuli cannot generally be collected in real-time.
Some sophisticated and expensive equipment was developed to collect BP
continuously, but these systems were removed from the market due to lack of
commercial success. Automated machines have been developed
for use with polygraph machines, but cannot accurately take more than one reading
per minute (Stern et al., 2001). Generally, technologies that measure BP are restric-
tive and invasive, and not suitable for gaming environments.
Heart Rate
Heart rate (HR) indicates the number of contractions of the heart each minute,
and can be gathered from a variety of sources. HR has been used to differentiate
between positive and negative emotions, with further differentiation made possible
with finger temperature (Papillo and Shapiro, 1990). Distinction has been made in
numerous studies between anger and fear using HR (Papillo and Shapiro, 1990) (for
a comprehensive review, see Cacioppo et al., 2000).
In addition to reflecting these psychological differences, HR is also affected by
age, posture, level of physical conditioning, breathing frequency, and circadian cycle
(relating to a 24-hour period). We measured HR using electrocardiography, but a
standard exercise HR monitor would suffice for most uses of HR in an interactive
game environment.
brainstem is associated with a time delay of about 1 sec (Mulder, 1979). This
time delay creates a phase shift and causes the system to oscillate. The oscilla-
tion frequency is about 0.1 Hz (Mulder, 1979). If IBI is fairly constant, then HRV
will be low, whereas if IBI is changing (regardless of absolute value), then HRV
will be higher.
Kalsbeek and Ettema (1963) found a gradual suppression of heart rate
irregularity related to increasing task difficulty. Later, Kalsbeek and Sykes (1967)
tested a motivated group versus a non-motivated group (using money as a motiva-
tor), and found that the motivated group maintained a constant level of suppres-
sion while the non-motivated group started at a lower level of suppression and
continued to decline. Since then, many researchers have attempted to use HRV as
an indicator of mental effort.
HRV has been used extensively in the human factors literature as an indication
of mental effort and stress in adults. In high-stress environments such as ambulance
dispatch (Wastell and Newman, 1996) and air traffic control (Rowe et al., 1998),
HRV is a very useful measure. When subjects are under stress, HRV is suppressed
and when they are relaxed, HRV emerges. Similarly, HRV decreases with increases
in mental effort (Rowe et al., 1998) and cognitive workload (Wilson, 1992), but as
the mental effort needed for a task increases beyond the capacity of working mem-
ory, HRV will increase (Richter et al., 1998, Rowe et al., 1998). Many researchers
have found significant differences in HRV as a function of mental workload, while
others have not (Meshkati, 1988, Mulder, 1979). HRV has also been used to differ-
entiate between epistemic behavior (concerning the acquisition of information and
knowledge), and ludic behavior (playful activities which utilize past experience) in
children (Hutt, 1979).
One method of determining HRV is through a short-term power spectral den-
sity analysis of interbeat interval, which is described in the next section. HRV can
be measured using electrocardiography, but less expensive alternatives, such as the
Wild Divine biofeedback hardware (Journey to the Wild Divine, 2008), can also be
used. If heartbeats can be accurately measured, then HRV can be determined and is
suitable for measurement while a user interacts with a game console.
Recently, researchers have used spectral analysis of sinus arrhythmia (heart rate
variability) to provide an objective measure of mental effort. Measuring HRV using
the 0.1 Hz frequency component has the important advantage of being able to dis-
criminate between the effort-related blood pressure component, and the effects
caused by respiration, motor activity, and thermoregulation, since these other factors
influence other parts of the power spectrum (Vicente et al., 1987).
In order to perform spectral analysis, researchers used to convert the interval
signal to an equidistant time series using interpolation or filtering (Mulder, 1979).
Recent digital technology produces a measure of the interbeat interval at 4 Hz, which
can be used directly. This time series data is then smoothed and Fourier-trans-
formed. The frequency range sensitive to changes in mental effort is between 0.06
and 0.14 Hz (Vicente et al., 1987), while the area between 0.22 and 0.4 Hz reflects
activity related to respiration (Jorna, 1992, Mulder, 1979). Integrating the power in
the band related to mental effort provides a measure of HRV. Vicente recommends
normalizing the measure by dividing by the average of all resting baselines and sub-
tracting from one (Vicente et al., 1987). This produces a value between 0 and 1,
where zero indicates no mental effort and one indicates maximum mental effort.
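As a concrete sketch of this computation, the following assumes an interbeat-interval series already resampled at 4 Hz and a set of band powers computed the same way from resting baselines; the names and the FFT segment length are illustrative, not prescribed by Vicente et al.

```python
import numpy as np
from scipy.signal import welch

def effort_band_power(ibi_4hz):
    # Power spectral density of the demeaned 4 Hz interbeat-interval series.
    f, pxx = welch(ibi_4hz - np.mean(ibi_4hz), fs=4.0, nperseg=256)
    band = (f >= 0.06) & (f <= 0.14)      # effort-sensitive band
    return np.trapz(pxx[band], f[band])   # integrated band power

def mental_effort_index(task_ibi, baseline_band_powers):
    # Vicente-style normalization: divide by the mean resting band power
    # and subtract from one, so ~0 means no effort and ~1 maximum effort.
    return 1.0 - effort_band_power(task_ibi) / np.mean(baseline_band_powers)
```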
To assist researchers or developers who want to use HRV as an indicator of men-
tal effort, many hardware systems perform the signal processing, and output HRV
as a value. Although it is good to understand how HRV is calculated, users of physi-
ological sensing systems do not have to perform the signal processing themselves,
but can rely on the hardware to correctly determine HRV.
Electrocardiography
EKG (Electrocardiography) measures electrical activity of the heart. During each
cardiac cycle, a wave of depolarization radiates through the heart (Martini and
Timmons, 1997). This electrical activity can be measured on the body using surface
electrodes. An example of an EKG signal is shown in Figure 14.3.
FIGURE 14.3: EKG signal. The P wave appears as the atria depolarize, the QRS complex accompanies the depolarization of the ventricles, and the T wave denotes ventricular repolarization. The R-to-R interval is the interbeat interval used to determine heart rate variability.
Heart rate (HR), interbeat interval (IBI), HRV, and respiratory sinus arrhythmia
(RSA) can all be gathered from EKG. Although there is a standard medical configu-
ration for placement of electrodes, any two electrodes placed fairly far apart will
produce an EKG signal (Stern et al., 2001). The main placement method is on the
chest with the negative electrode on the right shoulder, the positive electrode on
the abdomen, and the ground on the left shoulder (see Figure 14.4A), although
the forearm provides a good measurement location for less intrusive measurement
(Figure 14.4B and C). EKG provides a good signal with which to derive the afore-
mentioned physiological cardiac measurements.
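To illustrate how these measures fall out of the trace, the sketch below pulls HR and IBI from an EKG with a naive R-peak detector. Real recordings usually call for a more robust QRS-detection algorithm; the amplitude threshold and refractory period here are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def hr_and_ibi(ekg, fs):
    # Naive R-peak detection: amplitude threshold plus a 0.3 s refractory
    # period (i.e., assumes heart rate stays below 200 bpm).
    peaks, _ = find_peaks(ekg, height=ekg.mean() + 2 * ekg.std(),
                          distance=int(0.3 * fs))
    ibi = np.diff(peaks) / fs   # R-to-R intervals in seconds
    hr = 60.0 / ibi             # instantaneous heart rate in beats per minute
    return hr, ibi
```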
In our work, we placed three pre-gelled surface electrodes in the standard
configuration of two electrodes on the chest and one electrode on the abdomen
(see Figure 14.4A). Body hair can interfere with an EKG signal, and shaving the
regions for electrode placement is a common clinical practice. As an alternative, we
screened our participants to have little to no body hair on the chest or abdomen.
FIGURE 14.4: Three common electrode placements for EKG. (A) Chest placement. (B) Forearm placement. (C) Forearm and leg placement. (Adapted from Thought Technologies, 2008.)
EKG is suitable for use in a gaming environment, although the data can be quite
noisy if users are talking or frequently shifting position.
* (Editors’ note: the chapter from Hazlett in this book (13) provides an excellent overview of the appli-
cation of EMG to game user research.)
EMG has been used to distinguish between positive and negative emotions (Fridlund
and Cacioppo, 1986). EMG activity over the brow region (corrugator supercilii, the
frown muscle) is lower and EMG activity over the cheek (zygomaticus major, the
smile muscle) and periocular (orbicularis oculi) muscle regions are higher when
emotions are mildly positive, as opposed to mildly negative (Cacioppo et al., 2000).
These effects are stronger when averaged over a group rather than for individual
analysis, and have been able to distinguish between positive, neutral, and negative
valence at a rate greater than chance when participants view pictures or video as stimuli
(Partala et al., 2005). Tonic activity from EMG on the forehead (musculus frontalis,
the eyebrow-raising muscle) has been used as a measure of mental effort (Fridlund
and Cacioppo, 1986). In addition to emotional stress and emotional valence, EMG
has been used to distinguish facial expressions and gestural expressions (Stern et al.,
2001).
EMG feedback is generally used for relaxation training, headache, chronic pain,
muscle spasm, partial paralysis, speech disorder, or other muscular dysfunction due
to injury, stroke, or congenital disorders.
In our experiments, we used surface electrodes to detect EMG on the jaw (indica-
tive of tension), the forehead (indicative of frowning), and the cheek (indicative
of smiling). On the jaw and cheek, we used three electrodes preconfigured in a tri-
angular arrangement. Because of the small size of the corrugator supercilii muscle,
we used extender cables to collect EMG on the forehead. The disadvantage of
using surface electrodes is that the signals can be muddied by other jaw activity,
such as smiling, laughing, and talking. Needles are an alternative to surface elec-
trodes that minimize interference, but are not appropriate for non-clinical settings.
Body hair can interfere with an EMG signal, and shaving the regions for electrode
placement is a common clinical practice. As an alternative, we screened our partici-
pants to have clean-shaven faces in any of the regions where electrodes were to be
placed.
Although highly useful as an indicator of emotional valence (positive versus
negative emotions), EMG can be difficult to measure in a non-clinical setting, and
requires care with electrode placement. Interference from other facial muscles is
common, and users who are sensitive or easily embarrassed may not want to place
sensors on their face.
2001). We have not used EEG in our previous work and readers interested in the
use of EEG for game evaluation should investigate the FUGA research project (Fuga:
Fun of Gaming Research Group, 2008).
Pupillometry is the study of the dilation of the pupil (Stern et al., 2001); pupil
dilation is a useful measure as it is affected by mental effort (Porter et al., 2007).
Unfortunately, pupil diameter can also be affected by changing light conditions, color,
or spatial pattern, and target motion of the visual input (Li and Sun, 2005, Porter et al.,
2007). More research needs to be conducted to make pupil diameter effective as an
evaluative physiological feature for use in interactive gaming environments.
FIGURE 14.5: Average GSR values for ten participants playing either against a friend or against a computer.
percentage of the total GSR span. For example, if a user had a minimum GSR value
of 4.6 μS and a maximum GSR value of 8.7 μS, their GSR value at time t of 6.3 μS
would be represented as (6.3 − 4.6) / (8.7 − 4.6) × 100% ≈ 41% of the span.
Normalizing the user data in Figure 14.5 would yield the results for average GSR
shown in Figure 14.6. The GSR differences between the two play conditions are
much more apparent once the data has been normalized. One of the drawbacks
with normalizing user data is that you must know the minimum and maximum
values for a user, requiring that all analyses happen after all of the data has been
collected, rather than in real time while the data is being collected. If your appli-
cation requires that you analyze the data as it is collected, then consider relative
values rather than inferring meaning from absolute values. For example, consider
that a user’s GSR signal is rising or falling, or that the user just experienced a local
peak, rather than inferring something from the fact that their GSR value is presently
5.6 μS. Normalizing sensor values also allows you to compare between individuals,
although you still must be careful as normalization procedures are only based on
the available information. If a user is having difficulty relaxing during a rest period,
her arousal during the game (inferred from normalized GSR) will appear to be
lower, as her resting levels were higher.
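In code, the post hoc normalization and the real-time alternative look something like the following. This is a minimal sketch with hypothetical names; applied to the earlier example, a 6.3 μS reading in a 4.6 to 8.7 μS span normalizes to roughly 41%.

```python
import numpy as np

def normalize_gsr(gsr):
    # Post hoc: express each sample as a percentage of the participant's
    # total GSR span. Requires the whole session's minimum and maximum.
    lo, hi = gsr.min(), gsr.max()
    return 100.0 * (gsr - lo) / (hi - lo)

def trend(gsr, window=10):
    # Real-time alternative: compare the last `window` samples to the
    # preceding window and report direction instead of absolute level.
    recent = np.mean(gsr[-window:])
    prior = np.mean(gsr[-2 * window:-window])
    if recent > prior:
        return "rising"
    return "falling" if recent < prior else "flat"
```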
FIGURE 14.6: Normalized GSR values (percentage of each participant's GSR span) for the same ten participants playing either against a friend or against a computer.
FIGURE 14.7: One participant's skin conductance (GSR) signal over the course of the experiment. The areas shaded in light grey represent when the participant was being interviewed; the areas shaded in dark grey represent when the participant was playing the game.
playing. The GSR signal drops off at the beginning of each game condition, declining
from the elevated level caused by the preceding interview. These interview peaks
cannot be excluded from the analysis because there were no rest periods between
play conditions, so the effects of relaxing after the interview and of becoming excited
by the game are inseparable. To correct for this, ensure that you include rest periods
between play conditions so that the effects of the experimental setting don't over-
shadow the effects of the game you are studying. The act of applying sensors to the
body and monitoring body responses can be a stressful experience for a participant,
and every effort must be made to allow the participant to relax and feel at ease.
In our initial experiments, we also found that resting rates of some physiolog-
ical measures were higher than game play rates. Anticipation and nervousness
seemed to have caused the resting baselines to be artificially high. This creates
a problem for a researcher who wants to use resting rates to normalize the data.
Vicente et al. (1987) recommend collecting a number of baselines throughout
the experimental session and averaging them to create a single baseline value. In
addition, using participants who are familiar with the process of being connected
to physiological sensors would help lower the resting values. Beginning the exper-
iment with a training or practice condition, before collecting the resting values,
might also help the participants to relax. Finally, in subsequent studies we used
relaxation music during the resting periods to help us achieve consistent resting
baselines.
of frames (N) to form the window is chosen. The average for N frames is calculated,
replacing the current value. The window is moved down the time series by one
frame, and the process is repeated. As a result, the minor fluctuations are smoothed
out over the entire time series. With a low-pass filter, the short-term oscillations are
removed by a filter that passes the low-frequency signals but reduces the mag-
nitude of signals with frequencies higher than a cutoff frequency. Thus, the slower
changes are kept, while the rapid increases and decreases are removed. While both
methods achieve a smoother signal, the moving-average window can easily be
achieved with simple spreadsheet programming, while the low-pass filter requires
more mathematically-powerful tools and approaches. On the other hand, the
low-pass filter approach is more powerful, allowing you greater control over which
frequencies you keep and which you discard.
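Both smoothing options take only a few lines each. A minimal sketch follows, with the filter order and cutoff as assumptions rather than recommendations.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def moving_average(x, n):
    # Spreadsheet-style smoothing: each value becomes the mean of an
    # n-frame window slid along the series one frame at a time.
    return np.convolve(x, np.ones(n) / n, mode="same")

def low_pass(x, cutoff_hz, fs):
    # Filter alternative: keep slow changes, attenuate oscillations above
    # the cutoff. Offers finer control over which frequencies survive.
    b, a = butter(2, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, x)
```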
Luckily, there are other approaches for interpreting physiological data. Figure 14.7
shows how a user’s GSR signal changes over the course of an experiment. Not only
could we average this time series, but we could look for local peaks and valleys in
the signal. We could consider when the signal is rising versus falling or flat, or we
could look at how sharply the rises and falls occur. In signal analysis, these examples
of characteristics of the signal are called local maxima, local minima, and slope
respectively, and are easily calculated using spreadsheet programming or more com-
plex analysis tools like Matlab™. Consider what you want to discover about a user’s
experience in order to decide on the processing approach. For example, if you want
to know which of two artistic approaches in a game is more relaxing for users,
you may simply want to test users in both conditions and average the GSR signal,
similar to how we tested users playing against a friend or computer (Mandryk and
Inkpen, 2004, Mandryk et al., 2006) (see Figure 14.6). If you want to know whether
a narrative cut scene is relaxing or exciting, you may want to graphically exam-
ine users’ GSR signals (or respiration rate or HR) before, during, and after the cut
scene, like we did when we graphed the data surrounding goals scored and fight-
ing in NHL 2003 by Electronic Arts™ (Mandryk and Inkpen, 2004, Mandryk et al.,
2006). Figure 14.8 shows one participant’s GSR data after scoring a goal once
against a friend and twice against the computer. If you want to determine whether
users are feeling positively or negatively towards your game while playing, you
may want to use EMG sensors on the face and look at points where smiling or
frowning activity exceeds a certain threshold; we took a similar approach when
determining the valence (positive versus negative feelings) of users playing NHL
2003 against a computer, stranger, or friend (Mandryk and Atkins, 2007, Mandryk
et al., 2006).
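The signal features mentioned above, and the event-locked windows used for Figure 14.8, can be computed with standard tools. A minimal sketch; the window lengths mirror the figure and the function names are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

def signal_features(x, fs):
    # Local maxima, local minima (peaks of the inverted signal), and slope.
    peaks, _ = find_peaks(x)
    valleys, _ = find_peaks(-x)
    slope = np.gradient(x) * fs   # rate of change per second
    return peaks, valleys, slope

def event_window(x, fs, event_s, pre_s=10.0, post_s=15.0):
    # Slice the signal around a game event (e.g., a goal scored),
    # 10 s before and 15 s after, as in Figure 14.8.
    i = int(event_s * fs)
    return x[max(0, i - int(pre_s * fs)): i + int(post_s * fs)]
```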
FIGURE 14.8: One participant's GSR response (μS) to scoring a goal against a friend and against the computer twice. Note the much larger response when scoring against a friend. Data were windowed 10 seconds prior to the goals and 15 seconds after.
Using the whole time series, rather than simply an average of the time series is
an informative and powerful approach. For example, if you wanted to know whether
users smiled more when playing your game when they could either create their own
avatar or use pre-rendered stock avatars, you could collect EMG from smiling activ-
ity over both play conditions. Averaging the data over the entire time series might
tell you that people smiled more, on average, when using an avatar of their own cre-
ation. But you wouldn’t know whether the users were smiling more often, smiling
with bigger smiles, or maintaining a generally higher smile level throughout the
experience. Depending on your goals, it might be enough to just know that users
smiled more in one condition, but there are situations where you might benefit from
more information. Assume that you looked at the data and found that when users
created their own avatars, they were smiling bigger smiles when their avatar was
on screen. In this case, you have contextualized the user’s response by considering
their data in the context of their play experience.
14.6 Advanced Uses of Physiological Data
The results that we gathered in our initial experiments formed a basis for devel-
oping our models of user emotion, based on physiological reactions. Our first model
transforms four physiological signals (GSR, HR, EMG smiling, EMG frowning) into
levels of arousal and valence. Representing a participant’s experience in arousal-
valence space is a great method of objectively and quantitatively measuring their
experience when engaged with entertainment technologies. Figure 14.9 shows a
visual representation of a participant’s experience continuously in arousal-valence
space, representing the positive and negative stimulation that the participant feels
as they engage with the technology.
Our second model transforms arousal and valence into four emotions: boredom,
excitement, frustration, and fun. Emotions modeled from physiological data provide
a metric to fill the knowledge gap in the objective-quantitative quadrant of evalu-
ating user interaction with entertainment technologies (see Figure 14.2). We com-
pared the modeled emotions to subjective reports and found the same trends for
fun, boredom, and excitement; however, modeled emotions revealed differences
between play conditions, while the differences between the subjective reports failed
to reach significance.
Our modeled emotions were based on fuzzy transformation functions from phys-
iological variables to arousal and valence, and then from arousal-valence space to
emotions. For more information on the development of the mathematical models,
see Mandryk and Atkins (2007), while a validation of the modeling approach, as
well as a description of its use in HCI can be found in Mandryk et al. (2006).
FIGURE 14.10: Four classes of psychophysiological relations arranged by specificity (one-to-one vs. one-to-many) and generality (context-bound vs. context-free): markers (one-to-one, context-bound), invariants (one-to-one, context-free), outcomes (one-to-many, context-bound), and concomitants (one-to-many, context-free).
the issue for a researcher is establishing the invariant relationship instead of sim-
ply assuming that the relationship between a psychological event and a physiological
response is an invariant.
14.7 Conclusions
Incorporating physiological metrics into your user studies is not as straightforward
as attaching sensors to an individual and then reading their emotional state from a
computer printout; however, progress is being made that is moving this methodol-
ogy towards the ease of a plug-and-play system. When determining whether or not
to use physiological measures, you must first decide on your goals for your evalua-
tion. If a continuous, objective, and quantitative representation of user experience is
desired, then physiological measures are a great choice.
Choosing which sensors to use should also be based on your evaluation goals.
Different physiological sensors can provide indications of different user states such
as arousal, valence, and mental effort. But in choosing your sensors you will also
need to consider the sensor’s impact on the gameplay experience, and the game’s
impact on the sensor.
Although it can be intimidating to use physiological sensors in your work, fol-
lowing the guidelines for the collection, analysis, and use of physiological data pro-
vided in this chapter will help you to achieve valid, useful, and rich data on user
experience. Consider adding physiological measures to your suite of evaluation tech-
niques and using them in concert with more familiar methods to achieve a robust
and complete picture of user experience with games.
14.8 Acknowledgements
Thank you to Dr. Kori Inkpen, Dr. Stella Atkins, Dr. Tom Calvert, and Dr. Kevin
Stanley for their contributions to this work. In addition, thanks to the Natural
Sciences and Engineering Research Council of Canada and Electronic Arts for fund-
ing the research.
14.9 References
Affective Computing Group. Retrieved March 2008, from http://affect.media.mit.edu/;
Galvactivator. Retrieved March 2008, from http://www.media.mit.edu/galvactivator/;
Affective jewelry. Retrieved March 2008, from http://affect.media.mit.edu/AC_research/projects/affective_jewelry.html;
Affquake. Retrieved March 2008, from http://affect.media.mit.edu/projects.php?id=180.
Bersak, D., McDarby, G., Augenblick, N., McDarby, P., McDonnell, D., McDonald, B., and
Karkun, R. (2001). Intelligent biofeedback using an immersive competitive environment.
Paper presented at UBICOMP 2001 Workshop on Ubiquitous Gaming.
Boucsein, W. (1992). Electrodermal activity. New York: Plenum Press.
Cacioppo, J.T., Berntson, G.G., Larsen, J.T., Poehlmann, K.M., & Ito, T.A. (2000). The psy-
chophysiology of emotion. In Handbook of emotions. M. Lewis, & J.M. Haviland-Jones
(Eds), New York: The Guilford Press. pp. 173–191
Cacioppo, J.T., & Tassinary, L.G. (1990). Inferring psychological significance from physiologi-
cal signals. American Psychologist, 45(1), 16–28.
Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York: Harper
Perennial.
Desurvire, H., Caplan, M., & Toth, J.A. (2004). Using heuristics to evaluate the playability of
games. In Ext. Abst. CHI 2004, ACM Press, pp. 1509–1512.
Ekman, P., Levenson, R.W., & Friesen, W.V. (1983). Autonomic nervous system activity dis-
tinguishes among emotions. Science, 221(4616), 1208–1210.
Fisher, C., & Sanderson, P. (1996). Exploratory data analysis: Exploring continuous observa-
tional data. Interactions, 3(2), 25–34.
Fridlund, A.J., & Cacioppo, J.T. (1986). Guidelines for human electromyographic research.
Psychophysiology, 23, 567–589.
Fuga: Fun of gaming research group. Retrieved March 2008 from http://project.hkkk.fi/fuga.
Fulton, B., and Medlock, M., (2003) Beyond focus groups: Getting more useful feedback from
consumers. In Proc. Game Dev. Conf.
Hjelm, S.I. (2003). The making of brainball. Interactions, 10, 26–34.
Hutt, C. (1979). Exploration and play. In Play and learning, B. Sutton-Smith (Ed.), New York:
Gardner Press, pp. 175–194.
Jorna, P.G.A.M. (1992). Spectral analysis of heart rate and psychological state: A review of its
validity as a workload index. Biological Psychology, 34, 237–258.
Journey to the Wild Divine. Retrieved March 2008 from http://wilddivine.com/.
Kalsbeek, J.W.H., & Ettema, J.H. (1963). Scored regularity of the heart rate pattern and the
measurement of perceptual or mental load. Ergonomics, 6(306).
Kalsbeek, J.W.H., & Sykes, R.N. (1967). Objective measurement of mental load. Acta
Psychologica, 27, 253–261.
Lang, P.J. (1995). The emotion probe. American Psychologist, 50(5), 372–385.
Lazzaro, N. (2004) Why we play games: 4 keys to more emotion. In Proc. Game Dev. Conf.
Lego mindstorms community bulletin boards. Retrieved January 2004, from http://mindstorms.lego.com/eng/forums/
Li, Z., & Sun, F. (2005). Pupillary response induced by stereoscopic stimuli. Experimental
Brain Research, 160(3), 394–397.
Mandryk, R.L., & Atkins, M.S. (2007). A Fuzzy Physiological Approach for Continuously
Modeling Emotion During Interaction with Play Environments. International Journal of
Human-Computer Studies, 6(4), pp. 329–347.
Mandryk, R.L., Atkins, M.S., & Inkpen, K.M. (April 2006). A Continuous and Objective
Evaluation of Emotional Experience with Interactive Play Environments. in Proceedings of
the Conference on Human Factors in Computing Systems (CHI 2006). Montreal, Canada,
pp. 1027–1036.
Mandryk, R.L., Inkpen, K.M., & Calvert, T.W. (March–April 2006). Using Psychophysiological
Techniques to Measure User Experience with Entertainment Technologies. Behaviour and
Information Technology (Special Issue on User Experience), Vol. 25, No. 2, pp. 141–158.
Mandryk, R.L. (2005). Modeling User Emotion in Interactive Play Environments: A Fuzzy
Physiological Approach. Ph.D. Dissertation, School of Computing Science, Simon Fraser
University, Burnaby, BC, Canada.
Mandryk, R.L., & Inkpen, K.M. (2004) Physiological indicators for the evaluation of co-
located collaborative play. In Proc. CSCW 2004, ACM Press, 102–111.
Marshall, C., & Rossman, G.B. (1999). Designing qualitative research. Thousand Oaks, CA:
Sage Publications.
Martini, F.H., & Timmons, M.J. (1997). Human anatomy (2nd ed.), Upper Saddle River, New
Jersey: Prentice Hall.
Meshkati, N. (1988). Heart rate variability and mental workload assessment. In P.A. Hancock, &
N. Meshkati (Eds), Human mental workload, North-Holland: Elsevier Science Publishers.,
pp. 101–115
Mulder, G. (1979). Sinusarrhythmia and mental workload. In N. Moray (Ed.), Mental work-
load: Its theory and measurement, New York: Plenum, pp. 299–325.
Nielsen, J. (1992). Evaluating the thinking-aloud technique for use by computer scientists. In
Advances in human-computer interaction, H.R. Hartson, & D. Hix (Eds), Norwood: Ablex
Publishing Corporation pp. 69–82.
Norman, D.A. (2002). Emotion and design: Attractive things work better. Interactions, 9(4),
36–42.
Pagulayan, R.J., Keeker, K., Wixon, D., Romero, R., & Fuller, T. (2002). User-centered design
in games. In , Handbook for human-computer interaction in interactive systems, J. Jacko,
& A. Sears (Eds), Mahwah, NJ: Lawrence Erlbaum Associates, Inc., pp. 883–906
Papillo, J.F., & Shapiro, D. (1990). The cardiovascular system. In Principles of psychophysiol-
ogy: Physical, social, and inferential elements L.G. Tassinary (Ed.), Cambridge: Cambridge
University Press, pp. 456–512.
Partala, T., Surakka, V., and Vanhala, T. (2005). Person-independent estimation of emotional
experiences from facial expressions In Proceedings of the 10th international conference on
intelligent user interfaces, San Diego: ACM Press, pp. 246–248.
Porter, G., Troscianko, T., & Gilchrist, I.D. (2007). Effort during visual search and counting:
insights from pupillometry. The Quarterly Journal of Experimental Psychology, 60(2),
211–229.
Richter, P., Wagner, T., Heger, R., & Weise, G. (1998). Psychophysiological analysis of mental
load during driving on rural roads- a quasi-experimental field study. Ergonomics, 41(5),
593–609.
Rowe, D.W., Sibert, J., and Irwin, D. (1998) Heart rate variability: Indicator of user state as
an aid to human-computer interaction. In Proc. CHI ‘98, 480–487.
Sharp, H., Rogers, Y., & Preece, J. (2007). Interaction Design: Beyond human-computer inter-
action. West Sussex. England: John Wiley & Sons Ltd.
Stern, R.M., Ray, W.J., & Quigley, K.S. (2001). Psychophysiological recording. New York:
Oxford University Press.
Sweetser, P., & Wyeth, P. (2005). GameFlow: A model for evaluating player enjoyment in
games. ACM Computers in Entertainment, 3(3). Article 3A.
Thought Technologies. Retrieved March 2008 from http://thoughttech.com/index.htm.
van Ravenswaaij-Arts, C.M.A., Kollee, L.A.A., Hopman, J.C.W., Stoelinga, G.B.A., &
van Geijn, H.P. (1993). Heart rate variability. Annals of Internal Medicine, 118(6),
436–447.
Venables, P.H., & Christie, M.H. (1973). Mechanisms, instrumentation, recording techniques,
and quantification of responses. In Electrodermal activity in psychological research, W.F.
Prokasy, & D.C. Raskin (Eds), New York: Academic Press, pp. 2–124.
Vicente, K.J., Thornton, D.C., & Moray, N. (1987). Spectral analysis of sinus arrhythmia: A
measure of mental effort. Human Factors, 29(2), 171–182.
Wastell, D.G., & Newman, M. (1996). Stress, control and computer system design: A psycho-
physiological field study. Behavior and Information Technology, 15(3), 183–192.
CHAPTER FIFTEEN
TRUE Instrumentation: Tracking Real-Time User Experience in Games
Eric Schuh, Daniel V. Gunn*, Bruce Phillips*, Randy J.
Pagulayan*, Jun H. Kim, and Dennis Wixon*
Microsoft Game Studios
Eric Schuh has been interested in games ever since a square white ball bleeped and
blooped across his screen in the early 1970s. He has been a member of the Games
User Research team since 2002. In that time, he’s worked on franchises such as
Fable, Project Gotham Racing, Forza, and Crackdown, developing innovative research
techniques to answer complex research questions. Eric has a long history doing user
research on how people use technology dating back to 1993, working on projects as
varied as MSN, Access, printers, and network administration tools. Not a bad career
for a guy who almost, but not quite, got his PhD from the University of Washington
in social psychology.
* For photos and biographical information about these authors, see Chapter 4's introduction.
Abstract
Collecting customer feedback on how people play your game—identifying where the
design is too confusing, too easy, too lethal, and so on—can dramatically improve
the users’ experience with the game and increase its chances of success in the mar-
ketplace. Although there are many methods for collecting users’ impressions and
experience of games—focus groups, usability testing, playtesting—feedback from
these is based on a limited exposure to the game. When these methods are used,
there is a pragmatic reason for limiting the feedback from users: it is labor intensive
to observe people playing the game at the level of detail needed to spot prob-
lems and identify their underlying causes.
This chapter outlines a system for the automated collection of gameplay
feedback, enabling development teams to understand how people experience the
entire length of the game. By having the game automatically log behaviors of
interest—player deaths, items collected, levels completed—it is possible to collect
large amounts of data efficiently. When paired with other data streams, such as cap-
tured video and in-game surveys, it is possible to understand what people are doing
in your game, what elements of the design are causing them to behave the way they
do, and how they feel about their experience of the game. In many cases, under-
standing behavior, its causes, and user evaluation is precisely the information you
need in order to improve your game.
Automated capture of user data is not new. In games, post-match stats and lea-
derboards have been around as long as videogames have. Although relatively rare,
using these data to improve a game is not unique: Valve has used automated data
capture and Steam to tweak the difficulty of Half-Life 2: Episode Two after it was
released. This chapter will provide details on how to use automated data collection
to improve a game before it is released, improving the quality without negatively
impacting the schedule. We will illustrate how Microsoft Game Studios has used
automated data collection to improve games at various stages of development. In
addition, we will share best practices we developed in the course of using this form
of data collection for the past five years on over thirty games.
FIGURE 15.1: Usability lab at Microsoft with observer side (left) and participant side (right).
FIGURE 15.2: One of three Playtest Labs at Microsoft Game Studios: entire lab (left) and an individual station (right).
more information. Were the platforming sections in the carnival portion of the world
(the last level) too difficult? Where were people missing jumps? What puzzles were
people failing to complete throughout the game, and why? How long did it take to
beat the bosses, and how many attempts did it take?
FIGURE 15.3: Vince rides a rat through the Rat Rodeo boss battle.
we noticed people spending an inordinately long time and having lots of falling
deaths on Chapter 19 (of 32)—The Rat Rodeo. This level is a Boss Battle involving
a screeching opera statue (the boss), a crumbling floor, falling bricks, and our hero
Vince riding a flea-bitten rat as shown in Figure 15.3. The player must guide the rat
around a circular room, maneuvering so that falling bricks hit Vince on the head
(he is a Voodoo Doll, after all, so hurting him hurts his enemies) while jumping
over pits that open up as the floor disintegrates. Making matters worse, the enraged
statue periodically shrieks, sending out a shock wave that pushes poor Vince into a
waiting pit.
The chapter proved very difficult—it was hard to tell where the falling bricks
would land, so it was tricky to maneuver the rat into the proper position; the floor
would disintegrate with little warning, sending players to their doom; and it was
difficult to judge where the shock wave was, making it almost impossible to avoid.
As a result, people were dying left and right, were spending over an hour in the
chapter, and were generally frustrated. See Figure 15.4 for a chart of deaths in this
portion of the game.
FIGURE 15.4: Frequency and mean number of player deaths in Voodoo Vince for each Crypt City level.
We developed a series of requirements for this tool which formed the foundation
for what we have called the TRUE (Tracking Real-Time User Experience) method.
The hallmarks of TRUE instrumentation include:
15.1.6 Surveys
When specifying the requirements for the TRUE instrumentation tool, we looked
at the shortcomings of the data collected in Voodoo Vince. One of the biggest flaws
was that we only captured behaviors and not any attitudes. Recording what people
do is vitally important, but it only tells an incomplete story. It captures the mechan-
ics and dynamics of a game but not its aesthetics (Hunicke, Leblanc, & Zubek,
2001). Knowing that someone died ten times against a particular enemy is interest-
ing but hard to interpret. Is the person frustrated by the repeated deaths at the hand
of the same enemy, or are they enjoying the challenge associated with figuring out
how to take out an effective adversary? Put differently, are these ten deaths a prob-
lem that need to be addressed, or are they a key component to the overall enjoy-
ment of the game that should be preserved? Without collecting the attitudinal data,
we will never know.
To address this shortcoming, we added the ability to include brief surveys within
a game itself. At certain points of a game, the game would pause and display brief
questions on the screen as seen in Figure 15.5. The participant would use the con-
troller to select a desired response, and then hit the A button to register that feed-
back. There were three main categories of in-game surveys we wanted to support, each suited to answering different sorts of questions. These categories are:
FIGURE 15.5 A brief survey question displayed on screen during gameplay.
eventually they will become frustrated with this constant intrusion. To avoid this problem, we recommend displaying surveys only as often as needed to get the information you require (we have ranged from every 3 minutes to every 10 minutes), and only when there is a natural break in the action (after combat is complete, not in the middle of a sword swing).
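To make those timing rules concrete, here is a minimal Python sketch of a survey scheduler that only fires at natural breaks once a minimum interval has elapsed. The names are hypothetical; the chapter does not describe the actual implementation.

    import time

    class SurveyScheduler:
        """Fires an in-game survey at most once per interval, and only
        at a natural break in the action (e.g., combat just ended)."""

        def __init__(self, interval_seconds=180):  # every 3 minutes, per the text
            self.interval = interval_seconds
            self.last_survey_time = time.monotonic()

        def should_show_survey(self, at_natural_break):
            # Never interrupt mid-action (e.g., mid sword swing).
            if not at_natural_break:
                return False
            # Respect the minimum interval between surveys.
            if time.monotonic() - self.last_survey_time < self.interval:
                return False
            self.last_survey_time = time.monotonic()
            return True

    # Usage: consult the scheduler on game-state changes.
    scheduler = SurveyScheduler(interval_seconds=180)
    if scheduler.should_show_survey(at_natural_break=True):
        pass  # pause the game, display the question, log the controller response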
FIGURE 15.6 Instrumentation data plotted at their recorded position coordinates on a map of a level.
separate or combine the data from the two different groups. Having a test name
field recorded with every data point allows you to do so.
● Participant ID—you want to be able to identify which participant the data are
coming from.
● Timestamp—you need to know when the data were collected.
● Difficulty setting—you want to be able to tease apart whether a problem you
identify in the data is common for everyone, or just people playing a certain level
of difficulty.
● Chapter name/mission name/quest name/level name/map name—the specifics
will depend on what type of game you’re testing, but there should be some indi-
cation of what portion of the game the data come from.
● Position coordinates—recording the x, y, z coordinates of every piece of data
allows you to display that information on a map, which is an incredibly
powerful way of identifying where problems are occurring. See example in Figure 15.6. (A sketch of an event record carrying these fields appears below.)
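As an illustration, a minimal event record carrying these contextual fields might look like the following Python sketch. The field names are hypothetical; the chapter does not specify TRUE's actual schema.

    from dataclasses import dataclass, field
    import time

    @dataclass
    class GameEvent:
        """One instrumentation data point plus its contextual fields."""
        event_type: str      # e.g., "death", "level_complete", "survey_response"
        test_name: str       # lets you separate or combine test sessions
        participant_id: str  # which participant the data came from
        difficulty: str      # e.g., "easy", "normal", "heroic"
        level_name: str      # chapter/mission/quest/level/map, per game type
        x: float = 0.0       # position coordinates, for plotting on a map
        y: float = 0.0
        z: float = 0.0
        timestamp: float = field(default_factory=time.time)

    # Usage: log one record per tracked event.
    event = GameEvent(event_type="death", test_name="playtest_week12",
                      participant_id="P07", difficulty="normal",
                      level_name="M10", x=104.2, y=7.5, z=-33.0)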
FIGURE 15.7 The TRUE data pipeline: event sets (triggering event, participant response, contextual info, timestamp) from the PC pass through a survey/logfile parser into a SQL and reporting server, while encoded video is synched with the logs using the timestamp and linked from the resulting reports and viewer.
15.1.8 Video
We also added captured video to our instrumentation so that while participants play
the game, a video capture card records their on-screen activity. These videos are
then synched with the instrumentation data using the always-present timestamp
information. The combined data can then be included in a SQL database, upon
which detailed reports can be built (see Figure 15.7). The resulting reports allow
researchers and team members to skip painful hours of reviewing videos, instead
jumping directly to the item of interest (a death, a level completion, a survey
response of “I’m lost”, etc.).
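A minimal sketch of how the timestamp makes that linkage possible, assuming the video encoder records its own start time on the same clock (names are hypothetical; the chapter does not detail the actual viewer):

    def video_offset_seconds(event_timestamp, video_start_timestamp):
        """Map an instrumentation event to an offset in the captured video.
        Both timestamps must come from the same clock (the always-present
        timestamp recorded with every data point)."""
        return max(0.0, event_timestamp - video_start_timestamp)

    # Usage: jump straight to a death instead of scrubbing hours of video.
    # An event at t=5400.0 in a video started at t=3600.0 sits 1800 seconds
    # (30 minutes) into the recording.
    offset = video_offset_seconds(event_timestamp=5400.0,
                                  video_start_timestamp=3600.0)
    print(f"Seek video to {offset:.0f} s")  # -> Seek video to 1800 s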
15.2 Putting It Together: TRUE Instrumentation
FIGURE 15.8 Mean deaths per encounter across missions M1–M12 of the Halo 2 single-player campaign.
The missions are linear, containing between ten and thirty smaller encoun-
ters. Overall, there are over two hundred encounters in the Halo 2 single-player
campaign.
To implement the TRUE method, we partnered with designers and engineers and
automatically recorded player deaths, which enemy killed them, what weapon the
enemy was using, and other combat related variables. We also recorded the rele-
vant contextual information with each data point—what mission, which encounter,
and what time the event occurred. For the attitudinal data, we used a time-based
in-game survey (at approximately three-minute intervals) asking players to rate their
perception of the game’s difficulty and their sense of progress through the game.
Finally, we captured video as each participant played the game and synched that up
with our instrumentation data.
On average, a given testing session consisted of approximately 25 participants who came onsite to play through the campaign over the course of two days. After a
given session, we were able to quickly view participant performance at a high level
across the individual missions (see Figure 15.8). In this example, we were interested
in the number of times a player died in each mission.
As shown in Figure 15.8, there were more player deaths in Mission 10 (M10)
than in the other missions—more than we had expected. However, viewing total
deaths across missions did not tell us how participants died or whether participants
found this frustrating.
To better understand what was happening, we drilled down into the specific mis-
sion data to see how participants died in each of the encounters comprising this
mission (see Figure 15.9). Using a Web front end, we could simply click on the
bar of interest in the graph to drill down to another level of detail to the average
deaths for each encounter within Mission 10. From there, we observed a potential
problem in the third encounter of the mission, as you can see clearly in Figure 15.9.
FIGURE 15.9 Mean number of deaths for each of the 25 encounters (M10_01–M10_25) within Mission 10.
Although these data helped us locate the area of a potential issue with the mission difficulty, they did not provide sufficient information to explain what, in particular, was
causing participants difficulty. We knew that during this particular encounter in the
mission, participants were fighting successive waves of enemies in a large room.
However, we did not know what specifically was causing them to die.
To figure this out, we drilled down even further into specific details of this
encounter. Specifically, we were able to break out deaths into the particular causes.
In this example, the Brutes (one of the enemies the player had to defeat) were
responsible for 85 percent of participant deaths.
Drilling down into the data even further (again, with a click on the graph), we identified three primary ways participants were dying: Brute Melee attacks, Plasma Grenade Attach (the grenade sticks to the player), and Plasma Grenade Explosions. Being able to isolate the exact cause of deaths was important because in Halo 2 there are numerous ways enemies can kill a player. This successive drill-down approach allowed us to quickly discover the main causes of participant deaths
within minutes.
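The drill-down itself is just successive grouping of the same event table. A sketch of the idea in Python, with made-up rows (illustrative only; TRUE used SQL reports behind a Web front end):

    from collections import Counter

    # Each death event carries its contextual fields (see Section 15.1).
    deaths = [
        {"mission": "M10", "encounter": "M10_03", "cause": "plasma_grenade_attach"},
        {"mission": "M10", "encounter": "M10_03", "cause": "brute_melee"},
        {"mission": "M02", "encounter": "M02_07", "cause": "sniper"},
        # ... thousands more rows from a test session
    ]

    def drill_down(events, *keys):
        """Count events grouped by the given key(s): mission, then
        encounter within a mission, then cause within an encounter."""
        return Counter(tuple(e[k] for k in keys) for e in events)

    print(drill_down(deaths, "mission"))                # level 1: deaths per mission
    m10 = [e for e in deaths if e["mission"] == "M10"]
    print(drill_down(m10, "encounter"))                 # level 2: per encounter
    enc = [e for e in m10 if e["encounter"] == "M10_03"]
    print(drill_down(enc, "cause"))                     # level 3: per cause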
However, we still did not completely understand how this was killing the partici-
pants. Because the combat system in Halo 2 is complex, we turned to the designers
of that system to provide further insight and also took advantage of a key com-
ponent of our instrumentation—captured video. For each death we could link to
a video that showed us exactly what happened. With the game designers by our
side, we viewed several of the participant deaths—in particular, deaths where par-
ticipants died due to direct hits from a plasma grenade.
After watching these videos, the designers immediately picked up on a subtle nuance in the game mechanic that only they could have identified. The Brutes in this section of the game threw grenades faster and with less of an arc (compared
FIGURE 15.10 Number of player deaths (Brute melee attacks, plasma grenade attaches, plasma grenade explosions) before and after design changes were implemented.
to other enemies which could throw plasma grenades) which gave participants less
time to react.
Using this information, the designers made several changes to reduce difficulty.
Specifically, they prevented Brutes from throwing Plasma Grenades altogether (for
that encounter). In addition, they reduced the overall number of enemy waves and
spawned enemies in only one location. Previously, enemies could spawn from sev-
eral locations in the room players were fighting in. This minimized the chance that
players would get melee’d from behind.
A week later, we tested a new build of the game that included the fixes the
designers had generated. We brought in a different set of participants, had them
play the game, and then checked to see whether the design changes worked. In
looking at the new data, we saw a dramatic reduction in the number of deaths,
especially those from Brute melee attacks and plasma grenades (see Figure 15.10).
So, we solved the problem of too much death. But did we go too far? In tweaking the behavior of the Brutes (who are supposed to be scary and lethal) and the way plasma grenades were used (they are supposed to be devastating), did we make this encounter too easy? We turned to the in-game survey data to answer that question.
Similar to the way we drilled into the death data, we were able to quickly assess
the frequency of responses for the time-based survey data. As Figure 15.11 shows,
the changes did not negatively affect how people perceived this encounter. The per-
centage of responses indicating that the level of difficulty was “about right” jumped
FIGURE 15.11 Percent of player responses before and after design changes were made.
from 43 to 74 percent, while the percentage of people who felt the encounter was
“too easy” stayed incredibly flat and very low (only 3 percent indicated it was too
easy after making the changes).
As a reminder, the real-world example we’ve been discussing pertains to only
1 of the 211 encounters in Halo 2. We discovered many issues across the entire cam-
paign, worked with designers on changes, and verified that those design changes
worked. This would not have been possible without TRUE instrumentation as well as
the experiences gleaned from doing this sort of research on a variety of games in vari-
ous points of the development cycle.
The following sections provide examples of what data collection and impact can
look like at various points in the game development lifecycle. We start off with a simple, straightforward example of assessing design intent during the polishing phase
of production utilizing TRUE. Next, we discuss the utility of TRUE testing during the
beta phase of a project. Finally, we demonstrate how TRUE can be successfully uti-
lized to inform demo construction of a game.
15.3 Forza 2: Production and Polishing
We supported the team right up until ship. We conducted one particular TRUE test on Forza 2 during
late production which demonstrates how valuable data can be gleaned very quickly,
very late in production, and still have a tremendous impact on a game.
Forza 2 is a racing simulation game released in the spring of 2007. One defining characteristic of the game is its realistic driving physics; thus, trying to drive a Ferrari F430 GT is considerably more challenging than driving a Ford Focus (as it would be in the real world). The game includes two single-player modes, Career and Arcade, in addition to a multiplayer mode. In Career Mode, players start off with access to slow, low-end cars and gradually earn credits and access to faster cars by doing well over multiple races. In Arcade Mode, players can jump right in and race faster, high-end (and typically more difficult to handle) cars. While
we used TRUE to help balance and tune progression in Career Mode during produc-
tion, there was one big aspect in Arcade Mode that we wanted to get user feed-
back on before the game shipped, and that was the Time Trials. Yet, our window of
opportunity for testing the time trials in Arcade Mode came at a point very late in
production. In fact, we had only one week to run the test, turn the data around, and
make changes to the time trials before they were locked down permanently. The fol-
lowing example relates to our efforts to polish the Arcade Mode time trials.
Forza 2 contains 25 different time trials in Arcade Mode. The player’s goal for
each trial is simple: Complete at least one lap on the track faster than the pre-speci-
fied target lap time for that trial. If the player successfully beats the target lap time,
the car used for that trial is unlocked and added to the player’s Arcade Garage. If the
player is not successful, the car remains locked and players cannot use it in other
parts of Arcade Mode. One challenging aspect for players is that the car, the track
ribbon, and the target lap time to beat are all preset. Thus, players who would typically shy away from racing the powerhouse Nissan Silvia Top Secret (a car not available until well into Career Mode) because of its notoriously intractable handling would nonetheless have to use that Silvia to beat time trial #3. In addition, it wouldn't be the Silvia on an oval track; it would be on a challenging real-world racetrack (i.e., Tsukuba), and the lap time would need to be under 46 seconds to boot!
While the design intent behind the time trials was that they be challenging for
players, the designers did not want them to be overly frustrating. Indeed, what
could be more frustrating to a player than the inability to unlock a car for their
arcade garage because of a target lap time they perceive to be impossible to beat after
the 50th unsuccessful lap? Luckily, the Forza 2 designers had a clear design intent
for the time trials that we could use to test against: The time trials should be chal-
lenging to players but approximately 80 percent of the target users should be able to
complete any particular time trial and unlock the car after ten laps.
252
15.3 FORZA 2: PRODUCTION AND POLISHING
The target lap times for each trial were determined by several members of the team but not tested with users. Although each trial's target time felt challenging but doable to the members of Design and Test who helped determine them, these team members were no longer "typical players" for Forza 2. In fact, they had been racing on some of the same tracks with the same cars for several months during the game's development. This fact presented us with a problem: we didn't know how closely the target times determined by the team members mapped onto the skill level of the typical player. If the target times were set too low, fewer than 80 percent of the target population would be able to complete each one after ten laps. Our research question became,
“What percentage of typical Forza 2 players can complete each time trial after ten laps
on a track?”
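Answering that question from instrumentation data is a simple aggregation over each participant's logged lap times. A hypothetical Python sketch of the computation (the chapter does not show the team's actual report query):

    def percent_completing(lap_times_by_participant, target_time, max_laps=10):
        """Percentage of participants who beat the target lap time within
        their first max_laps laps on a given time trial."""
        completed = 0
        for laps in lap_times_by_participant:
            if any(t < target_time for t in laps[:max_laps]):
                completed += 1
        return 100.0 * completed / len(lap_times_by_participant)

    # Usage with made-up lap data for three participants on one trial:
    laps = [
        [35.2, 34.1, 33.7],        # beats a 33.81 s target on lap 3
        [36.5, 35.9, 35.0, 34.6],  # never gets under the target
        [34.0, 33.5],              # beats it on lap 2
    ]
    print(percent_completing(laps, target_time=33.81))  # -> 66.7 (approx.)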
TABLE 15.1 Forza 2 Time Trials: Target lap time and percentage of participants beating it for each trial

Trial    Target lap time (s)    Participants beating it
1        33.81                  24 percent
2        44.69                  41 percent
3        45.697                  0 percent
4        51.292                 97 percent
5        55.602                  7 percent
6        56.033                 54 percent
7        57.132                 37 percent
8        57.782                 19 percent
9        58.731                 59 percent
10       59.964                  8 percent
11       62.8                    8 percent
12       68.5                   12 percent
13       73.8                   19 percent
14       77.5                   12 percent
15       78.5                   12 percent
TABLE 15.1 Continued (trials 16–25)
As Table 15.1 shows, the time trials were consistently too difficult. In fact, only one time trial (that is, #4) met the design
intent criteria. Indeed, six of the twenty-five time trials were unbeatable by players
after ten laps.
15.4 Shadowrun—Beta
There are times when the research questions we have require a large number of
players—more than the fifty-one that we can simultaneously accommodate in our
playtesting facilities. For example, we may need to test matchmaking systems that
support thousands of simultaneous players, balance game economies, or tweak the
attributes of character classes. In some cases, we can get the data we need from
players individually interacting with a game. Sometimes, we need many—often
thousands—of players simultaneously interacting with both the game and one
another. In these situations, we beta test the game and collect instrumentation data
from many thousands of players over an extended time. In this section, we discuss
the use of instrumentation in the Shadowrun beta.
FIGURE 15.12 Character classes participants selected (percent of participant selection by day) over the course of a month of the Shadowrun beta test.
differentiate it from the other classes and therefore appeal to different play-styles. To
gather the relevant data, the game logged each time a player joined a game and chose
a character. We were able to look at both the choices of each player in each game as
well as the choices of each gamer across the many games they played. This way,
we could determine both overall trends in choices as well as individual preferences for character classes. We also logged the success of each player, both in terms of how
well they individually did in a game and whether their team won or lost.
We found, by tracking character choices, that over time one character class was clearly preferred to all of the others. If the classes were all equally fun and effective, we would expect to see each being selected approximately 25 percent of the time.
Figure 15.12 shows the actual character classes participants selected over the course
of a month of the beta test. The Elf class, the top line in the graph, was significantly
more popular than the other classes. The fluctuating popularity of the character
classes shown on the left of Figure 15.12 (the beginning of the month) occurred dur-
ing a period when we introduced 1000 new participants into the beta. The popularity of the classes fluctuated as the new participants explored each of them. A few
days later, however, the preference for Elf once again emerged. This illustrates the
advantages of beta testing over an extended period of time. A research methodol-
ogy that limited our investigation to a few games—or even a few days of gaming—
would have given us a misleading picture of character class preferences.
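Computing the day-by-day selection shares behind a chart like Figure 15.12 is straightforward. A hedged Python sketch with hypothetical field names and made-up sample rows:

    from collections import Counter, defaultdict

    # One record per game joined: (day_of_beta, class_chosen).
    selections = [(6, "Elf"), (6, "Troll"), (6, "Elf"), (7, "Dwarf"),
                  (7, "Elf"), (7, "Human")]  # ... thousands more in a real beta

    def daily_shares(selections):
        """Percent of selections going to each class, per day. With four
        balanced classes, each share should hover near 25 percent."""
        by_day = defaultdict(Counter)
        for day, chosen in selections:
            by_day[day][chosen] += 1
        return {day: {cls: 100.0 * n / sum(counts.values())
                      for cls, n in counts.items()}
                for day, counts in by_day.items()}

    for day, shares in sorted(daily_shares(selections).items()):
        print(day, shares)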
classes. Each character class had its strengths and weaknesses, but no class clearly dominated the others.
Beta testing with instrumentation enabled us to tweak the class variables in the
game to achieve the designers’ intent of having a game with well-balanced and dif-
ferentiated character classes. Conducting the beta test over several months enabled
us to collect data that showed both how new users approached the game as well
as patterns in players’ behavior over the course of many game sessions. The large
number of participants in the beta test also enabled us to examine the variables
of interest with many different types of players in many different game situations.
Further, being able to iterate on the game and continue testing with the same participants enabled us to verify design changes before the game was available to consumers. None of this would have been possible using in-house testing alone or without instrumented data collection.
1. Preproduction
● A high-level instrumentation plan is in place. This consists of the research questions, the mock reports, a list of roles and responsibilities, and a timeline.
2. Production
● All of the main hooks are set by Code Complete. There may be some minor iteration of hooks post-Code Complete, but the bulk of the heavy lifting should be done.
● Data collection starts, and reports are iterated as needed. For some additional examples of reports generated from TRUE instrumentation at BioWare, see DeRosa (2007): http://www.gamasutra.com/view/feature/1546/tracking_player_feedback_to_.php
3. Polish/Bug Fixing
● Data collection continues at a frenzied pace; the game itself is iterated on, and the improvements are verified.
● If you’re doing an instrumented beta, it probably happens here.
● Any work on the demo also occurs here.
4. Post-release
● In the case of key franchises, you may want to collect some data to better
15.7 Lessons Learned
instrument builders all play a role. Below we outline some of the lessons we have
learned over the years of conducting successful instrumentation projects.
Rather than tracking everything, you should limit yourself to the handful of
variables (and related contextual information) needed to know whether there is a
problem. For example, if you want to identify problematic combat encounters, all
you need to track are deaths, cause of death, and how the player felt about the
death. That is sufficient to say “there’s a problem here,” and you could review the
synched video in order to understand why that problem is occurring. Depending on
the game, we typically recommend that you track no more than 15 different events,
along with related contextual information (timestamp, x and y coordinates, etc.).
When determining what variables to track, the maxim “less is more” truly holds.
missing variables, come up with new research questions, or concede that certain
pieces of information they thought would be important are actually not needed.
Other benefits of mocking up reports before setting the hooks include:
● Helping the developer who is setting the hooks make decisions on how to set the
hooks. Frequently, developers can set hooks in multiple ways, including ways
that affect what the data mean. When they know what you are trying to achieve,
they can make better decisions on how to set the hooks.
● Getting buyoff from the stakeholders on what the instrumentation effort will
yield. For many people, seeing is believing.
● Quicker turnaround of reports after data collection. If you mock up your reports
in your reporting tool, you can simply change the data source from the fake data
to the real data, and your reports are instantly updated. If you wait until the
actual time of data collection to build your reports, there will be a substantial
delay between when you collect the data and when you can actually use it. At
this phase of the game lifecycle, time is the most precious element, so quick turn-
around of data is essential.
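One way to realize that swap, sketched in Python with hypothetical names (a SQL-backed reporting tool like the one described earlier exposes the same idea through a data-source setting):

    import random

    def fake_deaths_per_mission(missions):
        """Placeholder data used to mock up the report before hooks exist."""
        return {m: random.randint(0, 50) for m in missions}

    def real_deaths_per_mission(missions):
        """Stand-in for the query against the live instrumentation database."""
        raise NotImplementedError("wired up once data collection starts")

    def deaths_report(data_source, missions=("M1", "M2", "M3")):
        """The report is written once; only the data source changes."""
        for mission, deaths in data_source(missions).items():
            print(f"{mission}: {deaths} deaths")

    # Before data collection: stakeholders review the mocked report.
    deaths_report(fake_deaths_per_mission)
    # After data collection: the same report, pointed at the real source.
    # deaths_report(real_deaths_per_mission)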
want 90 percent of people to get a lap time of 2:00 or less within 5 attempts in a D
class car or 3 attempts in a C class car.”) (Romero, 2008). Knowing the design intent
vastly simplifies issue identification. It is possible to build reports that show devia-
tions from expected levels of performance, quickly highlighting problematic areas.
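A sketch of such a deviation report, assuming design intent is expressed as a target completion rate (names and tolerance are hypothetical):

    def flag_deviations(completion_rates, intent=0.80, tolerance=0.05):
        """Highlight areas whose observed completion rate misses the
        designers' stated intent by more than the tolerance."""
        flagged = {}
        for area, rate in completion_rates.items():
            if abs(rate - intent) > tolerance:
                flagged[area] = ("too hard" if rate < intent else "too easy", rate)
        return flagged

    # E.g., intent: ~80 percent of players should complete each time trial.
    observed = {"trial_3": 0.00, "trial_4": 0.97, "trial_6": 0.54, "trial_9": 0.59}
    print(flag_deviations(observed))
    # -> trials 3, 6, and 9 flagged too hard; trial 4 flagged too easy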
Even if design intent is not explicitly stated, designers implicitly know how they
want their game to be played. Designers are so intimately familiar with the game
and the experiences they are trying to create, they can often spot problems in the
data that others overlook. For this reason, designers should always take the time
to review the data after each test. And because they will be doing so, it is vital that
Design be involved in focusing the research question (Lesson 1) and reviewing the
reports (Lesson 4).
collecting attitudinal feedback on what people think about different aspects of the
game (such as in playtests).
In reality, instrumentation can work hand in hand with other user research tech-
niques (Kim et al., 2008). For example, we used instrumentation and usability test-
ing together to improve a real-time strategy game. In this series of studies, people
played the single-player campaign over the course of a weekend. The single-player
campaign consisted of a series of missions which had to be completed in order for
the game to progress. We used instrumentation to get a rough understanding of
which missions were most problematic, and then did a detailed follow-up usability study on only those missions. We simply did not have the time to do detailed usability testing of every single mission; the instrumentation let us identify the most problematic missions and focus our research energies there.
15.8 References
Bungie.net website. (n.d.). Retrieved February 6, 2008, from http://www.bungie.net/
Crackdown (Computer software). (2007). Redmond, WA: Microsoft.
Davis, J., Steury, K., & Pagulayan, R. (2005). A survey method for assessing perceptions of a
game: The consumer playtest in game design. Game Studies: The International Journal of
Computer Game Research, 5. Retrieved February 7, 2008, from http://www.gamestudies.
org/0501/davis_steury_pagulayan/
DeRosa, P. (2007). Tracking Player Feedback to Improve Game Design. Gamasutra, 7.
Retrieved February 2, 2007, from http://www.gamasutra.com/view/feature/1546/
tracking_ player_feedback_to_.php
Dumas, J.S., & Redish, J.C. (1999). A practical guide to usability testing (Rev. ed.), Portland,
OR: Intellect Books.
Forza 2 (Computer software). (2007). Redmond, WA: Microsoft.
Fulton, B. (2002). Beyond Psychological Theory: Getting Data that Improve Games.
Proceedings of the Game Developer’s Conference, San Jose, CA.
Half-life 2 instrumentation stats webpage. (n.d.). Retrieved February 6, 2008, from http://
www.steampowered.com/status/ep2/ep2_stats.php
Halo 2 (Computer software). (2004). Redmond, WA: Microsoft.
Hunicke, R., LeBlanc, M., & Zubek, R. (2001). MDA: A formal approach to game design and
research. Workshop at the AAAI (American Association for Artificial Intelligence)
2001 Conference, North Falmouth, MA. Retrieved February 2, 2007, from http://www.
cs.northwestern.edu/~hunicke/pubs/MDA.pdf
Kim, J.H., Gunn, D.V., Schuh, E., Phillips, B., Pagulayan, R.J., & Wixon, D. (2008). Tracking
Real-Time User Experience (TRUE): A comprehensive instrumentation solution for com-
plex systems. Proceedings of the SIGCHI conference on Human factors in computing sys-
tems, Florence, Italy.
Medlock, M.C., Wixon, D., McGee, M., & Welsh, D. (2005). The Rapid Iterative Test and
Evaluation Method: Better Products in Less Time. In R. Bias & D. Mayhew (Eds.),
Cost Justifying Usability: An Update for the Internet Age (pp. 489–517), NY: Morgan
Kaufmann.
Medlock, M.C., Wixon, D., Terrano, M., Romero, R.L., & Fulton, B. (2002). Using the RITE
method to improve products: a definition and a case study. Proceedings of Usability
Professionals’ Association 2002 Annual Conference, Orlando, FL.
MGS Games User Research Website. (n.d.). Retrieved February 6, 2008, from http://www.
mgsuserresearch.com/
Nichols, T. (2007). User Research in the Games Industry: Opportunities for Education. HFES
Bulletin, 40(4), 1–2.
Nielsen, J. (1993). Usability engineering. San Francisco, CA: Morgan Kaufmann.
Pagulayan, R., Gunn, D., & Romero, R. (2006). A Gameplay-Centered Design Framework
for Human Factors in Games. In W. Karwowski (Ed.), 2nd Edition of International
Encyclopedia of Ergonomics and Human Factors (pp. 1314–1319), Boca Raton, FL: Taylor &
Francis.
Pagulayan, R., Keeker, K., Fuller, T., Wixon, D., & Romero, R. (2007). User-centered design
in games (revision). In J. Jacko & A. Sears (Eds.), Handbook for Human-Computer
Interaction in Interactive Systems: Fundamentals, Evolving Technologies and Emerging
Applications (pp. 741–760), Mahwah, NJ: CRC Press.
Pagulayan, R.J., Steury, K.R., Fulton, B., & Romero, R.L. (2003). Designing for fun: User-
testing case studies. In M. Blythe, K. Overbeeke, A. Monk, & P. Wright (Eds), Funology:
From Usability to Enjoyment (pp. 137–150), New York: Springer.
Romero, R. (2008, February). Tracking attitudes and behaviors to improve games: Successful
instrumentation. Presentation at the annual meeting of the Game Developers Conference,
San Francisco, CA.
Shadowrun (Computer software). (2007). Redmond, WA: Microsoft.
VALVE STEAM website. (n.d.). Retrieved February 6, 2008, from https://steamcommunity.
com/
Voodoo Vince (Computer software). (2003). Redmond, WA: Microsoft.
WOW Jutsu website. (n.d.). Retrieved February 6, 2008, from http://wowjutsu.com/world/
CHAPTER SIXTEEN
Interview with Georgios Yannakakis, Assistant Professor at the Center for Computer Games Research, IT-University of Copenhagen
Interviewer: Katherine Isbister
The IEEE task force on player satisfaction modeling (PSM) is a community of aca-
demics and game developers interested in heterogeneous approaches for capturing
and optimizing player satisfaction in human-computer interactive systems. Games,
a prime example of such systems, constitute the main application test-bed for the
approaches investigated by the members of the task force.
Georgios, what exactly is this PSM task force and why might it be valuable to game
developers?
The task force focuses on a wide range of approaches regarding quantitative
player satisfaction modeling and artificial intelligence (AI) for improving play-
ing experience. The idea is to encourage a dialog among researchers in the AI,
human-computer interaction, cognitive modeling, affective computing and psy-
chology disciplines who investigate methodologies for improving user (player)
experience.
Optimizing player satisfaction is the second research focus of the community. That is, given successful models of player satisfaction, how can we adjust interactive systems in order to improve the player experience?
Game developers may find valuable research results for improving the quality of their games through the player satisfaction modeling and optimization techniques established within our task force. Interested developers may want to participate in the task force and keep an eye out for research publications in this emerging field of PSM. So far there are promising results in small-scale games
(both screen-based and real-world physical interaction-based games) that can
be used as a starting point. For instance, work by Yannakakis and Hallam [1,2]
in arcade prey/predator games; Togelius et al. [3] in racing games; Spronck et al.
[4] in fighting games and so on.
Could research done by the task force actually improve a game company’s financial
bottom line?
We have indications of high interest from the game industry and it is my belief that
pilot studies developed during basic research now may go commercial within the
next five years. I believe that building middleware capable of capturing player satis-
faction in real-time will deliver products of higher commercial/marketing value and
will automate specific game development processes like user testing.
The way to achieve this is by bringing our results to the attention of the industry and providing evidence for the robustness and efficiency of our methodologies.
However, in order to be more convincing about the potential of PSM, approaches
should be first tested and evaluated in commercial-standard games.
If someone wanted to get started with this, where would you recommend that they turn (books, first steps, etc.)?
The PSM field is quite new; however, the increasing interest from academics and game developers has resulted in a significant number of research articles on the topic. Articles related to PSM can be found in the proceedings of the two workshops
(in conjunction with SAB’06 and AIIDE’07) organized prior to the establishment of
the task force. The IEEE computational intelligence and games (CIG) symposia and
the AI for Interactive Digital Entertainment (AIIDE) Conference series include such
articles, too. User and affective modeling related conferences cover some aspects of
PSM.
I would highly recommend a visit to the IEEE-PSM website (http://game.itu.dk/
PSM/) for further information on people, groups and articles related to PSM.
16.1 References
[1] Yannakakis, G.N., & Hallam, J. (2007). Modeling and Augmenting Game Entertainment through Challenge and Curiosity. International Journal on Artificial Intelligence Tools, 16(6), 981–999.
[2] Yannakakis, G.N., & Hallam, J. (2007). Towards Optimizing Entertainment in Computer Games. Applied Artificial Intelligence, 21, 933–971.
[3] Togelius, J., De Nardi, R., & Lucas, S.M. (2006). Making racing fun through player modeling and track evolution. Proceedings of the SAB Workshop on Adaptive Approaches to Optimizing Player Satisfaction, 61–70.
[4] Spronck, P., Sprinkhuizen-Kuyper, I., & Postma, E. (2004). Difficulty Scaling of Game AI. GAME-ON 2004: 5th International Conference on Intelligent Games and Simulation, 33–37. EUROSIS, Belgium.
CHAPTER SEVENTEEN (A)
Usability for Game Feel
Steve Swink is an independent game developer, author,
and lecturer currently based in Tempe, Arizona. As
a game designer and managing partner at Flashbang
Studios, he’s contributed to games such as Off-Road
Velociraptor Safari, Splume, and the upcoming Jetpack
Brontosaurus. Before joining Flashbang, he toiled in
the retail game mines at Neversoft and the now-defunct
Tremor Entertainment. His first book, entitled Game
Feel: The Game Designer’s Guide to Virtual Sensation,
will be published by Elsevier/Morgan Kaufmann in spring 2009. He also co-chairs the
Independent Games Festival, is an IGDA Phoenix chapter coordinator, and teaches
all game design classes at the Art Institute of Phoenix. Note to self: sleep every
night, not just some.
Just so we’re on the same page, I define game feel as the tactile, kinesthetic sensa-
tion of control in a videogame. In simple terms, it’s when a videogame takes over
the action-perception cycle you normally use to cope with and navigate your every-
day physical environment. Instead of perceiving your hand moving to grab a coffee
cup in front of you, you see Mario move in response to your inputs on the control-
ler. You're not moving your hand, you're moving Mario. And you do not perceive the resulting movement in your space; you perceive Mario's position in his space. It's a subtle transposition, where action flows from hands to controller
and into the game transparently and the eyes, ears, and hands perceive the results,
process them, and respond within a few milliseconds. If this happens uninterrupted,
in what is more commonly known as a “correction cycle,” then we have what I’d
consider game feel. There are all sorts of other bits to it—creating clues about the
nature of certain interactions using particle effects and screen shake, for example—
but that’s the core. Good game feel, then, is when the controls of a game feel intui-
tive, deep, and aesthetically pleasing. Like the difference between the feel of driving
a Porsche and a school bus with one flat tire, good game feel is subconscious, visceral, and extremely important.
Game feel is particularly bound up with usability concerns because of the deli-
cate interplay of learning, skill, and challenge. To cut to the chase, properly balanc-
ing and tuning a mechanic to feel good is one of the most difficult challenges a
game designer will face. There will always be a learning curve for a new mechanic.
Players will be frustrated at first. They understand this and tacitly agree to a cer-
tain amount of frustration on the promise that some enjoyable, engaging experience
will result. Remember what it was like to learn to ride a bike or drive a car. When
you finally got it, how did you feel? Was it worthwhile? The designer must delineate, then, between frustrations that are a byproduct of the skill-building process and those that are a usability concern. The problem lies primarily in the fact that
players can learn any interface. The challenge is to create a mechanic worth learn-
ing. This is a moving target—the amount of “worth” will be a factor of how difficult
it is to learn, of individual players, and of the other rewards provided for mastery.
A very difficult mechanic may be worth learning if it’s a game you play with your
friends over and over again, or if there’s an online leader board. Or it may be that
you enjoy the theme and art of the game particularly, or the story which is doled
out after the completion of challenges. Or you may just be the kind of player who
enjoys mastering exceptionally challenging mechanics. I tend to fall into this cat-
egory—I loved Gunvalkyrie, Ski Stunt Simulator, and Trials: Construction Yard (possibly the most difficult game ever to be worth playing). Depending on your design
vision, extreme difficulty may be okay. As long as the reward for learning the skill
seems commensurate to your intended player, you’re all good.
17.1 The Gameplay Garden
What follows is one possible method for creating and testing a new mechanic,
for creating good game feel.
1. Input
Input is the player’s organ of expression in the game world, the only way a player
can speak to the game. This is an overlooked aspect of game feel: the tactile feel of
the input device. Games played with a good-feeling controller feel better. The Xbox
360 controller feels good to hold; it’s solid, has the proper weight, and is pleasingly
smooth to the touch. By contrast, the PS3 controller has been lamented as being light and cheap feeling, like one of those third-party knockoffs.
This difference in tactile feel of the input device has implications for the feel of
a given game. When I prototype something—platformer, racing game, whatever—it
will feel noticeably better if I hook up the inputs to my wired Xbox 360 control-
ler than to simple keyboard inputs. You can’t always control the input device your
player is going to use to interface with your game so you should be aware of, and
compensate for, how different input devices feel. One way to lean into a given input
device is through natural mappings.
FIGURE: Three stove-top control layouts (A, B, and C), each mapping four dials to the back-left, back-right, front-left, and front-right burners.
Imagine trying to use each of them. Which one requires no thought to operate?
Clearly, figure C is a natural mapping: the layout of the dials corresponds clearly
obviously to the burners they activate. There is a clean, physical metaphor connect-
ing the input device and the way it can alter the system. A good example from a
modern game is Geometry Wars for Xbox 360.
Consider Geometry Wars relative to the Xbox 360 controller. The way the joystick is formed transposes almost exactly to the motion in Geometry Wars. It's almost one for one: the joystick sits in a circular plastic housing that constrains its motion in a circular way. Pushing the control stick against the edge of the plastic rim that contains it and rolling it back and forth creates little circles, which is almost exactly the motion produced on screen by Geometry Wars
in response to input. This is what Donald Norman would refer to as a “natural
mapping.” There’s no explanation or instruction needed because the position and
motion of the input device correlates exactly to the position and motion of the
thing being controlled in the game. The controls of Mario 64 also have this prop-
erty; the rotation of the thumbstick correlates very closely to the rotation of Mario
as he turns, twists, and abruptly changes direction.
Another way input device affects game feel is through the inherent sensitivity of
the input device. Consider the difference between a button and a computer mouse.
A typical button has two states, on or off. It can be in one of two positions. As an
input device, it has very little sensitivity. By contrast, a typical computer mouse has
complete freedom of movement along two axes. It is unbounded; you can move it
as far as the surface underneath allows, giving it a huge number of possible states.
A mouse is an extremely sensitive input device.
So an input device can have an inherent amount of sensitivity, falling some-
where between a mouse (near-complete freedom in two axes) and a button (only two states, on or off). This is what I call input sensitivity: a rough measure of the
amount of expressiveness inherent in a particular input device.
The implication for game feel prototyping is to consider the sensitivity of your
input device relative to how fluid and expressive you want your game to be. In most
cases, this is a decision about complexity—as a general rule, additional sensitivity
means greater complexity. This is not a value judgment per se; greater sensitivity
has both benefits and drawbacks depending on the goals of the design and how the
mechanic fits into that design. What’s important to realize is the implications your
choice of input device has for the sensitivity of the game. Of course, the input device
is only half the picture. The other place to define sensitivity is in reaction: how the game processes—and responds to—the input it receives from the input device.
2. Response
Consider the games Zuma and Strange Attractors.
In Zuma, there is a reduction in the inherent sensitivity of the mouse as an input
device. Instead of freedom of movement in two axes, the object being controlled is
stationary. The frog character rotates in place, always looking at the cursor, clamp-
ing the mouse’s sensitivity down to something more manageable.
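That clamping is easy to picture in code: the two free axes of the mouse collapse into a single rotation. A minimal Python sketch (hypothetical, not Zuma's actual code):

    import math

    def frog_angle(frog_x, frog_y, cursor_x, cursor_y):
        """Collapse the mouse's two free axes into one value: the angle
        the stationary frog rotates to face the cursor."""
        return math.atan2(cursor_y - frog_y, cursor_x - frog_x)

    # Wherever the cursor roams, only its direction from the frog matters.
    angle = frog_angle(0.0, 0.0, cursor_x=3.0, cursor_y=4.0)
    print(math.degrees(angle))  # -> 53.13...: the frog faces up and to the right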
By contrast, Strange Attractors is a game that uses only one button as input.
The position of your ship in space is always fluid, always changing very subtly,
and you can manipulate it only by activating or deactivating your ship’s gravity
drive. Both Strange Attractors and Zuma have fairly sensitive, nuanced reactions
to input. This is reaction sensitivity: sensitivity created by mapping user input to
game reaction to produce more (or less) sensitivity in the overall system. It is in
this space—between player and game—where the core of game feel is defined.
Consider just how simple the original Nintendo Entertainment System (NES)
controller was relative to the expressive feel of Super Mario Brothers. The NES con-
troller was just a collection of on/off buttons, but Mario had great sensitivity across
time, across combinations of buttons, and across states. Across time, Mario sped up
gradually from rest to his maximum speed, and slowed gradually back down again,
his motion dampened to simulate friction and inertia in a crude way.
In addition, holding down the jump button longer meant a higher jump, another
kind of sensitivity: across time. Holding down the jump and left directional pad but-
tons simultaneously resulted in a jump that flowed to the left, providing greater
sensitivity by allowing combinations of buttons to have different meanings from the
pressing of those buttons individually. Finally, Mario had different states. That is,
pressing left while “on the ground” has a different meaning than pressing left while
“in the air.” These are contrived distinctions which are designed into the game but
which lend greater sensitivity to the system as a whole so long as the player can
correctly interpret when the state switch has occurred and respond accordingly.
The result of all these kinds of nuanced reactions to input was a highly fluid
motion, especially as compared to a game such as Donkey Kong, in which there
was no such sensitivity.
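Those kinds of reaction sensitivity are cheap to express in code. A toy Python sketch of dampened motion, hold-to-jump-higher, and a ground/air state (not Nintendo's implementation; the constants are arbitrary):

    def step(vel_x, on_ground, left, right, jump_held, jump_frames):
        """One frame of crude Mario-style response to on/off buttons."""
        ACCEL, FRICTION, MAX_SPEED = 0.05, 0.92, 2.0
        # Sensitivity across time: speed builds up and bleeds off gradually.
        if right:
            vel_x = min(vel_x + ACCEL, MAX_SPEED)
        if left:
            vel_x = max(vel_x - ACCEL, -MAX_SPEED)
        if not (left or right):
            vel_x *= FRICTION  # dampened, crudely simulating friction and inertia
        # Sensitivity across states: air control is weaker than ground control.
        if not on_ground:
            vel_x *= 0.98
        # Sensitivity across time, again: holding jump longer jumps higher,
        # up to a cap (here, 12 frames of extra upward impulse).
        vel_y = -0.3 if (jump_held and jump_frames < 12) else 0.0
        return vel_x, vel_y

    # Two on/off buttons (right + jump) still yield graded, fluid motion.
    vx = 0.0
    for frame in range(30):
        vx, vy = step(vx, on_ground=True, left=False, right=True,
                      jump_held=True, jump_frames=frame)
    print(round(vx, 2))  # speed has ramped up smoothly rather than snapping on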
This comparison, between Super Mario Brothers and Donkey Kong, shows very
clearly just how much more expressive and fluid Mario’s controls are. The interest-
ing thing to note is that Donkey Kong used a joystick, a much more sensitive input
than the NES controller. No matter how simple the input, the reaction from a sys-
tem can always be highly sensitive. No matter how sensitive the input, the reaction
from a system can always be reduced or muted. Of course, there isn’t some magic
formula for the right amount of sensitivity in the system.
Look for happy accidents, though. Do you surprise yourself with what you can
express or accomplish with your controls? Does the act of playing create something
aesthetically pleasing? Do you find yourself wasting time noodling around instead of
continuing to tweak and tune? Does it feel like you’re building a meaningful skill?
If the answer to these questions is yes, it’s time to give this motion some spatial
meaning.
3. Context
Returning to Mario 64, imagine Mario standing in a field of blank whiteness, with
no objects around him. With nothing but a field of blankness, does it matter that
Mario can do a long jump, a triple jump, or a wall kick? Context is the soil of your
garden; it’s necessary for the mechanic to grow and bloom.
If Mario has nothing to interact with, the fact that he has these acrobatic abilities
is meaningless. Without a wall, there can be no wall kick. At the most pragmatic
level, the placement of objects in the world is just another set of variables against
which to balance movement speed, jump height, and all the other parameters
that define motion. In game feel terms, constraints define sensation. If objects are
packed in, spaced tightly relative to the avatar’s motion, the game will feel clumsy
and oppressive, causing anxiety and frustration. As objects get spaced further apart,
the feel becomes increasingly trivialized, making tuning unimportant and numbing
thoughtless joy into thoughtless boredom (most Massively Multiplayer Online games
suffer from this phenomenon to some degree).
For this reason, it’s a good idea to build some kind of test environment as you
create the system of variables you’ll eventually tune into good game feel. This is the
Magic Garden of game feel: if you can make it exceedingly pleasurable to interact
with the game at this most basic level you’ve got a superb foundation for enjoyable
gameplay.
So you should be putting in some kind of platforms, enemies, some kind of
topology that will give the motion meaning. If Mario is running along with an end-
less field of blank whiteness beneath him, it will be very difficult to judge how
high he should be able to jump. So you need to start putting things in there to get a
sense of what it will be like to traverse a populated level. In many cases, the goal
is to find the standard unit from which the game’s levels should be constructed.
In a two-dimensional game, this could be the number of tiles high and wide for a
good-feeling standard jump. In a racing game, this could be the width of the road
and the angle of various curves (with an eye for how difficult they are to navigate
successfully).
My favorite approach is to use a wide array of primitives in various sizes. Just
throw stuff in there; don’t worry too much about the spacing. Tweak the spacing
of the objects relative to the avatar and vice-versa until it all starts to feel really
good, but then just throw in all kinds of objects of various sizes, types, shapes,
and physical properties. Build a playground of interesting interactions. Put in big
archways, round things, fat things, pointy things, anything you can think of. Get
a bunch of shapes in there and study the way the spatial dynamics are interacting
with the feel you’re creating. Plan for happy accidents and stay loose and open-
minded when testing. Take note of crescendos of enjoyment as you interact with
the space and lean into them with tuning and additional test terrain.
Another thing to consider about adding spatial context is that constraint is the
mother of skill and challenge. Think of a football field: there are these arbitrary
constraints around the sides of the football field that limit it to a certain size. If
those constraints weren’t there, the game of football would have a very different
skill set and would likely be less interesting to watch. If you could run as far as
you want in one direction before bringing it back, where’s the challenge? The skills
of football are defined by the constraints that bound it. If things are going well with a
prototype, I find myself creating mini goals, trying to shoot through gaps or other-
wise skillfully traverse the increasingly fleshed-out spatial topology.
4. Polish
At or around the same time you’re building context, you’re going to want to start
putting in a bit of polish—but only what’s essential to your prototype. Polish can
include sprays or dustings of particles where things hit or interact, screen shake,
view angle shifts, or the squash and stretch of objects colliding. The point is to
convey the physical properties of objects through their motion and interaction. Any
effect that enhances the impression that the game world has its own self consistent
physics is fair game.
This is an opportunity to take inspiration from film, animation, and *gasp* the world around you. Look at the way things interact. If you hit a glass table with a
hammer it will shatter, complete with noise, motion, and a spray of “particles.” The
more clues like that you can borrow to inform the player of the physical properties
of the objects they’re interacting with the better.
When prototyping, I like to list these cues out and sort them in order of impor-
tance to the physical impression that should be conveyed. As an example, consider
the goal of making a game that feels squishy. This is a good place to start because
to say that something is squishy implies visuals, sounds, and tactile sensation. It provides a great benchmark: if something is squishy, it will deform in a certain way,
like a water balloon or silly putty.
As these deformations happen, certain sounds accompany them; familiar
squelching and schlucking noises which are hard for me to describe but easy to
recall. It’s the noise of walking through deep mud, or kneading wet dough with
your hands. Separating out the various pieces of squishiness as a physical property
yields something like this:
Motion—The thing must deform and bend when it comes into contact with other
objects, especially relative to speed.
Tactile—You can easily deform, mold, or stretch the thing.
Visual—To aid the impression of squishiness, the thing could look moist like a
slug, translucent with tiny bubbles like Jello, or amorphous like putty or clay.
Sound—Any movement or deformation of the object should be accompanied by
squelching noises.
These comprise the physical clues that get assembled into your brain to create
the notion of squishiness. Anything you can layer on top to fake these effects will
increase and improve the impression of physicality and, hence, the feel. As polish
is a notorious time sink, you want to limit the amount of time you spend creating
effects to those that are crucial to demonstrate the impression of physicality you’re
going after.
Something squishy needs to deform and to sound squishy, but it probably
doesn’t need a full fluid or spring simulation. A simple squash and stretch deforma-
tion is probably enough to get the idea across.
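For instance, a volume-preserving squash along the impact axis is only a couple of lines. A hypothetical Python sketch (any engine's scale transform would serve the same role):

    def squash_stretch(base_w, base_h, impact, max_squash=0.5):
        """Deform an object's render scale on impact (0 = none, 1 = full).
        Squash one axis and stretch the other so apparent volume holds."""
        squash = 1.0 - max_squash * min(max(impact, 0.0), 1.0)
        return base_w / squash, base_h * squash  # wider and flatter on landing

    # A hard landing: the character momentarily renders wide and flat,
    # then eases back to (1.0, 1.0) over the next few frames.
    print(squash_stretch(1.0, 1.0, impact=0.8))  # -> (1.666..., 0.6)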
So, yes, polish is time consuming but it’s also vital. A little screen shake or spray
of particles can make all the difference in the world to a game’s feel.
5. Metaphor
Your choice of metaphor changes game feel dramatically. I like the following
example: imagine Gran Turismo, Project Gotham, or whatever your favorite “simu-
lation” style racing game happens to be. Now substitute for the car a giant, balding
fat guy running as fast as he possibly can, spraying sweat like a sprinkler in August.
Without altering the structure of the game, the tuning of the game, or the function
of the game, the feel of the game is substantially altered.
All you’ve done is swap out a 3D model of a car for a 3D model of a giant fat guy
running and you’ve got Run Fatty Run instead of Gran Turismo. This will change
the feel of the game because you have preconceived notions about the way a car
should handle. Obviously.
You know how a car should feel and move and turn based on your experience
driving a car and looking at cars. Oftentimes, people will play a game—horse-riding
gameplay is my favorite example—and they’ll say “this doesn’t feel like a horse.”
And you'll ask them, well, have you ever ridden a horse before? And they'll say "no,
but this doesn’t feel like a horse.” People have these built-in, preconceived con-
structs, mental models about the way certain things move and, by extension, how it
should feel to control them.
The implication for prototyping is this: you need to take a step back and decide
how much of your metaphor to represent in the prototype to get an accurate read
on the game feel you’re building. Iconic is fine, but if it’s going to be a car, it needs
to read as a car. The trick is not to limit yourself to only everyday objects, but to
look at how you can use preloaded conceptions to set up, and execute on, expecta-
tions for how a thing should feel and behave when controlled.
6. Rules
Rules are the final layer that goes into a game feel prototype. Basically, you’re
looking for longer-period objectives to give additional meaning to the sensation of
control and mastery. If you’ve been noodling around with a mechanic for a couple
hours, this shouldn’t be too much trouble since you’re probably already making up
little goals for yourself. Race from point A to point B, scale this tall mountain, res-
cue five wayward puppies. These kinds of higher order goals define game feel at a
different level: sustainability.
This is one of the most difficult things to do. You need to build in some longer
period goals to find out whether or not this motion you’ve created has depth. This
will necessarily be a bit of a rough test, and there’s really not a good way to get an
objective read on depth unless you watch a bunch of people play the game, but you
can get a sense of whether or not your mechanic is deep.
That is, whether or not you can have long-period sustained interactions that are
deeper than the surface pleasure of steering the guy around the most basic context and spacing you've created. These are goals like getting to the top of the hill, getting from A to B, collecting X number of coins, sorting all these things into colored bins, or performing
a certain trick at a certain location, and so on. Just about any goal implies a set of
rules for achieving that goal.
This sort of testing brings your fledgling game feel up against the hard ham-
mer of game-creation reality. You’re starting to try and create challenges that could
become a sustainable game. This is a bit of a grey area, as it starts down the slip-
pery slope of game design proper, but I would encourage you to consider creating
these types of throwaway goals. Don’t consider them a prototype of the complete
structure of the game.
Just throw a bunch of goals in there—get around things, collect coins, get to a
certain place—find the coolest interactions, the coolest parts of the level. If you’ve
been playing around in your level, tweaking the mechanics and spacing the objects,
you have a good sense of what’s going to be fun about it. You’ve already devel-
oped a bunch of intrinsic, internalized goals: can I do a flip off this thing, can I get
up there, can I do two flips before I land and so on. Just throw these in there and
codify them.
Interestingly, there is a big difference between inventing goals for yourself and
explicitly coding those goals into the game: completing a goal means satisfying
the conditions of the impartial, third-party computer. It also means some kind of
reward, no matter how meager. If you can’t come up with a bunch of different goals
that are enjoyably challenging, that’s trouble. It might be time to abandon or signifi-
cantly alter your mechanic.
17.2 Conclusion
At this point, you’ve proven whether or not your game is going to feel good at the
most basic level.
With diligence and luck, you’ve got a game that feels great. Moment to moment,
it just feels good to steer around and feel out the space. The spacing of objects is
in perfect harmony with the tuning of your controls and you’re quickly finding the
places where the spatial context crosses over and constrains the motion, yielding
the most interesting interactions. You feel yourself starting to build skills that might
give rise to longer period interactions.
Finally, you started adding on some rules that test whether or not this mechanic
will be sustainable and may give you some interesting directions to lean into when
you start designing the system dynamics that are supposed to sustain the experience
across an entire game. You now have the foundation for a great-feeling game.
As a final note, consider the aesthetic beauty possible with game feel. Create
something beautiful at the intersection of player and game. Remember: the first, last,
and most common thing a player will experience when playing your game is its feel.
CHAPTER SEVENTEEN (B)
Further Thoughts from Steve Swink on Game Usability
(Editors' note: We asked Steve for a chapter on game feel, and he submitted that as well as his musings about usability and games more generally. We include much of that rant here, because it is another interesting and valuable developer perspective on how to think about usability. Enjoy!)
Experiential Testing
When I say experiential testing, I mean what most game designers mean when they
say game testing. The goal is to get an objective read on the current state of the
game as-is. If a fresh, uninitiated user were to play the game in its current form,
what would the experience be? This is the most basic and fundamental form of
game testing and addresses things like the balancing of systems, the tweaking of
mechanics, the spacing and placement of level objects, flow and pacing, and other
common game design issues.
In an experiential test, the designer sits down and watches players play the game,
observing and taking notes on the results. The purpose of experiential testing is to
compare the actual, live experience a player has with the game to an idealized vision
of experience contained within the designer’s mind from a purely artistic standpoint.
Consciously or not, the designer takes the many subtle clues from the faces, pos-
ture, language, and mood of the players and intuits them into a generalized notion
of how the game plays in its current incarnation. Players laugh and yell in triumph.
They grind their teeth in frustration. Or they may be rapt, their entire conscious
world focused into a narrow cone on the development and successful execution
of strategy. At least, that’s what the designer hopes for. Initially, the results are
likely to be the most humbling and bitter pill for a designer to swallow: apathy.
In game design, indifference is anathema. If the player is screaming and tearing
their hair out in frustration, that can be tweaked. If they shrug and say “it’s okay,
I guess,” it’s the designer’s turn for hair-pulling. So that’s experiential game testing,
and almost every game goes through some of it. There’s just no other way to get a
reasonable, objective read on the experience of the game. You as the designer have
played it too much and understand it too well to objectively judge. You know how
it’s supposed to work and will subconsciously play it that way. Experiential testing
reveals how it actually works.
Defect Testing
By defect testing I mean bug hunting. By “bug” I mean an unintended error in
logic or syntax that prevents the software from behaving as intended. This includes
crashes, which forcibly halt the game, and things like dead ends, which prevent the
player from progressing further. At the end of every major software project, there
is a period where all content and features are locked down and the team spends
their time hunting down and fixing the inevitable bugs. It sucks, but it’s reality:
every game has bugs. To find them, you have to explore every nook and cranny
of the game, including bizarre cases such as leaving the game running all night
in every possible state. Defect testing is rigorous, systematic, and intentionally excludes the concerns of experiential testing. It’s not about experience; it’s
about building a solid piece of software. In other words, defect testing is no fun for
anyone.
It’s also worth noting that defect testing is carried out by paid professional testers.
This is as far as it is possible to get from a fresh, objective test with an uninitiated
player. These folks are (under)paid, overworked, and are literally handed a task list which tells them exactly how they are supposed to proceed through the game.
Again, the purpose of this testing is simply to find and eliminate bugs by systemati-
cally trying every possible option or course of action.
Usability Testing
The third and final type of testing is usability testing, the type of testing this book
is about. Usability testing straddles the line between experiential testing and defect
testing. In a way, it’s about debugging the experience. Will Wright once said that
designing a videogame is half computer programming, half people programming.
Usability testing is about finding and eliminating the flaws in the people program-
ming. Imagine playing Tetris but being unable to rotate the block left and right. Is
the game as engaging, challenging, or enjoyable? Hells no! Now, ask yourself this: is
there any difference whatsoever between a game of Tetris where one cannot rotate
and one where one cannot figure out how to rotate? Usability tests are intended to
find and correct any such issues.
metrics, there was some success in all our testing and tweaking: the game sold over
three million copies. But I could not escape the feeling that it was not as good a
game as it could have been. Whether through lack of focus or the inevitable crush
of time pressure that accompanies a one-year development cycle, I have always felt
that we as designers could have done better. Specifically, the experience of playing
the game could have been better if we had tested better. Not more, necessarily—
testing once a week is fine—but testing smarter.
This thought led me to thinking about why we test games in the first place. The
answer I arrived at is bound up in the frustratingly ephemeral nature of what we
call “gameplay”, that bizarre universe that is conjured improbably at the moment
the game begins, living in the space between player and game. The insight is this:
without a player, there is no game. This kind of seems like an obvious statement.
But compare the active, multifaceted learning process of gameplay to the compara-
tively passive process of watching a film. There is mental engagement in a film,
sure, and there can be critical discussion and analysis after the fact, but a film will
still play back from start to finish if there’s no one there to watch it. An unplayed
game just sits there. The crucial point, I think, is that a videogame is fundamen-
tally a system whose output is participatory human experience. A videogame is
collaboration between player and designer intended to produce a particular experi-
ence. This is why testing is necessary in the first place. Filmmakers may screen their
films and try to gauge the audience response. Based on this response, they may
make some additional edits or reshoot a scene, but the vision of the film can be
and is detached from the audience while it is in production. A director can look at
a scene and say “yes, nailed it!” or “once more from the top, with feeling!” Would
you release a completed commercial videogame on the hunch that it’s an enjoyable,
engaging experience? Would you build it to a near-complete state without ever hav-
ing users test it afresh? Exactly.
The problem lies in the disconnect between a designer’s intended experience
and the actual experience of playing the game for a new user. Film has no active
participation; you can guarantee that the frames will play back in the same order
every time. It’s possible that viewers will have wildly different experiences viewing
it based on wildly varying interpretations, but probably not.1

1 Perhaps this is why so many videogame designers, myself included, love David Lynch films. The high amount of ambiguity in every scene necessitates a high level of intellectual engagement.
In a participatory videogame, there is no such determinism. A player may
move through the level exactly as the designer intended, or they may obsess over
a plant in an obscure corner of the level for minutes at a time. They may succeed
on the first try or they may fail one hundred times before beating a boss or song
or stage. There are no guarantees about a player’s behavior, only the signposts the
designer has built into the game. As designers, we can’t tell the players to take
it from the top. We can only make small incremental improvements to bring the
player’s actual reality in line with our desired experience. We can’t be in every
download. We can’t pop up behind every player in every living room and wave a
disapproving finger if they don’t play the game the “correct” way. All we can do
is modify our game’s rules and structure so that most of our players will experi-
ence the game the way we intended it. This is, in a nutshell, the process of game
design.
So we test because a game requires play. Play needs a player. In order to get an
objective read on whether or not a game is providing players with the experience
you as a designer intended, you have to see people play the game. Sounds simple
enough. Just compare the desired experience to the experience of people playing.
So what’s the problem?
For starters, there’s the Pandora’s Box of definition. How do you codify the “ideal”
experience you’re testing against? Testing implies measurement, and to measure you
need something to measure against. How do you explain experience? In terms of
other art forms? In terms of everyday feelings and activities? By pointing, raving, and
jabbering? Or what? Especially with finicky things like tuning game mechanics which
have little or no real-world analogy, clarifying exactly the feel you want can be diffi-
cult in your own mind let alone when expressed to other members of a team. With
no clearly defined experience to test against, it’s easy to waste time when testing.
You can fix the obvious stuff, make exploratory changes, and try to lean into the
positive things that are happening, but it’s difficult to adapt to large-scale problems
in the overall design or to really drill down and strip away everything but the pure
essence of the experience you’re going for. Defining that experience is essential, and it is one of the central problems of game testing.
Another difficulty in effectively testing is what I would describe generally as
“staleness.” If you work on a project for three months, it’s easy to lose all objectivity. Three months or three years: it’s not a question of whether you as a designer
will lose perspective, but when and how you deal with it. A well-run playtest
with clear goals and fresh players is the best remedy I’ve seen. As I said, it takes
a player to see a game. For a short while at the start of the design, that player can
be you. You can be fresh enough, open-minded enough, and objective enough to
tune your own system and get some kind of read on whether or not it feels good.
For me this period lasts about a week. After a week of tweaking on the same
mechanic or system, it starts to become extremely difficult to make a change and
say definitively that it has had a positive or negative effect. Recognizing that this
staleness is present and that it’s an insidious danger whose primary solution is
testing may take years of experience.
Finally, there is the problem of challenge versus obfuscation. It takes judgment
and experience to figure out whether a problem is part of the challenge presented
by the game or if it is outside of it and needs to be corrected in some way. In other
words, can the user figure out what they’re supposed to do? Can they master the
controls or learn the interface well enough to engage with the challenge? Stated like
that, it seems a clear division, an easy split between issues of usability and issues of
game design. But, like everything else related to game design, it’s never that simple.
Usability and game design are inextricably intertwined, and the relationship
between them should be well explored and understood.
Of course, these problems are multiplied the longer a project goes on and the
more people are working on it. The longer a project continues without a clear vision
for its final experience, the greater the layering of testing and aimless tweaks that
will pile on top of it. The longer the project continues, the easier it is to lose objec-
tivity about balance and tuning, and the easier it is to lose sight of any dissonance
between the intended experience and the actual experience. This in turn makes it
difficult to discern whether a first time player is failing at the crucial skill of playing
the game, or failing to understand it.
Hopefully, examining these issues in depth and with examples from my own work will help other designers avoid or mitigate them.
17.6 Defining Experience
“When I create a game, I try to envision the core element of fun in the game.
To do that, I imagine one thing, the face of the player when he or she is play-
ing the game. My personal view as a designer is that I want that reaction,
that emotion to be positive. Glee, surprise, happiness, satisfaction. Certain
obstacles may raise suspense, competition, frustration, but we insert these ele-
ments in order to produce a new sensation that you’ve never felt before. And
I want that final emotion to be a positive one. That’s what I have in mind
when I create my games.”
As human beings we’ve all had years and years of practice reading, understand-
ing, and reacting to everyday social situations. You can watch a person’s face, their
posture, and their body movements and read easily if they’re deep in strategiz-
ing thought, deeply frustrated, or delightfully surprised and satisfied. This can tell
you what experience specific moments in the game are conveying to the player
and provide great signposts for modification. This is what most designers I know
do; they watch the players test and try to emphasize the fun and minimize the
boring (or excessively frustrating). A huge part of defining the intended experi-
ence of a game, then, should be to define exactly what the reaction of the player
should be.
I think there’s a danger of oversimplification, though, in ending things there. There is a tendency in game design to ask “is the player having fun?” and call it a day. Unfortunately, that word is both loaded and slippery in ways that make me uncomfortable using it in any kind of serious design discussion (let
alone basing large-scale changes to a design on it). As one simple example, my dad
loves to garden. He grows a lovely little crop of tomatoes, squash, and beans every
year. He potters around and fusses over his garden, builds little planter boxes with
spare wood from the garage, and relishes picking the fresh veggies each night and
turning them into dinner. Is he having fun? By most definitions of fun I’ve read, the
answer would be no. Is he enjoying himself and does gardening enrich his life? Does
he find it engaging and worthwhile? Are there many benefits to him, both tangible
and intangible, to gardening? Yes, of course. What I’m getting at is that the notion
of fun is limiting in many ways, and it may blind us to some of the less obvious
ways that people engage with games and find them enjoyable and enriching. He
gardens because he loves it. He’s not laughing, crying, or clenching his teeth in
frustration, but I can see that gardening makes him happy and relaxed. In defin-
ing an experience we want to create, we should consider not only the signs that a
player is experiencing triumph over adversity, frustration, or social gratification, but
other, more subtle sorts of experiences. (For a great breakdown of different kinds of
enjoyment, check out Nicole Lazzaro’s Four Fun Keys model in chapter 20.)
My dad’s gardening habits are also useful in that they illustrate the other impor-
tant metric of experience: behavior. If he didn’t enjoy gardening, he wouldn’t do it.
In the same way, there’s often no better test of experience than in-game behavior.
For example, in Tony Hawk’s Pro Skater 4, it is possible to “skitch”—grab the back
of a car and go along for the ride. Seems fun and interesting, right? But it’s almost
categorically ignored by players because it’s not useful in the context of the game’s
other systems. Unlike, say, “grinding” ledges and rails which is one of the most
crucial and fundamental mechanics in the game, skitching is totally ignorable. The
player doesn’t need to do it and has no in-built incentive other than “hey, cool,
I can grab on to the car.” In this case, the behavior of players (ignoring the skitch
mechanic) was okay. Skitching was an Easter egg, a fun but ultimately unneces-
sary thing for advanced players to find and toy with briefly. When the skitching
mechanic later became required, as it was in Tony Hawk’s Underground, this behav-
ior was no longer desirable.
As another example, think of games which feature high, climbable structures.
In games like Super Mario 64, Aquaria, and Crackdown I often find myself looking
upwards and trying to reach the highest visible point. This is a simple, enjoyable
goal for me to pursue and has given me a great amount of satisfaction in each game.
I love the sensation of virtual vertigo and the feeling of calmness I get from being at
the highest point in the level and surveying the surroundings. Without the challenge
of jumping my way up there, the threat that I could misstep and fall, and the promi-
nence of tall features in the landscape to attract my attention, I would not have this
experience. And yet, I did not feel forced or coerced. Was this a designed experi-
ence? You bet your ass it was. And you can also be sure that it took some testing
and a clear vision of the desired experience to get it right.
We can design for experiences like these. They don’t have to spring full-formed
from a complex and unpredictable interblending of different systems and mechanics
as if by magic. Every designer I know intuitively translates ideas about mechanics
and systems into player experiences in their head. All I’m proposing is that we
should do a better job of pre-examining what we’re looking for.
So in describing an experience in the detail needed to meaningfully test against,
the following can be useful and should be thought out:
1. What should be the player’s emotions and reactions at a given moment and what
are the signs that the player is experiencing them? This includes facial expres-
sions, body movement and posture, and verbal feedback.
2. What behaviors indicate that the player is engaged with the intended experience
of the game? Do they chase the rabbit, use the skitching mechanic, or try to get
to the highest point?
The best way I know of to get at the kinds of descriptive accounts of experi-
ence that are exceedingly useful for detailed testing is to break the game down into
moments. We can think of these as specific moments of interactivity, if not moments
in time. Even if we don’t know when, we know that these things will happen at
some point in time in the course of play. At least, we can test to make sure they do.
If they don’t happen, that’s a usability issue. If they’re happening but they’re not
conveying the proper experience, that’s a game design issue. If we clearly define the
experience moment by moment, minute by minute, hour by hour from the begin-
ning experience to the end, we have something we can really test against. We don’t
have direct control over when things are going to happen in our games, but we do
have access to the tools of people programming.
Here are some example segments of a detailed, written-out game experience, bro-
ken down by moments:2
2 Note that these moments can be and are asynchronous. Even if we can’t say for sure when, they’re going to happen at some point in the course of play. So they can be tested for. Huzzah.
Jeep Feel
The jeep should feel intuitive and satisfying to drive around. The player should
not notice or be inhibited by the controls. There will be a slight learning curve
to adjust to the specific feel of this mechanic and the spacing and layout of
objects in the world. At the end of the first game, however, the player should
be able to get the jeep to go where they want it to go 90% of the time and
should be focused on higher level goals and objectives such as hitting raptors
or continuing combos. Simple steering should not be an impediment. Despite
being controlled with simple on/off buttons, the driving should feel very
expressive. The jeep should not flip over too easily, but it should be clear that
rolling and crashing the jeep is possible (and encouraged through the damage
and stunt systems.) It should have a sense of weight and presence, and feel
to a player as though it is a large and heavy vehicle with responsive steering
with which they can navigate the environment easily. Crunching tire sounds,
engine revving, and crashing noises of various kinds further lend credibility to
the impression of physicality. Slow motion camerawork at moments of impact
or during long jumps further emphasizes the physical nature of the interactions.
Also, the world should feel expansive to the player and they should enjoy the
simple pleasure of discovering new areas. “Ooh, waterfall!” and “hey, a ramp
I can go up” are common exclamations. The player is actively seeking new
places to go, and fleshing out their mental map of the world.
mode all give the player positive feedback. The interaction is extremely satisfy-
ing and has a sort of extreme sports bloopers appeal, but with virtual ragdoll
raptors instead of some poor motorcyclist flying over hay bales, or a cowboy
being gored by a bull. It’s a victimless pleasure. The experience of hitting a
raptor feels gratifying, interesting, and worthwhile in and of itself. The player
should laugh and respond as the raptor is sent flying, looking around to see
who else is watching. The ragdoll raptors will also hopefully have “over the
shoulder” appeal. People walking by should stop and want to know more about
this irreverent game in which you run over feathery ragdoll raptors that look
like parrots in an off-road Jeep. In addition, the player gets score feedback. It
becomes clear that this game is about score, and the player will hopefully now
be attuned to ways to maximize their score. They haven’t realized that there’s a
global high score list yet, probably. That will occur at the end of the first game.
Advanced skills
Through many playthroughs, the player begins to notice deeper skills in the
game such as whipping the chain ball around and throwing it to hit a rap-
tor and balancing the jeep on two wheels. These advanced techniques should
provide a nice sense of depth and reward for those players who play the game
obsessively for hours on end. These are also the skills needed to score large
numbers of points and ascend the leader boards. This should make players,
new and expert, feel as though the game is both easy to pick up and difficult
to master, that it has a great deal of depth.
As you can see, much of the experience of Off-Road Velociraptor Safari is hooked into the web page/browser experience—into the global leader boards, the types of incentives they provide, and the types of behaviors that result. We knew this going in and considered the design as a whole entity, including the web page wrapper, achievements, and trimmings (all of which were coded by the indefatigable Matthew Wegner in less than three days!). Even the press release about feathered velociraptors, promo videos, and t-shirts were part of it. It’s all one experience, the experience of playing Off-Road Velociraptor Safari. Take it, raptors!
This is what is known as a “got it/don’t got it” behavior. It’s binary. Either the
player figured it out or not. In other words, you can test for this. A usability expert
could use a list of got it behaviors like this to draft up a detailed test plan in prepara-
tion for running participants through usability tests in their lab. Each behavior would
be distilled to a particular task, which the user would then be assigned and have
to work through. The goal here is to make sure the player is able to engage with
the game’s challenge, that it is not obfuscated by a confusing interface. The chal-
lenge to this kind of testing is how to test for a behavior without invalidating your
results. It’s easy to give away what you’re testing for or to otherwise lead the player
into behaviors they would otherwise not have exhibited. For example, how do you
test boost button usage without saying something like “use the boost button to get
a huge stunt.” You have to give participants a task can only be completed by using
the boost. In the case of Raptor Safari, this could be something like “climb as high as
you can” or “climb a steep hill.” Alerting the player to the fact that there is a boost
would denude the purpose of the test. That the player realizes there’s a boost in the
first place is part of what we’re testing for. This is a classic usability test. It has noth-
ing to do with whether the player thinks that using the boost is fun or annoying or
whether they are generally getting the experience we as designers intended. It has
everything to do with removing obstacles to engagement. If the player doesn’t figure
out that there is a boost button or how to use it, they will not be experiencing the
game as intended.
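As a sketch of what one distilled entry in such a test plan might look like—using the boost example above, with invented field names; a spreadsheet row would serve just as well:

    # One got it/don't got it entry: the behavior being tested for, a
    # task that should elicit it without naming it, and a binary result.
    boost_discovery = {
        "behavior": "Realizes there is a boost and uses it",
        "task": "Climb as high up the hill as you can",  # never says "boost"
        "got_it": None,   # filled in per participant: True or False
        "notes": "",
    }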
By contrast, experiential behaviors test the essence of the design. Experience is
the payload; to make the game more usable is to ensure proper delivery. Doing a
moment by moment breakdown like we did with Off-Road Velociraptor Safari opens
the possibility of testing for the intended experience in the same way we test for
usability issues. Watching a player play the game I can say definitively that they
either enjoy hitting raptors or not, and how much. From total passivity (no reaction) through a snort to a chuckle and on up to raucous laughter, I can tell how close
the experience of hitting a raptor gets to our intended experience. If they check
their scores after each game, or linger on the achievement page, I can tell that they
understand how far the system extends. I can see if the player’s left wanting more
after each game. If they play again immediately, great success! As with testing for
usability issues, the trick is to distill the intended experience down to got it/don’t
got it behaviors which can be measured. Unlike usability testing, however, there are
shades and ranges. As with my above example of the raptor, the desired behavior is
that a player thinks hitting raptors is absolutely hilarious and never gets tired of it
but the possible outcomes fall on a range between apathy (worst) to laughter and a
high level of ongoing enjoyment from hitting raptors (best).
I think that drafting a detailed test script and running a specific usability test
is unnecessary, however. If you’re properly organized, you can test for usabil-
ity and game experience in one. As long as you put the work into preparation,
you’ll be ready when the data starts to fly and will have already determined pos-
sible problem areas, complete with which game experiences rely on which usability behaviors.
17.8 Clear the Bridge
In the next section, I’ll provide an example of one way to do this, by defining the
important moments of experience and the types of experience to be tested for and
correlating these with the usability behaviors that unlock them. First, though, an
ugly example of what happens when tests are not properly prepared for or used to
their fullest potential.
In addition, the position of the skater on the back of the car is important.
While keeping the meter balanced, the player must press another button to adjust
the skater slightly to the left. The skater needs to be a bit left of center of the car
because after hanging on to the car for this considerable length of time and keeping the skitch meter from failing, the player must release from the skitch within a very small window of time as the car nears the bridge and, having done so, pull the fast-moving skater into proper alignment with a ramp in front of the bridge. If the skater
has not been properly positioned, they’ll have no chance of hitting the ramp.
This goal is difficult for me to complete consistently, and I’ve done it probably
100 times. It requires, among other things, multitasking, precise timing, knowl-
edge about where to position the skater on the back of the car, and a nuanced
understanding of how the meter balancing system works. As difficult as it is, how-
ever, the biggest problem for players in trying to complete this goal is their inabil-
ity to understand exactly what they should be doing. There are instructions at
the start, of course, but no one will ever read instructions if they have the ability to skip them.
So here is the player, dumped on a skateboard moving slowly forwards towards
the back of a stopped car. For most people, the normal reaction to this situation isn’t
to grab on to the back of the car and ride it around the block, building up speed to
leap hundreds of feet over a bridge. Having skipped the instruction text (assuming
that the player even knows what skitching is), the player has nothing to guide her.
We included text at the top right telling the player exactly what button to press to
start the skitch, but there is perhaps a two second window before the skater runs
into the back of the car and is sent off in the other direction. Yowch!
What this problem illustrates is the relationship between game design and usability. They’re crucially intertwined. With this goal, we inadvertently created a difficult puzzle for the player, one that bore little resemblance to whatever experience we had intended. Honestly, it wasn’t all that well thought out from a game design perspective.
For example, how does this skill relate to the rest of the game? Isn’t this just a
weird, esoteric goof goal that has no relationship to what you as the player have
learned and done up to this point? It has nothing to do with combos, scoring, and
exploring the level, the skills the player has built up to this point. The Tony Hawk
games have done a great job in the past of providing some weird/funny goals to
mix it up a bit but now we’ve made this a required goal. You have to complete
it in order to progress. Herm, that’s not so good. At best it feels forced and the
player says to themselves “um, ok… WTF LOL!” At worst, this is a game breaker.
We’re forcing the player to use skills that never get used in the normal sandbox
gameplay and we’re forcing them to apply them at an exceedingly high skill level.
This goal is HARD. Ipso facto, this might be the point at which many players sim-
ply stop playing.
Another problem is in the necessarily repetitive nature of this goal. To figure out
that you have to adjust the skater a bit to the left of the car in order to hit the ramp
properly when you release takes knowledge and experience. In other words, we’re
pretty much guaranteeing that the player must fail at least once to complete this
goal. There is a thirty-second build-up before you can try the release again, so we’re essentially putting a 30-second tedium penalty on a goal that requires trial-and-error play to figure out.
Finally, we just didn’t think through what would happen if you missed the ramp
or hit the jump at an off angle. Or, heh, if you plowed into the back of the car,
which doesn’t magically disappear just because you’ve released your skitch. So you
miss and you hit something, thud. You lose all your speed instantly and get the goal
fail noise and graphics. What an awful feeling! There is such a thing as entertaining
failure—we could have had a missed jump still launch you somewhere interesting,
or at least give you a huge air that could be turned into a trick or combo. Maybe the
player could have gotten a stat bonus for this or some other, secondary goals could
be hooked into it. Or, hell, it might have been funny just to watch the player crash
spectacularly. As it is, this is a hugely punitive system.
In the end, the experience of this goal to an uninitiated player went something
like this:
Wah!? What’s going on? Why is this car in my way! I just hit the back of that
car! Now what do I do?
*Retries goal from the menu if they’re aware of the menu retry. If not, they have to wait for the goal timer to end (30 seconds) and skate awkwardly back to the goal pedestrian to restart the goal.*
*Skips text*
*Retry. Reads text or notices the message about which button to press for
skitch.*
Huh, ok, so I guess I’m riding on the back of this car now. Wait, what!? I fell
off! Oh man, I have to balance this meter? Jeez!
*Retry*
Ok, ok, I got it, I’ve got it…I, CRAP! I fell off again!
*Retry*
Ok, I think I’ve got the hang of this.
*Player nears bridge*
What the hell? Am I supposed to…?
*Player doesn’t read or understand text that pops up telling him to release the
skitch and fails to do anything. Car turns, player is released and just runs into
barricade with an unsatisfying thud. Goal fail noise and graphics play. Player
is livid.*
GAH! THIS IS SO RETARDED!!!!1!
*Retry until the player figures out that they must release the button and steer to
hit the ramp*
Oh my God I did it I…WHAT?!?!?
*Player is reduced to nonsensical angry jabbering as he finally hits the ramp
and is launched to the left of the bridge and into the river, failing the goal*
WHAT. DO. YOU. WANT. FROM. ME?!?!
*Retry*
So I have to hit it just right…if I could just move the guy to the left…oh, ok, I
can adjust the guy on the back of the car.
*Retry three more times*
FINALLY. God, I’m glad that’s over. I don’t even want to play this game
anymore.
Excruciating, no? This is not a verbatim test log—we weren’t doing anything that
sophisticated—but it is cribbed from my notes and observations. Sadly, these were
mostly observations after the fact, watching friends, family, and students play the
game after its release. The goal stayed in as-is. Go play it, see what you think. It’s
one of the biggest halt points in the game, one of the places that lots of people get
stuck and stop playing entirely. Besides the game design issues, a simple usability
test would have told us that at every step of this goal, we’d inadvertently created
a confusing, muddled mess for players to try and make sense of. Unfortunately,
Tony Hawk is not a puzzle game. This example illustrates the crucial difference and
constant clash between challenge and obfuscation.
17.9 Challenge vs. Obfuscation
I don’t think there’s any “correct” take on what the different types of chal-
lenges are and how they can be used. That’s part of the art of game design.
One of my favorites is Chris Crawford’s breakdown of different types of chal-
lenge by areas of the brain they utilize (Chapter 4 in “Chris Crawford on Game
Design”.) I also like Scott Kim’s Venn diagram of the various types of puzzles
and the rich fruit yielded by their interblending (slides online at http://www.scottkim.com/thinkinggames/GDC03/index.html).
Also, many games riff on different kinds of challenges as part of their design.
Some games that do a great job of exposing the nature of challenge and com-
bining different kinds of challenge (and which are awesome games) include:
● Wario Ware
● Arcadia (http://www.gamelab.com/game/arcadia)
● Brain Age
● ROM CHECK FAIL (http://farbs.org/games.html)
With that distinction made, let’s pull it all together into a detailed test plan for actual use.
Usability: What are the got it/don’t got it behaviors? In order to play the game
properly, what must the player first understand? For example, to play Super
Mario Brothers properly, the player must understand that the A button makes
Mario jump, that holding down the button longer will cause him to jump
higher, and that Mario can still be steered left and right while in the air.
Experience: Define the important moments in the game. What is the desired expe-
rience at each moment? What should the player be feeling, thinking, and expe-
riencing at each moment? What are the specific behaviors and actions that will
tell us what they are experiencing at a given moment, and how close is that to
what we want? For example, in the game flOw, the experience of being bitten
by another creature should not be jarring or frustrating, and should not pull the player out of their state of flow. Their creature is sent down one layer, but can immediately resume calmly
swimming around and eating other creatures. If the player verbalizes, jumps, or
shows other signs of frustration, this indicates that the transition is too jarring.
The ideal experience is uninterrupted calmness and serenity with mild chal-
lenge to keep the engagement level high.
Challenge: What are the intended challenges of the game? What types of chal-
lenge is the game about? This will keep clear the distinction between usabil-
ity issues and game design issues. Is the game about quickly and decisively
taking very specific tactical actions, like Bungie’s Myth, or is it about managing your own attention—figuring out what’s important amidst a roiling sea of ever-changing unit movement and building upgrading, as in Rise of Nations? Or maybe there should be no time pressure at all, as in Civilization 4, and each tiny decision should be deliberate and intended to further a greater overarching strategic goal. Whatever the real challenge is, try to make sure it’s clarified
such that there is no ambiguity between usability and game design issues.
17.11 The Tetris Test
Usability
List the got it/don’t got it behaviors. In order to experience the game properly, the
player must understand:
1. Filling a line horizontally will clear it
2. If the blocks pile up to the ceiling the game ends
3. Blocks can be rotated (and actively rotating them to fit better is a key skill)
4. Blocks can be moved left and right (and actively moving them to fit better is a
key skill)
5. The game speeds up as the player clears levels
6. A “Tetris” occurs when four lines are cleared at once
7. A Tetris can only be achieved with the long, thin block
Experience
What are the important moments in the game? For each, describe the “ideal” expe-
rience for the player.
1. Start game
The empty playfield is blank and clean. This should provide a nice counter-
point to later in the game when the playfield becomes unordered and chaotic.
The player will want to keep things ordered. This is probably the only time in
the game when everything will be perfectly ordered. No real behavior to track
here, other than the player seeing the empty screen. The first block to drop down
immediately draws attention; it is the only thing that moves.
3. Clear a line
Clearing a line relieves the pressure slightly and provides the player a clear way
to reverse the tide of ever-falling blocks. The player understands that their com-
pulsion to sort and order is being rewarded in the form of cleared lines, which
give them more space to maneuver and add to their score. This should feel good
and is the primary reward that the player seeks in the game.
4. Complete a level
After a few lines have been cleared, the player will graduate to the next level, at
which point the game speeds up. This speed increase should return the player
to a state of deep concentration as they have to now adjust the speed of their
sorting to match. They may again start misplacing blocks. Every time the player
clears a certain number of lines, the speed ratchets up. When the level changes,
the music and color of the background also change, making each level feel
different and making the transition distinct to the player. Eventually, a thresh-
old will be hit where they simply cannot adjust their sorting to the speed of
the game. Before then, though, it should feel like a gradual increase to
which they are able to consistently adapt. It’s sort of a boiling-frog scenario.
6. High score
The player should regard the score as a benchmark for their performance, and
be interested in seeing how each game they play stacks up against all previous
games. They want to see that they’re improving, and quickly. Each time they
beat their previous high score, they should feel a sense of triumph, and will prob-
ably yell and raise their hand in the air.
7. Score a Tetris
After one or two games, players should quickly figure out that the best way to
score a large amount of points is to go for a Tetris, clearing four lines at once.
The player should also realize fairly quickly that this is a somewhat risky strat-
egy—it relies on keeping one thin line of blocks open as you build, and it relies
on getting a lucky drop in the form of a long, thin block. The player should under-
stand that this is a high risk, high reward strategy. If it causes them to lose, they
should only blame themselves. This might be manifest by them saying “I got
greedy there—damn!” or similar.
Challenge
What are the intended challenges of the game? What type of challenge is the game
about?
The primary challenge is sorting irregular shapes into ordered patterns (horizontal
lines) under ever-increasing time pressure. The skill is in quickly evaluating the play
field relative to the currently falling piece and placing that piece such that it creates as
few inaccessible holes as possible. Advanced players will plan ahead, constructing the
playfield so later pieces may be fit more easily, especially long line pieces (clearing
four lines simultaneously to create a Tetris.) Anything that interferes with the player
being able to engage with this core challenge should be treated as a usability issue.
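That skill can even be instrumented rather than eyeballed. As a rough sketch—assuming the playfield is a simple grid of booleans with the top row first, which is not necessarily how any given Tetris implementation stores it—counting the inaccessible holes a placement creates might look like this:

    # An "inaccessible hole" is an empty cell with at least one filled
    # cell somewhere above it in the same column.
    def count_holes(grid):
        holes = 0
        for x in range(len(grid[0])):
            seen_block = False
            for y in range(len(grid)):
                if grid[y][x]:
                    seen_block = True
                elif seen_block:
                    holes += 1  # empty cell buried under a block
        return holes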
Assembling this into a shorthand Excel spreadsheet, I end up with a compact one-page test sheet.
There are a few things going on here. First, you’ll notice that I’m conflating
usability concerns with intended experience. This is because there is a direct rela-
tionship between the two most of the time. By my earlier definition, a usability con-
cern is something that interferes with a game design concern, preventing the player
from properly engaging with it. So all I’ve done here is correlate desired experi-
ence with the usability concern that “unlocks” it. If the player doesn’t understand
that they can rotate the blocks left and right, they won’t be able to order and sort
them. Second, I’ve put in a numeric ranking 1-5 for each experience. This is to
get a general sense of how close we’re getting on this particular part of the experi-
ence. Again, this might be on a scale from total apathy to raucous laughter. What
you’re measuring is determined by the experience in the box above. If the player is
supposed to feel overwhelmed, but the game ends abruptly and they’re confused
as to why, this might be a 1 or 2. We need to adjust the increment at which the
speed builds up each level, perhaps, or further emphasize that when the playfield
is full the game is over. The idea is to pinpoint the areas that need the most attention.
Finally, I’ve left large blanks for note taking. This is the meat of any test, obviously.
You’re clear about what experiences you’re going for and what the player needs to “get” in order to engage with those experiences, but it’s the specific things the player does and says that will really tell you how to proceed, so leave generous blank space for free-form notes.
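As a rough illustration of the sheet’s structure—every field name and entry below is invented for the example, not copied from the actual spreadsheet:

    # One row per desired experience, correlated with the got-it
    # behavior that unlocks it, a 1-5 closeness rating, and blank notes.
    test_sheet = [
        {
            "experience": "Feels compelled to sort and order the blocks",
            "unlocked_by": "Understands blocks can be rotated and moved",
            "rating": None,   # 1 = total apathy ... 5 = fully engaged
            "notes": "",
        },
        {
            "experience": "Mounting pressure as the stack nears the ceiling",
            "unlocked_by": "Understands the game ends at the top of the field",
            "rating": None,
            "notes": "",
        },
    ]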
Cool! Filled out with notes from an actual test, the sheet gives me a couple of clear places to focus my efforts.
The problem with counter-clockwise rotation is a bug that will get fixed. We’ll need
to run another test to see if the player can figure it out once that’s fixed. The other
problem was with the challenge ramping up. The player died suddenly and didn’t
realize why. This is something of a usability issue in that the player misunderstood
what was going on and it prevented him from getting the desired experience. The
solution may be to add some kind of warning when blocks are getting close to the
top, or may be a simple fluke that we can live with. Additional tests will be instruc-
tive here.
This essentially represents what I would otherwise be tracking in my head
were I taking the traditional Wild West approach to game testing. It may feel a lit-
tle like homework, but codifying things this way has a few great benefits, among
which are clarity, scope, and transparency. For me, clarity is the big one. I’m no
Reiner Knizia, and I realize this. I can’t create and playtest an entire game in my
head in perfect detail. Writing this stuff down and thinking it through provides a
clarity about what exactly is being built that seems refreshingly thorough compared
to simply keeping it in one’s head. These details may change as the game evolves
and we discover that it appeals in different ways. But since our focus here was on
defining the experience—the actual output of the game in terms of the player and
how they behave and how they feel about the game—what we have is more a heading than a task list. We can be flexible. Maybe when we watch a playtest of Tetris, the
player doesn’t feel compelled to sort the blocks and keep them as ordered as pos-
sible. That’s a generalized, high-level problem to which there could be hundreds of
different solutions. We could give a visual and aural reward when blocks are linked
together, or punish the player by making holes in their structures glow red. The
point is, if we’re clear about the outcome, we can keep trying things until something
works. If we design only one possible solution to this problem and don’t address
the fact that the eventual output of the system is a particular experience, we’re lock-
ing ourselves into an implementation that might not work. This is bad; it drastically
reduces the possibility of success by treating the design as a “right answer” prob-
lem. In other words, you’re looking for the one right answer to the problem instead
of really exploring the potential design space. If instead you’re clear on what the
eventual outcome should be, and frame it as a high level experiential goal, you can
brainstorm 20 different implementations. I guarantee that if you think of twenty dif-
ferent approaches to a well-framed design problem, you’ll find answers that are sur-
prising and surprisingly good.
This clarity also allows me to address scope from an early point in development,
which has ramifications for artists, designers, and programmers. Say the goal is to
create a particular emotional connection between the player and a character in the
game, something a lot of designers seem to be attempting just now. If we’re clear
on that point, we can be flexible about implementation. There are as many different
approaches to this problem as there are games that try to do it. One solution is to
try to make photorealistic characters with thousands of animations and to try to win
the player over that way. Another approach is through a long, detailed story that
brings out the character through narrative. But maybe we don’t have the budget,
time, or personnel to attempt these things. The strongest emotional connection I’ve
seen between players and characters in a game exists in two games: The Sims and X-
Com: UFO Defense. These games used clever systems design to give the characters
meaning to the player, and featured characters with cartooned and generic faces.
Well, herm, maybe this is the route we should go? It’ll probably cost less. Again,
if we’re clear on our eventual output—that the player cares deeply about particular
characters—we can try anything to get there.
Defining by experience also has another great ancillary benefit: transparency. One
of the most frustrating things for an artist or a programmer is to have to redo work
when a designer decides that a particular system is simply not working or needs to
be modified to change the player’s experience. Luckily, this problem is easily solved.
If you have a clear vision for what the experience of the game is intended to be and
share it with the team up front, the changes will make sense. Be forthcoming. Say
“look, this is what we’re trying to do. I’m going to have you build this thing to test it
out, but if it doesn’t work, we may have to try something else.” And now we’re finally
back to the original topic I was intending to write about, usability and game feel.
CHAPTER EIGHTEEN
Interview about Prototyping and Usability with Jenova Chen
Jenova Chen is one of the first-generation video game design graduates from the USC School of Cinematic Arts, the creator behind the multi-award-winning student games Cloud and flOw, and co-founder of thatgamecompany (flOw PS3 and Flower). Jenova Chen is dedicated to expanding the emotional spectrum of videogames and making them available for a much wider audience.
Have you utilized user testing and feedback in the design of your own projects?
FlOw was created as a master of fine arts thesis at the University of Southern
California. Its user testing and feedback process was a little bit different from tradi-
tional game development.
flOw’s user feedback collection process can be divided into three different
phases:
1. Mentor and peer review
2. Friends and family testing
3. Beta testing
At the beginning of the project, while I was still consolidating thoughts and ideas
for the design of the game, I shared my designs with my mentors and classmates
who had a lot of game design experience. Even though there was no playable game
yet, the bouncing of ideas between experienced designers was a great way to pre-
vent potential flaws in the game. This kind of review lets you test out the high-level game design, and it continues through the entire project.
Once there were gameplay prototypes available, I had friends, classmates, and
family members test them out. During this phase, because the prototypes are in a
rather rough stage, certain guidance and help is needed to carry players through
the game. And the testing is focused more on the core mechanics—whether the core gameplay mechanics work for the players or not.
Beta testing happened when the game was almost done, soon before release. I
hosted the beta test for flOw on the Internet three weeks before the deadline. Having
a large number of people playing the game will quickly provide you with feedback. This
feedback will help you further polish the game.
What role do prototypes and prototype iteration play in getting feedback from
players as you work?
The way we perfect a video game experience is very simple and similar to how we
design many other human experiences.
How do you approach prototyping in general? Are there shifts in how this works
when you investigate dynamic system adaptation as you did in flOw?
Interestingly, the dynamic system I used to control the difficulty itself was a big
unknown when I started the project. Therefore, I approached the game with multi-
ple prototypes.
● Visual prototype, to test out the system limitations of how complicated the flOw creature could look and how it moved. I realized that Flash 8 doesn’t support transparent objects and solid objects very well. That finding informed and influenced the line art look of the final game.
● Control prototype, to explore the potential ways for players to control the flOw creature (a sketch of the resulting scheme follows this list). I realized it was quite stressful to ask the player to constantly hold down the mouse button to move. Eventually, we let the creature move even when the player was not holding the mouse button.
● Gameplay prototype, a sandbox with creatures and food where I could mess around with different rules and mechanics. Unlike the scroll bar in the Traffic Light prototype, I ended up deciding to use a special food creature—part of the gameplay itself—to allow the player to switch levels and further change the difficulty of the experience.
● Sound prototype. flOw’s mesmerizing experience relied on a non-conventional sound and music interaction. However, I am not a musician, so prototypes for sound and music were made and sent to a real composer for feedback.
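A minimal sketch of that control change follows. flOw itself was written in Flash; this is illustrative Python, the creature and mouse objects are invented, and mapping the held button to extra speed is an assumption, not something stated in the interview:

    import math

    # The creature always drifts toward the cursor; holding the button
    # is no longer required to move (here it merely adds speed).
    def update_creature(creature, mouse, dt):
        dx = mouse.x - creature.x
        dy = mouse.y - creature.y
        dist = math.hypot(dx, dy) or 1.0    # avoid divide-by-zero
        speed = creature.base_speed
        if mouse.button_down:
            speed *= creature.boost_factor  # assumed boost while held
        creature.x += dx / dist * speed * dt
        creature.y += dy / dist * speed * dt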
What recommendations do you have for developers who would like to use proto-
types to get user feedback?
For Prototypes
You need to know when you should make a prototype. It is a good idea only when
the feature you want to prototype is somewhat risky. Either you are not sure of a
certain design and you really want to test it out, or you feel the implementation
of a feature will take too long and there are other design questions that depend on it.
When you make a prototype, make sure everybody on the team is aware of
it. After all, the value of a prototype is to convince others whether a certain design works. And make sure the way you make your prototype is quick and dirty. If you spend too much time perfecting the bits and pieces of a prototype, you defeat its purpose: you want to test out an idea with the least amount of resources.
For Feedback
If you test your prototypes on your team members, make sure you communicate
enough information to them before they start. They need to know things like why
you made the prototype and how you’re going to use it. You need to guide them
through the glitches and flaws in your prototype and help them to focus on the real
issues your prototype is made to test.
If you are testing your prototype on real players, you need to pay extra attention
to their reactions. Outside players are generally shy, and they don’t react the same
way under observation as they would at home. To get the best play test result, you
should keep yourself away from the player’s awareness. There are many ways you
can do this. You can use a camera. You can hire an outsider to run the test, or pretend to be an outsider rather than the maker of the game.
CHAPTER NINETEEN
Social Psychology and User Research
Katherine Isbister, prior to joining the IT University of
Copenhagen Center for Computer Games faculty as an
associate professor in 2008, was an associate professor
at Rensselaer Polytechnic Institute in New York, where
she was founder and director of the Games Research
Laboratory. Dr. Isbister’s current primary research
interests include emotion and gesture in games, sup-
ple interactions, design of game characters, and games
usability. Isbister’s book—Better Game Characters by
Design: A Psychological Approach—was nominated for a Game Developer Magazine
Frontline Award in 2006. Isbister serves on the advisory board of the International
Game Developers Association Games Education Special Interest Group, and
is the vice chair of the Game Studies Special Interest Group of the International
Communication Association.
19.2 Some Helpful Social Psychological Findings
Therefore, game developers can make conscious choices about what qualities
they would like to come across in a player’s first impression of his/her avatar, and
of NPCs in the game, and then can use psychological findings to help guide the
design choices that are made. User research can confirm whether or not the avatar
or NPC is “reading” as it should for players. Researchers can even find and use the
same measures that are used in social psychological studies to figure out whether
the game characters are hitting the mark.
There are many facets to first impressions; what follows here is a targeted subset
that may be especially useful in game character design.
Attractiveness
There is a reason that the cosmetics and beauty industry is so large. Social psy-
chological research has demonstrated that people who are thought to be attractive
are perceived to have other positive qualities that do not rationally follow—what
is termed a “halo effect.” For example, they are also perceived to be more intel-
ligent and capable, may be given preferential treatment in work situations, and are
awarded bigger settlements in experiments that simulate jury trials. Making a char-
acter attractive (by the standards of your game’s target audience, that is) can create
powerful positive attributions from players right from the start.
The answers to these two questions shape the strategies a person will take in inter-
acting with the other. Will this person become an ally? Is the person a potential threat?
Where do they fit in the social hierarchy in relation to me? What sorts of relationships
are possible and desirable between us? Because social encounters unfold so quickly,
people are very adept at reading cues of friendliness and power in others in just a few
moments. As a game designer, you can make strategic use of these cues to telegraph
to the player what a character’s social position and relationship to the player’s charac-
ter are. Social psychologists have made in-depth studies of the various cues of domi-
nance and friendliness, and these can be used as guidelines for developing character
appearance and behavior (for a detailed taxonomy of these cues, see Isbister, 2006).
Marks of belonging
Another very practical set of judgments people make when forging a first
impression concerns what sorts of group memberships a person has: what
cultural and subcultural groups does this new person belong to? These include social
class, political stance, and many other factors. Getting a read on group memberships
helps establish potential common ground, or flag potentially troublesome conflicts.
People literally “wear” these cultural and subcultural memberships, encoded in
the choices they’ve made about how they dress, how they move, their style of speak-
ing and the language they use. Game designers can use these group memberships to
develop a plan not just for how a character will dress, but also, for how the character
will move and speak, and how s/he will behave toward others depending upon their
own group markers. Showing these nuanced reactions among characters is a very
powerful way to create heightened believability and engagement for players.
Mood management
Another very helpful category of social psychological findings for game designers
concerns the mechanics of emotions. There are two specific results that
I've found very useful in understanding why certain design choices work well
to create player emotions:
Emotional contagion
This result was initially put forward by social psychologists, and since then, neu-
ropsychologists have found complementary information that backs up what was
observed in the laboratory. Essentially, it is the case that human beings are highly
susceptible to feeling the feelings of others. When a person talks with another per-
son who expresses a feeling, s/he subconsciously and subtly mimics the expres-
sion of those feelings, and also internally begins to feel those feelings as well.
Evolutionary biologists have suggested that this is part of how the powerful social
bonds among people (and other primates) are formed. Neuropsychologists have
found that there is physiological support for these observations—they’ve found
what have been termed “mirror neurons,” which fire in the brains of primates when
they observe others taking an action such as expressing an emotion. These neurons
fire as if the primate itself were taking this same action.
What all this means for game designers is that you have a very powerful psychological
mechanism at your disposal. In particular, designers can use the expression
of emotions in both the player character and in NPCs in a game to powerfully
influence the feelings of the player him/herself. For example, if the player sees his/
her avatar gleefully celebrating a victory, this can heighten his/her own feelings
about that victory. Conversely, if the player sees his/her avatar calmly navigating
obstacles despite the player's own nervousness, it can help to steady that player's
nerves. Emotional contagion is also a very useful principle to consider in the design
of social games. If the designer wants to evoke a certain mood in a group of
players, s/he can use the actions and reactions of their on-screen avatars to
encourage and exaggerate expressions that move people toward this mood state.
In one clever study of facial feedback (Strack, Martin, & Stepper, 1988), participants rated a device
they had to use by holding a penlike device in their mouth. Half of the participants
were told they should purse their lips around the device as they used it, which hap-
pened to activate their frown muscles. The other half were told they should clench
their teeth with lips parted, which happened to activate their smile muscles. The
latter group reported a more positive impression of the device!
Game designers can use this physical feedback loop particularly in the case of
input devices that allow for physical movement, such as the Sony EyeToy or the
Nintendo Wiimote. Getting players to move as if they feel certain ways can then
lead them to attribute these feelings to themselves as they play: I'm
smiling and gesturing as if I'm happy, so I must be happy.
Making sure your user research team is valuing these qualities in your testing
results will help ensure that they catch any problems as they emerge.
3. Make decisions and tradeoffs throughout production based upon these criteria.
Inevitably in the course of production, many features of a game end up cut
due to time and budget limitations. If your team has decided it values certain
social and emotional impressions on players, then it’s important to weigh these
in decisions that get made about cuts. There may be ways to preserve these
impressions with creative reworkings of key moments or qualities in the game,
but this will only happen if the team is keeping an eye on these effects and mak-
ing sure they are preserved when things get scaled down.
19.4 References
Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin
slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social
Psychology, 64(3), 431–441.
Isbister, K. (2006). Better game characters by design: A psychological approach. San Francisco,
CA: Morgan Kaufmann.
Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television,
and new media like real people and places. Cambridge: Cambridge University Press.
Strack, F., Martin, L.L., & Stepper, S. (1988). Inhibiting and facilitating conditions of the
human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality
and Social Psychology, 54, 768–777.
19.5 Resources
Fiske, S.T., & Taylor, S.E. (1991). Social Cognition. New York: McGraw-Hill, Inc.
A thorough overview of cognitive approaches to understanding social behavior—topics
like the use of social categories and schemas, how we form impressions and inferences
about one another and what attracts our attention, how we form attitudes about things
and each other, and much more. Great material for building interesting in-depth characters
that feel realistic to players in terms of their assumptions and problem-solving strategies in
interaction. Also great for understanding how to set up multi-player dynamics and situa-
tions, and casts of playable avatars.
Goffman, E. (1967). Interaction ritual: Essays on face-to-face behavior. New York: Pantheon
Books.
A classic text that analyzes interpersonal interaction in unexpected and very useful ways
for design thinking. Goffman introduces notions of what drives the dynamics of interac-
tion such as the idea of “saving face.”
Gudykunst, W.B., & Mody, B. (2002). Handbook of international and intercultural communication
(2nd ed.). Thousand Oaks, CA: Sage Publications.
Includes valuable research findings on emotion and culture, and other topics that become
very relevant when designing games that appeal across cultures.
Knapp, M.L., & Hall, J.A. (2002). Nonverbal communication in human interaction. Australia:
Wadsworth Thomson Learning.
Another excellent overview textbook that covers all the fundamental research in how non-
verbal communication works. Very helpful for designing characters that have realistic and
engaging nonverbal behaviors.
Oatley, K., Keltner, D., & Jenkins, J.M. (2006). Understanding emotions. Malden, MA:
Blackwell Publishing.
A recent and comprehensive introductory text that covers the basics about emotion—from
cues that you can see in people’s bodies, to the physiology of emotion, to cultural and
social factors. A great place to begin if you are looking for more information about how
emotion works and how to use it in your designs.
Zebrowitz, L.A. (1997). Reading faces: Window to the soul? Boulder, CO: Westview Press.
A great overview of some of the main psychological effects associated with faces, includ-
ing attractiveness and babyface findings.
CHAPTER
TWENTY
The Four Fun Keys
20.2 Emotion and Engagement in Player Experiences
Games built around goals and challenge create Hard Fun, offering players feelings of accomplishment
and mastery from overcoming obstacles. Outside of goals, games provide novel opportunities
for interaction, exploration, and imagination, which create Easy Fun. Games
that use emotions in play to motivate real-world benefits to help players change how
they think, feel, and behave or to accomplish real work create Serious Fun. Finally,
games that invite friends along get an interpersonal emotional boost from People
Fun (Lazzaro, 2004b). The Four Fun Keys are a collection of related game interac-
tions (game mechanics) that deliver what players like most about games. Each offers
a key to “unlock” unique emotions such as frustration, curiosity, relaxation, excite-
ment, and amusement. Best-selling games provide features that support at least three
of these Four Fun Keys to create a wider emotional response in the player. To keep
things fresh during a single play session, gamers move between the four different
play styles (Lazzaro, 2004b). Developing each key focuses and rewards the player
with emotion from a self-motivating experience that deepens the game’s player-
experience profile. Designers of products and productivity software can also use these
Four Fun Keys to increase emotional engagement for applications outside of games.
Only some of the emotions from playing basketball in the real world come
from the Hard Fun of making baskets. Close examination reveals that all Four Fun
Keys are part of this popular sport. Dribbling the ball or doing tricks like a Harlem
Globetrotter offers Easy Fun from novelty and role-play. Intentionally blowing away
frustration and getting a workout creates Serious Fun. Competition and teamwork
make the game even more emotional from People Fun. All four types of fun make
basketball’s player experience more enjoyable. None of these require story or char-
acter. Through examination of how each type of fun creates emotions, designers
and researchers can create better and more emotional player experiences.
It is possible for a game to be highly usable, but no fun; in XEODesign's lab we often see
games that do just that. To differentiate the two we use the terms user experience
(UX) and player experience (PX), defined as follows:
● UX is the experience of use: how easily and how well the product suits the task the person
expects to accomplish.
● PX is the experience of play: how well the game supports and provides the type of
fun players want to have. Players cannot just push a button once and feel like they
won.
Put even more simply, for UX we look at what prevents the ability to play, and for
PX we look for what prevents players from having fun. To test games, the first step is
to divide the features into two buckets, then apply different techniques to measure
and improve the quality of each. Comparing the goals of user and player experiences
reveals that they serve different purposes and strive for different values.
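One way to make that first step concrete is simply to tag each feature before testing begins. The following is a minimal sketch in Python; the feature list and tags are hypothetical, not from XEODesign's practice:

```python
# Divide a game's features into two test buckets: UX (can the player
# play at all?) and PX (does playing produce the intended fun?).
# Each bucket then gets its own measurement techniques.
FEATURES = {
    "menu navigation": "UX",        # barriers to play
    "camera controls": "UX",
    "save system": "UX",
    "boss difficulty ramp": "PX",   # sources of fun
    "hidden collectibles": "PX",
    "co-op trading": "PX",
}

ux_bucket = [f for f, kind in FEATURES.items() if kind == "UX"]
px_bucket = [f for f, kind in FEATURES.items() if kind == "PX"]
print("UX bucket:", ux_bucket)
print("PX bucket:", px_bucket)
```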
User experiences and player experiences are like the two wheels on a bicycle. One connects
to the drive chain to make the bike go (UX); the other wheel steers and creates
the fun (PX). The practice of improving software has only identified a few spokes
on that rear wheel: heuristic evaluation, usability testing, time on task, reducing error
rates, satisfaction surveys, and certain kinds of ethnography such as contextual inquiry. All of these
improve interface design and the quality of the user experience. None of these usability-related
practices address specific emotions. If anything, current UX methods target
a single emotion, frustration, in order to reduce it; and they track "satisfaction" without
a precise definition. Taken to an extreme, a system that is 100 percent usable will
have few errors and require little effort; however, this risks boring workers by making
a task too routine. It also does nothing to increase a worker's sense of accomplishment
from mastering a complex task or a job well done. Usability alone is not enough
to improve all aspects of the interactive experiences humans enjoy at work or at play.
Game mechanics that offer players emotions they enjoy give players a reason to play.
Games are self-motivating activities. Emotion plays a big role in this. Emotions focus
attention, make decisions, improve performance, create enjoyment, and reward
learning (Lazzaro, 2007). Researchers are only beginning to understand the impor-
tant role that emotions play in decision making. In fact, people without emotional
systems cannot make choices (Damasio, 1994). Because games are about making
interesting choices, studying the emotional reactions of players serves a critical role
in improving the quality of player experiences (PX). Without emotion, or with too much
of the wrong kind, gameplay feels flat and uninspired. Players know what to do.
The game is usable. They know how to play, but they don't know how to have fun.
Ekman's work describes emotions in detail but, like most emotion research, focuses on negative,
problematic emotions rather than enjoyable ones. Ekman also focuses on real-life emotions rather
than on how emotions come from entertainment. Tiger, Jordan, and Norman describe
the importance of emotion in product design but focus on general positive or negative
emotion rather than on how to measure or increase specific emotions such as curiosity or
amusement. There is likewise very little discussion of the role specific emotions
play in making different types of decisions, or of how one emotion builds into the next.
TABLE 20.2 Comparison of Models for Creating Emotion and Engagement

Model | Hard Fun | Easy Fun | Serious Fun | People Fun
XEODesign Four Fun Keys | Fiero; Challenge; Game, Goal | Curiosity; Novelty, Fantasy; Game, Open Ended | Relaxation; Real World Purpose; Life, Goal | Amusement; Social; Life, Open Ended
Tiger (1992), Jordan (2000) | Psycho-pleasure | Physio-pleasure | Ideo-pleasure | Socio-pleasure
Wright et al. (2003) | Spatial-temporal thread | Compositional thread | Sensual thread | Emotional thread
Common Drama and Theater Constructs | Character Motivation, Plot points, Objectives, 3-act structure | Setting, Plot, Story, Character, Suspension of disbelief | Catharsis, Music, Set and costume design | Character Dialogue, Acting

(Lazzaro, 2007)
To help make better games, researchers and designers need tools to measure and
adjust the emotional engagement coming from play. Reviewing the literature, we
found very little on how experiences create emotions, let alone how games create
emotions. The three most relevant and detailed models were Csikszentmihalyi's
Flow, Bartle's four player types, and Norman's and Boorstin's three sources of
engagement (Norman, 2004; Boorstin, 1990). However, none of these mapped out
the wide range of emotions we saw when people played their favorite games. None
of them broke out different factors for creating engagement.
Csikszentmihaliyi’s Flow models one aspect of how games create engagement
(Csikszentmihalyi 1990). The flow model offers two parameters for designers to
adjust and three emotional states: boredom, anxiety, and “flow” which is more of
a state of engagement than an emotion. In testing games we often measured how
players responded to other parameters than the game’s balance of skill and diffi-
culty. During great gameplay we knew that players responded to: reward cycles, the
feeling of winning, pacing, emotions from competition and cooperation. To get great
gameplay designers had to make a lot of adjustments, not only in difficulty.
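To make the two-parameter model concrete, here is a minimal Python sketch of how its state classification works, assuming challenge and skill are normalized to a 0.0 to 1.0 scale; the function name and tolerance value are illustrative assumptions, not from the Flow literature:

```python
# A minimal sketch of the Flow model as the chapter describes it:
# challenge and skill are the designer's two knobs, and their balance
# yields one of three states: boredom, anxiety, or flow.
def classify_flow_state(challenge: float, skill: float, tolerance: float = 0.15) -> str:
    """Classify a moment of play on 0.0-1.0 challenge/skill scales."""
    gap = challenge - skill
    if gap > tolerance:
        return "anxiety"   # challenge well above skill: player overwhelmed
    if gap < -tolerance:
        return "boredom"   # skill well above challenge: player under-stimulated
    return "flow"          # challenge and skill roughly balanced

# Example: a late level that outpaces an average player's skill.
print(classify_flow_state(challenge=0.9, skill=0.5))  # -> "anxiety"
```

The sketch also shows the model's limitation the chapter goes on to describe: only two inputs, so reward cycles, pacing, and social emotions fall outside it.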
Our experience testing players showed several types of player behavior
not predicted by Csikszentmihalyi's model of Flow. The experience of being in
a flow state is an important part of many of the ways games create engagement,
not just of the actions that are challenging. Players also experienced other emotions,
such as curiosity, in addition to frustration. The most engaging designs that came through
our lab often started with challenge, but players preferred games that offered more
than balancing difficulty with skill. Examining the relationship between players'
favorite emotions and how they play, we saw that people played for other experiences
as well. Players clearly responded to factors outside the Flow model.
Similarly, we saw that players enjoyed games in more ways than Bartle's four
player types (or his revised model) predict. There were likewise more ways of creating
engagement than in Norman's and Boorstin's models. Players enjoyed more than
whether an emotion was positive or negative, arousing or relaxing. In short,
game designers needed a model for creating emotion from gameplay, and researchers
needed a way of collecting data from players to inform the designers.
Given the lack of research on this subject, we decided to use a simplified version
of Paul Ekman's Facial Action Coding System (Ekman, 2003) to identify which
emotions came from what players liked most about games. Watching the emotions
on players' faces would lead us to understand how those emotions relate to the types of
choices that players liked the most. We would hack the "what's fun" problem from
the player's perspective.
In designing the studies to look at how games create emotions, we kept two
things in mind. First, to design emotions, game developers needed a way to measure
specific emotions. Second, the emotions to measure were the ones that relate to what
players like most about games.
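As an illustration of what such a measurement instrument might look like in practice, here is a hypothetical Python sketch of a session log for simplified facial-coding observations; the field names and emotion codes are assumptions for illustration, not XEODesign's actual protocol:

```python
# Log timestamped emotion observations during a playtest, then tally them.
from dataclasses import dataclass, field
from collections import Counter

EMOTIONS = {"fiero", "frustration", "curiosity", "relaxation", "amusement", "boredom"}

@dataclass
class Observation:
    seconds: int   # time into the session
    player: str
    emotion: str   # one code from EMOTIONS
    trigger: str   # what the player was doing when the emotion appeared

@dataclass
class SessionLog:
    observations: list = field(default_factory=list)

    def record(self, seconds, player, emotion, trigger):
        assert emotion in EMOTIONS, f"unknown code: {emotion}"
        self.observations.append(Observation(seconds, player, emotion, trigger))

    def counts(self) -> Counter:
        return Counter(o.emotion for o in self.observations)

log = SessionLog()
log.record(95, "P1", "frustration", "failed boss fight")
log.record(140, "P1", "fiero", "beat boss on third try")
print(log.counts())  # Counter({'frustration': 1, 'fiero': 1})
```

Recording the trigger alongside the emotion is what ties what players feel back to the choices the game offered, which is the relationship the studies set out to examine.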
20.4 Hard Fun Mechanics
Hard Fun comes from a careful balance of three emotions. The most important emotion
has no word in English, so at XEODesign we borrow an Italian word, fiero (like
the car), which means personal triumph over adversity (Ekman, 2003). For example,
fiero is the feeling you get from winning the Grand Prix or beating the boss monster.
Players experiencing fiero often scream "Yes!" and punch an arm up over their head, make
their characters jump, or do a victory dance. If the feeling is especially strong, players even
jump up out of their chairs.
FIGURE 20.2 Hard Fun: The opportunity for challenge and mastery
Players cannot push a button and feel fiero; they must feel frustrated first.
In Hard Fun players cycle between three emotions: fiero, frustration, and relief.
We call the way players cycle between emotions a PX Spiral. During play, gamers
often start bored (a top reason to play a game), then become frustrated as they
work to solve the challenge. When they solve the challenge they feel fiero, causing
a huge state change in the body as they go from feeling very negative to feeling
very good. As the feelings of fiero fade, the player feels relief. Then the player
encounters a new challenge and the cycle repeats.
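The PX Spiral can be read as a simple cycle of states. The following Python sketch traces that cycle; the transition table is just a restatement of the paragraph above, not code from XEODesign:

```python
# The Hard Fun PX Spiral as a state cycle: boredom leads a player to take
# on a challenge, frustration builds while they work at it, solving it
# produces fiero, which fades into relief before the next challenge.
PX_SPIRAL = {
    "boredom": "frustration",   # player picks up a new challenge
    "frustration": "fiero",     # player finally overcomes the obstacle
    "fiero": "relief",          # the triumph fades into a calm state
    "relief": "boredom",        # player looks for the next challenge
}

def trace_spiral(start: str, steps: int) -> list:
    """Follow the emotion cycle for a number of transitions."""
    states = [start]
    for _ in range(steps):
        states.append(PX_SPIRAL[states[-1]])
    return states

print(trace_spiral("boredom", 5))
# ['boredom', 'frustration', 'fiero', 'relief', 'boredom', 'frustration']
```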
Because game design requires balancing a number of choices and parameters, it
is helpful to focus first on the Hard Fun of the game and on creating fiero. Fiero is the
strongest and most satisfying emotion coming from Hard Fun mechanics, and for
many players fiero is their favorite emotion. Fiero also offers a special paradox to
researchers and designers: usability requires removing frustrating features, whereas
mechanics that produce fiero demand adding them. In game testing, separating
good frustration from bad frustration is a requirement during observations aimed at
improving Hard Fun. Make a game too usable and it is no fun at all.
In this way, each type of fun focuses on different types of choices with different kinds
of feedback and therefore creates different emotions. To create the emotions in the
PX Spiral for Hard Fun, the game requires different mechanics than those found in
other types of fun.
The emotions for Hard Fun come from the choices and feedback relative to a
goal with at least one major obstacle. We call the way a game's mechanics create these
emotions a PX Profile. It is possible to target these emotions and increase the Hard Fun
of a game by adding mechanics such as the extra bonus coins in Zuma or seating
customers by color in Diner Dash. Creating a PX Profile helps explore the relationship
between choices, feedback, and player emotions.
The emotions for Hard Fun come from the player using the controls to make
choices, develop strategies, overcome obstacles, and achieve the goal. Typical Hard
Fun mechanics include short-term and long-term goals, obstacles, levels, boss monsters,
and power-ups. All of these vary the pace of the game, affect the challenge
ramp, and enhance player feelings of accomplishment. Player testing of Hard Fun
examines what kind of response these mechanics create in players.
TABLE 20.3 Hard Fun PX Profile

Choice | Emotion
goals | fiero
challenge | frustration
obstacles | boredom
strategy |
power-ups |
puzzles |
score |
levels |
monsters |
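A PX Profile like Table 20.3 can also be captured in data, so a test plan can check whether a playtest actually exercised every mechanic behind the target emotions. The following Python sketch is a hypothetical illustration; the structure and function names are assumptions, and the entries simply mirror the table:

```python
# A PX Profile as data: target emotions plus the mechanics meant to evoke them.
HARD_FUN_PROFILE = {
    "target_emotions": ["fiero", "frustration", "boredom"],
    "mechanics": ["goals", "challenge", "obstacles", "strategy",
                  "power-ups", "puzzles", "score", "levels", "monsters"],
}

def untested_mechanics(profile: dict, observed: set) -> list:
    """List mechanics the playtest produced no emotion observations for."""
    return [m for m in profile["mechanics"] if m not in observed]

# Suppose a session only exercised goals, levels, and monsters:
print(untested_mechanics(HARD_FUN_PROFILE, {"goals", "levels", "monsters"}))
# ['challenge', 'obstacles', 'strategy', 'power-ups', 'puzzles', 'score']
```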
Hard Fun mechanics are the ones most at odds with traditional measures of usability.
Here is where usability recommendations can do the most damage to gameplay.
Usability advice to widen and lower a basketball hoop will reduce error rates; however,
it also makes the game less fun. Pushing one button to buy a car upgrade improves
the game, but pushing another button to win the Grand Prix does not feel like winning.
20.5 How Hard Fun Mechanics Work Together to Create Mastery
FIGURE Hard Fun: Mastery creates Fiero. Player choice rewards effort.
20.6 Easy Fun
Easy Fun is the bubble wrap of game design. Best-selling games offer interactions
outside the main challenge to inspire player imagination and capture their attention
in between challenges. Novelty inspires player curiosity, filling their attention and
motivating different kinds of play. In Easy Fun these opportunities for fantasy, exploration,
and role play increase immersion in the game world outside of the main
goal and offer a refreshing alternative to the emotions from Hard Fun.
In Easy Fun, novelty inspires player curiosity much as challenge does in
Hard Fun. Easy Fun lacks the goal structure of Hard Fun, however; instead players
play for the sheer enjoyment of the interaction. Like improv theater, games such as
Grand Theft Auto (GTA) make offers to the player. To get from point A to point B in
a mission, the game offers the player a car (in fact, any car they want), and then other
things such as parking meters, plate glass windows, and freeway exit ramps. It is up
to the player to accept these offers and see what happens.
Easy Fun plays an important role in the life cycle of a play session. When the challenge
becomes too hard, players often self-regulate their emotions by switching from
Hard Fun to Easy Fun: goofing off inside the game, playing off track, or exploring
what Will Wright calls "interesting failure states."
Games without enough Easy Fun may be highly usable and have an appropriate
challenge ramp, but players will play less if they don't want to see what is on the
next level, they don't enjoy the theme, or the controls feel arbitrary or too realistic.
Motion-based games such as XEODesign's accelerometer game Tilt on the
iPhone or Wii Sports on the Wii create part of their appeal from the controls themselves.
The difference between using the real object and the virtual one increases
engagement. Without enough Easy Fun, players are more likely to become frustrated
with pursuing the game's main goal. Oftentimes players find the theme unappealing
or the story uninspiring. They don't see the point and don't care about the outcome.
If the game is only about Hard Fun, players lose interest.
A story becomes less compelling if someone "spoils the plot" by telling you the ending. If the outcome is
known, the only curiosity left is how the characters get there.
Identify a game’s Easy Fun by looking at how these kinds of choices and feed-
back create the emotions of curiosity, surprise, and wonder.
20.8 How Easy Fun Mechanics Work Together to Inspire Imagination
Easy Fun mechanics invite players to explore what Will Wright calls interesting failure states, or what Hal
Barwood calls the pure joy of figuring it out.
Too much or too little novelty both lead to disinterest. If the game becomes too predictable, the player leaves because they are
bored; if it becomes too novel, they quit because it does not make sense. To borrow
a phrase from literature, the game balances novelty and boredom to create a suspension
of disbelief. If something is too predictable, players leave because they are bored; if it is
too improbable, players slip into disbelief. A 100 percent novel experience where nothing
is recognizable would be confusing, like a gun that shoots flowers. A 100 percent
familiar experience is too much like real life. Games strive for a balance between these.
Serious Fun* is where players play with a purpose: to create something of value
outside of the game itself, such as relaxing after a hard day at work. Players aim
to change how they think, feel, or behave, or to accomplish real work. Players often
select games based on how they feel before, during, and after play. Serious Fun
requires engaging the player viscerally and mentally. Players use the fun of games
to motivate the development of other skills or to change how they feel inside. For
Serious Fun we look for what players are doing to relax, create excitement, learn, or
do real work through play. They play with a purpose or use games as therapy.
Serious Fun focuses on the emotions created at the intersection of the game and the
player in the real world, whereas Hard Fun and Easy Fun both create emotions about
events inside the game world. In Serious Fun players feel differently about how the
game changes their real life. Like Easy Fun, Serious Fun offers engagement
without challenge; it differs in that it creates engagement
directly from visceral sensory stimulation and thoughts about the game, rather
than through the imagination and curiosity of Easy Fun. People play the game because
the game gives them something they value and reflects their values. These additional
outcomes and reasons to play create emotions as well. Most importantly, Serious Fun
uses different mechanics to create different emotions than the other kinds of fun.
Serious Fun creates emotions about the benefits that come from playing a game, such as playing
Dance Dance Revolution to lose weight, Brain Age to get smarter, or
Halo to blow off frustration with the boss. The emotions from play reward practice.
Some games create a real work product, such as the ESP Game developed
at Carnegie Mellon University, where people play a guessing game to make the
otherwise boring task of providing text labels for images on the Internet more exciting
*Note: At first we called this type of fun Altered States, because it was clear that players played to
change how they felt (Lazzaro, 2004a; Lazzaro, 2004b). The visceral sensations from the game's graphics,
audio, and rhythm clearly created enjoyment. As we continued our analysis, we found that
those who played word and card games, or games that did real work, wanted a mental workout and often
created a real-world skill or work product. Therefore, we renamed this playstyle Serious Fun.
(von Ahn, 2004). That players accomplish a real-world task increases their enjoyment.
Simulation games can also teach complex ideas such as city management
(SimCity) or leadership (running a guild in World of Warcraft (WOW)) (Gee, 2003).
Such simulation games give players the ability to make choices and get real-time
feedback, an experience that reading a textbook cannot provide.
Games low on Serious Fun feel like a waste of time. The enjoyment quickly fades
because the game does not make a lasting impact on how the player feels, or it creates a
less desirable mental state, like watching too much TV. While all games are to a
certain extent "time wasters," especially among adults, players believe they provide
value, whether it is stress release or a quick break. Without visceral or mental
stimulation from Serious Fun, the game often is not engaging enough to change how
they are thinking and feeling.
A common flaw in games is that players may find the game challenging (Hard Fun) and
be curious about the theme (Easy Fun), but the game fails to establish a rhythm,
provide enough visceral stimuli to draw them in, or let them express their
interests, morals, or values. For example, the pacing of interaction may
be too chaotic for players to find a pattern. In this way Serious Fun can influence the
enjoyment of other kinds of fun. The basic sequence of moves may require too much
thinking for players to complete a strategy (Hard Fun). If the game does not engage
them enough to relax, get excited, or get mental stimulation, then it fails to provide
an experience they value. Players often enjoy learning something they don't know,
even if it is as simple as where chocolate comes from, as in the game Chocolatier.
20.12 How Serious Fun Mechanics Work Together to Express and Create Value
Choosing to play Super Smash Bros, Gears of War, or Animal Crossing expresses a player's identity and their sense
of values to themselves and to others.
FIGURE 20.7 Serious Fun: Players value how a game makes them feel and helps them change themselves
Many Serious Fun mechanics have a strong visceral component. Pleasure from
visual and audio stimulation increases the desire to continue. Matching and gathering
mechanics such as those in Bejeweled carry a very primal sense of enjoyment. Collecting
Achievement badges in Pogo or completing a set of Pokemon cards is rewarding.
Whether it is collecting dream jewels in Dream Chronicles or a gold star for an
expert score in Diner Dash, designing game objects that look valuable or pleasing
to hold enhances the feeling of collecting them. Bejeweled would create a very different
experience if instead of matching rubies, diamonds, and emeralds the player
matched dog droppings and dirty broken glass.
20.13 People Fun
they don’t like playing games. Games often serve as icebreakers, topics of conversa-
tion, something to get the party started, or structure the conversation. Some players
enjoy talking about a game more than actually playing it.
The emotions from People Fun can come from in-game characters as well
as from other players. Part of the success of Diner Dash is the tight integration of the
game mechanic with balancing the emotional states of numerous NPCs (non-player
characters) (Lazzaro, 2005). Please enough customers as a waiter and the player
wins. Not all games need People Fun; however, games that lack People Fun, such
as Bejeweled, have to be a lot stronger in the other areas to create the same level of
emotional engagement.
Games without enough People Fun offer limited interaction between players and
game characters. As a result, players do not care about the plight of a game character.
Without People Fun, the game fails to spark competitive urges or cooperation between
players toward a shared goal. These games can be MMOs whose NPCs feel like quest
vending machines rather than reacting to choices players make. In multiplayer games,
each player's actions can feel isolated, lacking the opportunity to interact with other
players.
An easy way to reduce People Fun is to provide a highly organized experience
that is too immersive for social interaction. Such games offer one way to play: they
provide too much structure, limit customization, restrict house rules, offer rigid
communication channels, and deliver too much stimulation. Sometimes the only reason the
other player is there is to provide more competition (where Hard Fun and People
Fun overlap), and in doing so these games miss out on opportunities for other
emotions between players.
The experience of playing games together deepens social bonds. Players will
laugh at each other and at themselves, and tell jokes. They develop secret languages and
pass social tokens that create rich emotional bonding between players (Lazzaro,
2008). Playing together generates positive emotions, feelings of trust, and companionship,
even in highly violent games. It is this emotion of feeling closer to one's friends
that players most enjoy from People Fun: the intense feeling of closeness and
companionship after laughing with a friend. Again, there is no word in English for it.
In People Fun, players interact and cycle between many emotions. These cycles of
emotions offer what players like about hanging out with friends and increase social
bonding.
Even a single-player game can offer characters with which to play. In Diner Dash the game mechanic requires players to balance
the emotions of the NPCs. In Diner Dash: Hometown Hero players can do this and
also play with and against other players.
TABLE 20.6 People Fun PX Profile

Choice | Emotion
cooperate | amusement
compete | social bonding
communicate | schadenfreude
mentor | naches
lead | envy
perform | love
spectacle | gratitude
characters | generosity
personalize | elevation
open expression | inspire
jokes | excite
house rules | ridicule
secret meanings | embarrass
pets |
endorsements |
chat |
People Fun mechanics have the ability to greatly widen a game's PX. If we look
at which choices create the emotions seen between players, the PX Profile includes
game mechanics such as cooperation, competition, and the opportunity to perform
and to personalize. These choices create emotions such as schadenfreude (taking
delight in the misery of others), naches (a Yiddish word for the sense of pleasure
and pride when someone you help succeeds), and amusement between players.
Adding a single mechanic such as a tradable health pack to a game creates three
emotions: generosity when a player gives it, gratitude when a player receives it,
and elevation when someone witnesses the human kindness in the exchange. Later
in the game the emotions switch places depending on who is in which role in the
interaction.
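The health-pack example can be expressed as a small role-to-emotion mapping. The following Python sketch is illustrative only; the role names and function are hypothetical:

```python
# One tradable-item mechanic fans out into role-dependent emotions.
EXCHANGE_EMOTIONS = {
    "giver": "generosity",    # the player who hands the health pack over
    "receiver": "gratitude",  # the player who was helped
    "witness": "elevation",   # anyone who sees the act of kindness
}

def emotions_for_exchange(giver: str, receiver: str, witnesses: list) -> dict:
    """Map each participant in a trade to the emotion the exchange evokes."""
    result = {giver: EXCHANGE_EMOTIONS["giver"],
              receiver: EXCHANGE_EMOTIONS["receiver"]}
    for w in witnesses:
        result[w] = EXCHANGE_EMOTIONS["witness"]
    return result

print(emotions_for_exchange("P1", "P2", ["P3", "P4"]))
# {'P1': 'generosity', 'P2': 'gratitude', 'P3': 'elevation', 'P4': 'elevation'}
```

As the chapter notes, the same players occupy different roles later in the game, so a single mechanic keeps generating fresh emotions as the roles rotate.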
Games that provide opportunities for people to express themselves or create their own special
way to play enhance the emotions that players feel. Increase People Fun emotions
by balancing these aspects to raise the amount of player interaction, amusement,
and social bonding.
FIGURE 20.10 People Fun: Playing with friends creates amusement and social bonding. How to increase emotion from choices involving social interaction: choices with others increase emotions and social bonds.
Cooperative and competitive gameplay increase the opportunities for People Fun.
In Top Spin Tennis, players feel one way playing across the net and another way interacting
with their tennis partner on the same side of the court. For example, each car in
Mario Kart: Double Dash has two seats: one player drives and the other throws stuff. Players cooperate
to win and compete against others. This also allows junior players to learn
how to play from the back seat and eventually drive their own cars, creating naches
for their mentors.
We found other interesting results in a study we ran of people playing multiplayer
games and using social media (Lazzaro, 2008). The opportunities for emotion came
across three channels: how the service allowed players to connect and make new
friends, the messages that were passed between them, and the actions they could
take. The shape of these channels, and how they worked, affected what type of
engagement they created between players. By offering different features, the services
created different emotions.
People Fun is the difference between eating a cheese sandwich and eating fon-
due with friends. The additional lines of interaction structure social actions to create
more emotions whether it is helping someone with a long string of cheese or fight-
ing over the piece that fell off someone’s fork.
20.14 A Few Suggestions for Applying the Four Fun Keys
Personas built around each type of fun can be used to focus design discussion of how to support particular
playstyles. Balancing the game between personas will widen the base of appeal by
increasing the opportunities for emotion and providing more ways to enjoy the game.
Player Experiences and emotions need to be designed and measured from the
beginning of the project, not just tacked on at the end. Starting with a playstyle
such as Hard Fun, identify the desired player emotions. Then choose the mechan-
ics (the choices and feedback) that create these types of emotions. Finally, tune the
mechanics by looking at how these choices and feedback work together as a system
to create the intended player response.
Towards the end of the process, testing a playable build with players against
opportunities for fun from each of the Four Fun Keys offers a way to fine-tune the
emotions from gameplay. Analyzing player behavior and responses to the game's
Hard Fun, Easy Fun, Serious Fun, and People Fun can identify weaknesses in game
design as well as expose opportunities for deeper engagement beyond what is possible
with pure usability methods.
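One lightweight way to run such an analysis is to tally coded observations by Fun Key and flag the keys a build rarely touches. The following Python sketch is a hypothetical illustration; the emotion-to-key mapping and threshold are assumptions, not a published coding scheme:

```python
# Tally coded playtest observations by Fun Key to surface weak keys.
from collections import Counter

FUN_KEY_OF_EMOTION = {
    "fiero": "Hard Fun", "frustration": "Hard Fun",
    "curiosity": "Easy Fun", "surprise": "Easy Fun", "wonder": "Easy Fun",
    "relaxation": "Serious Fun", "excitement": "Serious Fun",
    "amusement": "People Fun", "naches": "People Fun",
}

def weak_fun_keys(observed_emotions: list, minimum: int = 2) -> list:
    """Return Fun Keys with fewer than `minimum` observations in a session."""
    tally = Counter(FUN_KEY_OF_EMOTION[e] for e in observed_emotions
                    if e in FUN_KEY_OF_EMOTION)
    keys = {"Hard Fun", "Easy Fun", "Serious Fun", "People Fun"}
    return sorted(k for k in keys if tally[k] < minimum)

session = ["fiero", "frustration", "frustration", "curiosity", "curiosity"]
print(weak_fun_keys(session))  # ['People Fun', 'Serious Fun']
```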
20.15 References
Ahn, L. von, & Dabbish, L. (2004). Labeling images with a computer game. Proceedings
Association for Computing Machinery (ACM) Special Interest Group on Computer-Human
Interaction Conference (CHI) (pp. 319–326). Vienna, Austria: ACM Press.
Bartle, R. (1996). Hearts, clubs, diamonds, spades: Players who suit MUDs. Colchester,
Essex, UK: MUSE Ltd. Retrieved December 29, 2005, from http://www.brandeis.edu/pubs/
jove/HTML/v1/bartle.html
Bartle, R. (2003a). A self of sense. Retrieved December 29, 2005, from http://www.mud.co.uk/
richard/selfware.htm
Bartle, R. (2003b). Designing virtual worlds. Berkeley, CA: New Riders.
Boorstin, J. (1990). Making movies work. Beverly Hills, CA: Silman-James Press.
Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York: Harper &
Row Publishers Inc.
Damasio, A. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: Quill
Penguin Putnam.
Ekman, P. (2003). Emotions revealed. New York: Times Books/Henry Holt and Company.
Gee, J. (2003). What video games have to teach us about learning and literacy. New York:
Palgrave Macmillan.
Hassenzahl, M., Platz, A., Burmester, M., & Lehner, K. (2000). Hedonic and ergonomic quality
aspects determine a software's appeal. Proceedings Association for Computing Machinery
(ACM) Special Interest Group on Computer-Human Interaction Conference (CHI) (pp. 201–208).
The Hague, The Netherlands.
Jordan, P.W. (2000). Designing pleasurable products: An introduction to the new human fac-
tors. London: Taylor & Francis.
Kim, A.J. (2000). Community building on the Web. Berkeley, CA: Peachpit Press.
Lazzaro, N., & Keeker, K. (2004). “What’s My Method?” A game show on games. (pp. 1093–
1094) Proceedings Association for Computing Machinery (ACM) Special Interest Group on
Computer-Human Interaction Conference (CHI), Vienna, Austria.
Lazzaro, N. (2004a, Winter). Why we play games. User Experience Magazine, 8, 6–8.
Lazzaro, N. (2004b). Why we play games: Four keys to more emotion in player experiences.
Proceedings of the Game Developers Conference, San Jose, California, USA. Retrieved
December 28, 2005, from www.xeodesign.com/whyweplaygames.html
Lazzaro, N. (2005). Diner dash and the people factor. Retrieved March 2, 2005, from www.
xeodesign.com/whyweplaygames.html
Lazzaro, N. (2007). Why we play: Affect and the fun of games. Designing emotions for
games, entertainment interfaces, and interactive products. In J. Jacko & A. Sears (Eds.),
The human-computer interaction handbook: Fundamentals, evolving technologies and
emerging applications (pp. 679–700). Mahwah, NJ: Lawrence Erlbaum Associates.
Lazzaro, N. (2008). Halo vs. Facebook: Emotions that Drive Play. Proceedings of the Game
Developers Conference, San Jose, California, USA. Retrieved April 13, 2008 from www.
xeodesign.com/whyweplaygames.html
LeBlanc, M., Hunicke, R., & Zubek, R. (2004). MDA: A formal approach to game design and
game research. Retrieved March 2, 2005, from http://www.cs.northwestern.edu/~hunicke/
pubs/MDA.pdf
Malone, T. (1981). Heuristics for designing enjoyable user interfaces: Lessons from computer
games. Proceedings Association for Computing Machinery (ACM) Special Interest Group on
Computer-Human Interaction Conference (CHI), (pp. 63–68).
Norman, D.A. (2004). Emotional design: Why we love (or hate) everyday things. New York:
Basic Books.
Piaget, J. (1962). Play, dreams, and imitation in childhood. New York: Norton.
Tiger, L. (1992). The pursuit of pleasure (pp. 52–60). Boston: Little, Brown & Company.
Wright, P., McCarthy, J., & Meekison, L. (2003). Making sense of experience. In M.A. Blythe, K.
Overbeeke, A.F. Monk, & P.C. Wright (Eds), Funology: From usability to enjoyment
(pp. 43–53). Dordrecht, The Netherlands: Kluwer Academic Publishers.
CHAPTER
TWENTY-ONE
Matrix of Issues
and Tools
This chapter offers two different approaches to selecting among the methods
described in the book. The first is a list of the development phases, with techniques
that may be of use in each. If you are in the midst of a project right now, you may
find this outline helpful in determining what might be of use to you. You can follow
chapter references to learn more about the techniques in the list.
The second is a table that compares the various methods described in the book,
in terms of resources and expertise required, with pointers to chapters that outline
these methods in more detail.
Finally, we’ve included a list of common complaints or concerns about usabil-
ity, and some answers you can use to help people in your team warm up to these
methods.
I. Development stages and user research techniques to use
1. Before a project even begins:
Management buy-in. A first step in successful user research is getting everyone
on board for what you are doing, and setting good processes up for including
the results of research in your development process. (Chapter 2).
Researching how other developers use user research. You may want to read the
interviews and writings by game developers about how they use these tech-
niques, to get ideas and inspirations (Chapter 3; Chapter 4; Chapters 11, 15,
and 17).
Instrumenting your game. If you know you want to collect metrics and use these
to influence design, you will want to build this into your process from the very
beginning (Chapters 9 and 15); a minimal logging sketch appears after this outline.
Consider setting up company standards for usability (Chapter 8) that all teams
can re-use.
2. Concept phase:
As you make decisions about your core audience and the genre and platform of
your game, you may find some of the advice in Section 3 helpful. You can also
seek out heuristics relevant to your particular genre and collect heuristics from
your team based on their prior knowledge (Chapter 6).
As you start to shape your design, you may want to set fun types to aim for
(Chapter 20) and make note of social psychological effects you know you’ll want
to use and test (Chapter 19).
3. Pre-production:
You may want to use expert evaluation within your team during this phase, to
make sure you stay on track with user experience goals (Chapter 7).
Two game designers have written about how they personally use prototyping and testing
to keep their games on track; you may want to read these chapters if
you haven't already (Chapters 17 and 18).
4. Production phase:
In this phase, traditional usability methods as adapted to games are particularly
helpful. The chapter from Microsoft has an excellent overview (Chapter 4) and
Chapter 5 offers advice on using think-aloud user testing.
If you have the time and resources, you might find it useful to try out physi-
ological measures that can confirm emotional responses to your games (Chapters
13 and 14).
5. Post-production phase:
If your game will require post-launch updates and patches, you’ll want to
keep doing some usability and playtesting as needed (overviews in Section II).
Collecting metrics on play once a game is released is also a great way to tune
future releases and games in the same genre (see Chapter 15 for a brief discus-
sion of this strategy).
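As a sketch of what minimal instrumentation might look like (the event names and fields here are hypothetical, not from any chapter):

```python
# Emit timestamped gameplay events to an append-only log that analysts
# can aggregate later to find difficulty spikes, drop-off points, etc.
import json
import time

def log_event(stream, event: str, **fields):
    """Append one gameplay event as a JSON line with a timestamp."""
    record = {"t": time.time(), "event": event, **fields}
    stream.write(json.dumps(record) + "\n")

# Example: record deaths with enough context to tune levels post-launch.
with open("telemetry.jsonl", "a") as f:
    log_event(f, "player_death", level="3-2", cause="fall", session="s-001")
```

Building even this much into the engine early is what makes the post-launch tuning described above possible; retrofitting event hooks late in production is far more costly.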
Some typical complaints about usability and user research, with responses you can use.
Complaint: But the game is supposed to be hard!
Response: But there's such a thing as being too difficult. Players will hate your game on a deep personal level if it makes them feel stupid. Also, there's a difference between the challenge that you intended and an unintended challenge that makes a game unplayable.

Complaint: Players will learn to use it!
Response: Usability addresses learnability. They may need to learn, but your game has to help.

Complaint: Why can't I just test it with myself? I'm a player, right?
Response: You're not a typical player. Compared to the typical player, you have much more game experience and much more knowledge about your game.

Complaint: I already know what our players will like and do!
Response: No, you don't. Even usability professionals frequently get surprised by some of the problems real users have. You won't really know your players until you see some representative players interact with your game.

Complaint: It's too expensive!
Response: Can you afford to make a game that people won't buy because they heard it's unplayable? Can you afford the returns and customer service calls when players get totally stuck? The return on investment (ROI) of usability is typically huge. And there are "discount" techniques, such as using heuristics, that can help to keep the costs manageable.

Complaint: It takes too much time!
Response: The process is integrated into the design process, so it doesn't take too much extra time. Doing testing early can save time later in development, correcting issues before they are too embedded to change. As several of the authors in this book point out, an unplayable game is a time-consuming problem that you don't want to have . . .
CHAPTER
TWENTY-TWO
Interview with Don Norman, Principal in the Nielsen Norman Group, and Professor, Northwestern University
Interviewer: Katherine Isbister
Norman received the Benjamin Franklin Medal in Computer and Cognitive Science from the Franklin Institute. He is well known for his books The Design of
Everyday Things and Emotional Design. His latest book, The Design of Future Things,
discusses the role that automation plays in such everyday places as the home, and
automobile. He lives at www.jnd.org.
What about the argument, and I’m sure you’ve heard this in a lot of your
consulting work, and it’s especially acute in the game development industry, that
there simply isn’t time and money to do user testing?
I agree with the arguments: product development is always behind schedule and
over budget. Any company that is proud of its user testing is a company in trou-
ble. Instead, you need game designers and experience designers working together
to make the best possible experience. You want to engage the player. Doing this
right will save money and time. In fact, it does far better than just saving money:
it makes money by increasing sales.
But studying people, observing players, and doing quick tests need not take
much time, nor need it be expensive. If you have a small team of people always
testing, they could be looking and making observations while you’re doing today’s
game that will be applied to tomorrow’s game. That means that when you’re ready
to do tomorrow’s game, you have a team of people who already can offer you
advice.
And the team can be very small, even as small as one. So it doesn’t have to be
a large team. But finding errors and finding problems early on is a great time saver
and money saver. That’s why it’s always a good call to be continually evaluating
what you’re doing.
Any thoughts on the future of games and where user research fits in?
I think the most exciting developments are that we are finally breaking away from
the traditional high-intensity, graphics-dominated games and exploring a
variety of experiences, including music, the arts, virtual experiences, social interaction,
and physical activities. Three cheers to Nintendo's Wii for showing the way!
Some of the stuff I have experienced in today's research labs will be in homes
in ten years: really exciting, mind-blowing experiences.
The power comes from imaginative, creative new situations and experiences,
enhanced by inexpensive sensors that let people move around, jump, run, twist,
and turn. Add new actuators that can exert force and movement and create tactile sensations
(these are all called "haptics"), couple them with surround sound and visual displays
that take up an entire wall or even, in some cases, all eight walls, and the possibilities
are endless.
New sensors can tell just where a person is looking, how they're standing,
and where they're facing; if there are several people, how they're interacting.
I also see a huge potential for games in the educational field. Games are wonderful
learning opportunities. To play a game with skill requires a tremendous
amount of study and practice and learning. What an opportunity to transfer this
from the artificial world of games to the real world! How valuable that might be
for us in the real world!
I think the real opportunities are with non-traditional game players, and so we
need to go out and study them. For example, why do so many people not play
games? I suspect that they could actually be interested in the right type of game.
So we have to look at them and try to understand what they would do if it were
available. There is a practice called ethnographic observation that is used heavily
in product design, and that's what we need to do. We need to invest time watching
more non-gamers to understand what will appeal to them.
CHAPTER
TWENTY-THREE
"Gamenics" and Its Potential: Interview with Akihiro Saitō, Professor, Ritsumeikan University, College of Image Arts and Sciences; Director, Bmat Japan
Interviewer: Kenji Ono, game journalist, IGDA Japan
*The contents of this article are based solely on the following interview and do not represent the official
opinions of the companies mentioned herein.
● Involved in animation design since middle school; worked for both Suntory
Limited and LaForet as a CM (television commercial) director while attending Tama Art University.
● Involved in the development of several games from the dawn of the Nintendo
Entertainment System (NES). Established his own game studio, DICE Co. Ltd., in
1991. First worked as a game designer for Nintendo, and has since developed
game software for several other companies.
● Became a professor at Ritsumeikan University in 2005. In addition to his
research and lectures that advocate "gamenics theory," he continues to make
advances in the development of sensibility-reasoning AI technology for the information
age.
There are two mottos for Japanese videogame development: "young and old alike,
anyone can start playing a game without reading the instruction manual" and "players
will continue to improve their skills while they are absorbed in a game." This
phenomenon is particular to videogames and is often lacking in other home electronics
or Internet services.
Especially well-designed videogame user interfaces (UIs) made this possible.
Players will not get absorbed in a game just because it is a game. Instead, it is the
massive amount of know-how built into a game that enables players to become
"totally absorbed" during a gaming session. The reason both the Nintendo DS and
Wii are such explosive hits is due not only to their innovative input devices, but
also to the know-how, represented primarily by their user interfaces, that allows players
to get absorbed in a game.
The know-how that goes into the development of videogame user interfaces is
potentially applicable to a wide variety of commercial fields. Professor Akihiro Saitō
of Ritsumeikan University, a former game designer, has codified this practice as
"gamenics," and is currently applying the principles of his theory to the development
of actual products. I recently asked Professor Saitō about gamenics and its potential.
23.1 What Is "Gamenics"?
To what extent has gamenics theory penetrated the Japanese game market?
Gamenics isn’t actually a word in common use yet, but you can see game develop-
ment based on its principles at a lot of companies. Nintendo is respected for this
more than any other company, because I codified the word based on the know-how
I learned while developing games for their systems.
I have a Toshiba Digital High-Vision TV and an HD DVD recorder at home, and
they are both hooked up with HDMI cable. But the remote controls are difficult to
use, and they have a lot of functions that I don’t know how to use. I’ve read the
instruction manual for them a number of times, but I can’t make any sense of the
directions. I’m actually quite an electronics nerd, and can figure out how to work
just about anything. But the UIs for these controls are not well thought-out, and
most digital home electronics are falling into the same ruts because of the “func-
tions” war between competing companies.
This is unacceptable for videogames. Most people start playing a new game without
even reading the instruction book. They get sucked into the world of a game
and ultimately finish it after passing a number of challenges. This doesn't happen
just because they are playing a game, but rather because a game designer has incorporated
all sorts of workings into it. Gamenics is the engineering that makes it possible
for anyone "to use a product without reading the manual" and "internalize
every function without even knowing it."
Games present players with a wide array of impediments, and players experience pleasure when they overcome
them. That's the basic repetitive structure of games. Game design lays down the
rules for this cyclical structure, and must not make a player feel stress outside of
this cycle.
If a player feels stress before they even start playing a game, because the game's
operability is no good or they don't know what they are supposed to do, then the
player won't be able to endure the stress the game designer has built into the game.
That's why gamers are better served if they are not aware of the controller, which
should exist in their hands, as much as possible, like air. Gamenics is the science
devoted to making this sort of thing possible.
In the sense that information is displayed and managed with touch-screen sensors,
the DS is similar to ATMs and ticket dispensers, but continual operation of an
ATM would be a real pain. Once a player is sucked into a game without knowing
it, they can continue playing for a long time.
A new input device alone is not enough to keep a player playing a game. Players might
be interested at first, but they will get bored very soon.
UIs are generally those “operational procedures” found in device and screen
designs, but gamenics sounds like it’s applicable to a broader industrial field.
That’s right. Just being able to play a game right away is not enough. A player must
be able to play a game for long periods of time. The president of Nintendo, Satoru
Iwata, often says that their ideal game is “broad in scope, and deeply structured,”
which is emblematic of gamenics theory. High-quality game design is not enough.
Good UI design rooted in gamenics theory is also necessary.
23.2 Gamenics and Professor Saitō's Career
FIGURE: The Nintendo DS (upper) and Wii (lower). Analyzing the success of these two game consoles shows that not only the new input devices but also the know-how based on gamenics was important.
In the mid-1970s the anime industry was still in its infancy and somewhat open. That
seems so unimaginable now.
Why did you move from commercial film production to game production?
Primarily because of Mr. Iwata. When we were working together he was always so
upbeat and good at praising others, which is a major reason why Iwata later became
the president of Nintendo. Second, the fields of anime and commercial film (CF) production have long histories, and as a result something like an apprentice system is already in place. Filmmaking methodology is well established, and it isn't an environment in which younger artists can make unique projects. The videogame industry, however, was very open when the NES was the major console, which is characteristic of media still in an early stage.
Why did you decide to do academic work focusing on gamenics theory rather than
continue with game development?
I actually didn’t initially enter the videogame industry because I liked playing and
making games. I started making games because of Iwata’s influence and the appeal
of the open-minded environment associated with games. That’s why I became
aware of the limits of being a game creator relatively early in my career. Games
like Mario Kart and Super Smash Bros. that enjoy worldwide success could never
have been made by me, because I cannot fathom what makes them so interesting to
others. I’m very different from other game creators in this way.
Actually, the games that I made received very high marks from Super Mario Club.
Itoi Shigesato no Bass Tsuri No. 1, Definitive Edition, for example, once received the
highest class game rating from Super Mario Club. But it didn’t break any records in
terms of sales.
Super Mario Club is the organization that conducts debugging tests and quality
assurance for Nintendo, right?
That’s right. I don’t know how they handle things overseas, but at Nintendo Japan
a game used to be evaluated extensively at Super Mario Club during the final stages
of development by letting numerous testers play the game from the perspective of
a broad demographic. If a game didn’t receive above a certain score, it wasn’t sold
as a Nintendo game. Some of the things they took into special consideration when
assigning their evaluations were: “Can anyone play the game without reading the
manual?” and “Can anyone get absorbed in the game for long periods of time while
they are playing?” They subjected games to very rigorous tests, and then sent them
back with all sorts of comments. It was also possible to have the review checked by
a third party if desired.
Those last two standards are the same as the objectives of gamenics.
True. Again, I’m not sure how they develop games in Europe and the United States,
but UIs for Japanese games have been made comparatively well for quite some time
now. One reason for this is that Nintendo’s principles for making games with a play-
er’s intentions in mind gradually spread to third parties through Super Mario Club.
Like I said before, gamenics theory is based on my organized understandings of the
know-how I acquired when I developed games for Nintendo.
So even though my games were especially well made in terms of gamenics, they
weren’t hits, perhaps because the game design ingenuity was insufficient. It was
at about that time when I began to see myself as someone who develops gamenics
products and services rather than as a game creator, which is why I broke off from
Nintendo. Then again, if I had stayed at Nintendo, I might have had a chance to
work on Touch! Generations DS games and Wii Fit.
These two elements were especially important to the NES, since it was originally designed for kids, right?
Exactly. And gamenics also has four principles, each of which is further subdivided.
The Four Principles of Gamenics:
1. Intuitive user interface (emphasizing ease of use)
2. Operability that does not require a manual (designed so that users are not con-
fused about what they can do)
3. Engaging choreography, gradual learning curve (devices that promote enthusiasm)
4. Beyond gaming (links things to reality so they seem real)
(Editors’ note: It may be helpful to readers to think of the gamenics principles as
a class of heuristics for good game user interface design. For more information about
heuristics, see Chapter IIC.)
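(Editors' note: To make the heuristics framing concrete, here is a minimal sketch of how the four principles might be encoded as a checklist for a playtest review. All names in the code are illustrative inventions for this note, not part of any published evaluation tool.)

```python
# A minimal sketch: the four gamenics principles as a heuristic checklist.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Heuristic:
    name: str
    question: str                      # what the evaluator asks of the game
    violations: List[str] = field(default_factory=list)

GAMENICS_PRINCIPLES = [
    Heuristic("intuitive_ui",
              "Can a player tell how to act just by looking at the screen?"),
    Heuristic("no_manual_needed",
              "Can anyone operate the game without reading the manual?"),
    Heuristic("engaging_choreography",
              "Do animation, sound, and pacing sustain enthusiasm as difficulty rises?"),
    Heuristic("beyond_gaming",
              "Does the game link its actions to reality so they feel real?"),
]

def report(principles: List[Heuristic]) -> None:
    """Print each principle with any violations an evaluator has logged."""
    for h in principles:
        status = "OK" if not h.violations else f"{len(h.violations)} issue(s)"
        print(f"{h.name}: {status}")
        for v in h.violations:
            print(f"  - {v}")

# Example: logging one observation from a play session.
GAMENICS_PRINCIPLES[1].violations.append(
    "Testers opened the options menu looking for the tutorial.")
report(GAMENICS_PRINCIPLES)
```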
Move the gun battery left and right with the joystick and attack the invaders by
shooting missiles with the button, right?
That’s right. A player knew how to play the game by just looking at the screen and
control panel. It was this relationship between the screen and the controller that made
Space Invaders such a skillfully created game. And it’s important to remember that this
was an arcade game. No one would have kept playing the game if they inserted a coin
only to die without figuring out how to play. This is a good example of the first princi-
ple, “intuitive UI.” The same point could be made for the world’s first videogame, Pong!
The second principle is “operability that does not require a manual.” In short,
these are mechanics that “ensure users will not be confused about what they can
do.” You could say that this is about workings that allow a user to understand intui-
tively not only what is on the screen, but also the game rules and systems built into
what is displayed. Remember how Super Mario Bros. (Nintendo) starts?
©1985 Nintendo
Super Mario Bros. (1985, Nintendo). The first sequence of Super Mario functions as the
main tutorial.
Sure. A couple of goombas start walking toward Mario from the right side of the
screen, which you have to avoid, and Mario hits a block from below to get the Super
Mushroom and other power-ups.
Right. This opening scene is a very well made tutorial for the entire Super Mario
Bros. game. The game’s basic actions are summarized in the following three actions,
all of which can be experienced in this scene:
1. Move toward the right through the levels to reach the goal within a certain time
limit while evading the enemy characters,
2. Jump to avoid enemies and obstacles, or defeat enemies by jumping on them, and
3. Hit blocks from below to collect the Super Mushroom and other power-up items.
The fact that these fundamental actions can be learned while playing the game
for just 30 to 60 seconds is one of the strong points of Super Mario Bros., and is a
good example of the second principle.
The third principle has two elements, “engaging choreography” and “gradual learn-
ing curve,” right?
Yes. The two are interrelated and difficult to separate. By "engaging choreography" I mean the attempt to hold a player's attention by using choreographed animations and sound effects in a timely manner, and devices that give players a sense of accomplishment when they collect items and level up. By "gradual learning curve" I mean techniques that encourage a player to push on until they achieve their goal, by gradually raising the difficulty level or having them discover new things on their own.
the intermediate spell “behoimi,” and the strongest spell “behoma.” As a result a
player can infer a spell’s effect by the name alone when their character acquires it.
The learning curve is also very well designed for the objectives of this game.
As soon as a player starts Dragon Warrior 1, for example, the final objective, the
“Dragonlord’s castle,” is displayed on the screen. But the Dragonlord’s castle is pro-
tected by a channel of water and you can’t get there until your character travels
around the world. So as soon as you start the game, the end goal, the "main objective" of the game, is clear.
Also, in order to overcome the Dragonlord, a player’s character must bring down
lower-level enemies, level up, gather money and get well equipped. These are the
“minor objectives” of the game. And there are secret underground passages, pow-
erful items, and various puzzles and devices placed throughout the world. These
are "intermediate objectives." Among all of these, the "main" and "minor" objectives are necessary for completing the game, while the "intermediate objectives" are ones the player can choose on their own. This is why a player feels as if they're always moving toward a self-chosen objective as they progress through the game.
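(Editors' note: The three-tier objective structure described above can be sketched in code. This is only an illustration under our own naming; the "required" flag captures the interview's distinction between objectives needed to finish the game and ones the player chooses freely.)

```python
# A sketch of the main / intermediate / minor objective structure.
from dataclasses import dataclass

@dataclass
class Objective:
    description: str
    tier: str          # "main", "intermediate", or "minor"
    required: bool     # main and minor are needed to finish; intermediate are optional
    done: bool = False

objectives = [
    Objective("Defeat the Dragonlord in his castle", tier="main", required=True),
    Objective("Level up and gather money for better equipment", tier="minor", required=True),
    Objective("Find the secret underground passage", tier="intermediate", required=False),
]

def open_goals(objs):
    """The player should always see at least one unfinished goal,
    required or self-chosen, to keep a sense of direction."""
    return [o for o in objs if not o.done]

for goal in open_goals(objectives):
    print(f"[{goal.tier}] {goal.description}")
```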
© 2007 NBGI
Pro Yakyū Family Stadium (1986, Namco Bandai Games, Inc.) skillfully exaggerated the elements of real professional baseball and evoked a stronger sense of realism than previous baseball games (image shows Pro Baseball Netsu Stadium 2007 (PS2) in "Family Stadium" mode).
That was one of the best-known games sold for the NES in Japan, right?
Yes. The rules of the game are based on those of real baseball. However, the graph-
ics and movements were simplified due to the limitations of the NES. The greatest difference is the game length. Rather than taking more than three hours like a real baseball game, a session of Pro Yakyū Family Stadium ends in about 20 to 30 minutes. Because it condenses and enhances the appeal of real baseball, no one got bored with it, and it went on to become a big hit. The same can be
said for soccer and golf games that are well made.
Just making them look real doesn’t mean they will have wide appeal.
Exactly. Players don’t want videogames that will feel as real as reality. The impor-
tant point is to choreograph an “extra-real” experience by abstracting and enhanc-
ing reality well.
and a “mouse” because suitable conditions change depending on the device. I based
this on my own experiences making games, so it is nothing more than a hypothesis
at this point, but I plan to move on to experimental studies soon.
Right. And operational conventions such as navigating menus with the direc-
tional keypad, selecting with the A button, and cancelling with the B button came
into use for the first time with Dragon Warrior 1. This became the de facto standard in Japan as games got more complicated, and for the first time players could start playing games right away without getting confused about the controls.
And this has remained the case even as the number of controller buttons has increased. It's especially important that the B button remain the cancel button. In Japan the ○ button on the PlayStation is always for "select" and the × button is always for "cancel." It's the opposite overseas, but their functions are fixed as well. This is normal for videogames, but surprisingly there are a lot of consumer electronics that don't follow this rule.
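(Editors' note: The convention Saito describes can be sketched as a fixed mapping from physical buttons to logical actions per platform and region. The table values follow the interview; the code structure itself is our own illustration.)

```python
# The select/cancel convention: fixed per platform and region,
# never varying by screen or mode.
from typing import Optional

BUTTON_CONVENTIONS = {
    ("nes", "any"):           {"select": "A", "cancel": "B"},
    ("playstation", "japan"): {"select": "circle", "cancel": "cross"},
    ("playstation", "west"):  {"select": "cross", "cancel": "circle"},
}

def action_for(platform: str, region: str, button: str) -> Optional[str]:
    """Resolve a physical button press to a logical action. Because the
    mapping is fixed, players can operate menus without looking down."""
    mapping = (BUTTON_CONVENTIONS.get((platform, region))
               or BUTTON_CONVENTIONS.get((platform, "any"), {}))
    for action, assigned in mapping.items():
        if assigned == button:
            return action
    return None

assert action_for("playstation", "japan", "circle") == "select"
assert action_for("playstation", "west", "circle") == "cancel"
assert action_for("nes", "japan", "B") == "cancel"
```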
© 1985 Nintendo
In Super Mario Bros., pressing the A button performs the main action and pressing the B button performs the secondary action. This allows players to perform actions such as swimming (top) or jumping (bottom) intuitively.
increase the number of buttons for other functions. That's why hierarchical menus are used, but the button that means "cancel" still varies too much from mode to mode. DVD recorder remotes have too many buttons, and because the cancel button is not fixed, they are impossible to operate without looking at them the way you can with a game controller.
The iPod, on the other hand, is an example of good operability. The design of the iPod controller is very similar to that of the NES. In the current iPod's basic operation, you move up and down through a menu with the touch wheel, the right button (fast-forward) opens hierarchical menus, the left button (rewind) backs out of hierarchical menus, and the down button (play/pause) plays music. The hierarchical menu screen design and button functions are exactly the same as the methodology of the NES. The sensation of operating the touch wheel as it emits pleasant sound effects is similar to the pleasure of using a game console, and is enough to make the user feel happy.
The combination of the iPod's touch wheel and hierarchical menus is very similar to screen operation with the directional pad and the A and B buttons on the NES (Nintendo Entertainment System).
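(Editors' note: The shared iPod/NES navigation model is essentially a menu tree with one fixed control to descend and one fixed control to back out. The sketch below is our illustration; class and method names are invented.)

```python
# iPod/NES-style hierarchical menu: scroll within a level, one fixed
# control to enter a submenu, one fixed control to back out.
class Menu:
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []   # no children means a leaf item

class MenuNavigator:
    def __init__(self, root):
        self.path = [root]   # stack of menus from root to current level
        self.index = 0       # cursor position within the current level

    @property
    def current(self):
        return self.path[-1]

    def scroll(self, delta):
        """Touch wheel / directional pad: move the cursor in this level."""
        if self.current.children:
            self.index = (self.index + delta) % len(self.current.children)

    def enter(self):
        """Right button / A button: descend into the highlighted submenu."""
        child = self.current.children[self.index]
        if child.children:
            self.path.append(child)
            self.index = 0

    def back(self):
        """Left button / B button: always backs out, on every screen."""
        if len(self.path) > 1:
            self.path.pop()
            self.index = 0

music = Menu("Music", [Menu("Artists", [Menu("Album A"), Menu("Album B")]),
                       Menu("Playlists")])
nav = MenuNavigator(music)
nav.enter()          # into "Artists"
nav.back()           # back to "Music", no matter where we were
print(nav.current.title)
```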
with directional pad, select with the A button, and cancel with the B button”
method.
How would you analyze the iPhone and iTouch UIs using gamenics theory?
The iPhone isn’t for sale in Japan, but I have an iTouch and love it. It has pho-
tos and movies of my kids, and I can take them with me wherever I go. In terms
of the first principle, “intuitive UI,” it’s excellent. And the device and software are
integrated very well. I showed my iTouch to my mother-in-law once, and I was so
surprised when she started browsing through the albums loaded on it without any
explanations about how to use it. And she’s seventy-eight years old! It employs ani-
mation and is fun to play with.
Using the iPhone's touch screen is very comfortable because the device and software are integrated as tightly as in a videogame.
technology originally designed for PlayStation game development and is now used
for notebook computers, DVD recorders, and Digital High-Vision TVs. Of course,
it’s also used for the PS3. That’s why it’s easy to link a DVD recorder to a PSP and
transfer recorded programs from one device to the other. I think people who are
accustomed to how the PlayStation operates can just look at the screen and know
what to do.
The UI technology of the PlayStation evolved into the XMB (Cross Media Bar) used across Sony products (picture shows AppliCast on Sony's High-Vision TV Bravia).
Multitasking has also become very common. Because of our "multi-gadget" lifestyles, in which people use several digital gadgets in combination, basic operations need to be the same across devices. I think that's why the XMB occupies such a favorable position. In the same way, I think Apple wants the iPhone and iTouch to be able to control the information on all of its digital devices.
Polyphony Digital (developers of Gran Turismo) developed the multi-function meter of the
new model GT-R jointly with Nissan.
It’s been said that one of the reasons it’s difficult for the gaming industry to collabo-
rate with other fields is that the cultures are so different. Taking this into considera-
tion, what would be required in order to actually develop a gamenics product?
The main cultural difference between game development and home electronics development is that one builds things by trial and error while the other builds to fixed specifications. Home electronics engineers have to build according to the specifications, and to make an enhancement they must first go back and revise those specifications. That's their culture. For budgetary reasons, it would be impossible to make a gamenics product that way. It's necessary to understand this difference.
And it’s important to develop a good balance between hardware and software
for a gamenics product. The Nintendo game console is a good example of this. One
might say that software exercises some control over the development of hardware.
So it’s important to have a systems integration programmer who can understand
gamenics sense.
playing games, they won’t react negatively to gamenics and they will understand
how it works perceptually.
Mobile phones are a typical example of products with multiple functions. In addition to communication functions, Japanese mobile phones include an Internet browser, a camera, television reception, electronic-money functions, and music playback, and they are increasingly functioning as all-purpose information handsets.
That’s true. For those who have one, the monitor on their mobile phone is the “gate-
way to the world.” Mobile phone monitors must have a display that is as exciting
as that of a game, but they don’t. Makers are mistakenly focusing on multifunction-
ality instead of whether or not a phone is “exciting to operate.” It should be more
like what Iwata and Miyamoto at Nintendo often say: “the user is king.” I want
other companies to learn this from Nintendo.
We’re entering an era when every house will have a home server, and people
will use a network with multiple digital home electronics. With a television hooked
to the internet, you can search for resort information and send maps to the car navi-
gator before heading out. But if each operation system is not integrated at this time,
things won’t work together smoothly. The operation network of home electronics
must be integrated into a single system for gamenics theory to be applicable.
The field of medicine is also adopting game technology. It is well known that the
home version of Konami’s dancing game Dance Dance Revolution is being used at
West Virginia State University to prevent obesity in children. The Wii Fit is a huge
hit in Japan, and it will be an even bigger hit once it’s released overseas. There is
plenty of potential for health products that utilize gamenics theory.
Space Invaders (Taito), right? It wasn’t just a huge hit in Japan, though. It was later
reworked for the Atari 2600 and went on to be a greatly influential game.
That’s true. What’s so special about this game is that it was the first Japanese
arcade game with a platform loaded with a CPU. As a result, it was different from
other games that were designed with TTL circuits, and it was easy to fine-tune the game by improving the programming. It came at a time when attention was shifting from hardware to software and development through trial and error became possible; that is when Japanese game design became so cutting edge.
I mentioned this at the beginning of the interview, but the essence of what makes a game interesting is a cyclical fluctuation between stress and pleasure. Getting this right means dealing with human psychology, and it requires a fine sensibility. But that's not all. For pleasure to be conveyed to a player appropriately, the UI must be thought out in detail. And to realize all of this on low-powered hardware like the NES, more emphasis has to be placed on software development. By predicting player psychology, it's possible to hide help information until it's needed, to make button operation more fun, and to add choreographed animations and special effects.
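(Editors' note: One way to read "hide help information" is context-sensitive hinting: surface a hint only when the player appears stuck, so confident players never see it. The sketch below is our illustration; the thresholds and names are invented.)

```python
# Hidden help: show a hint only when the player seems stuck, so the game
# never interrupts someone who is doing fine. Thresholds are invented.
class HintSystem:
    def __init__(self, idle_limit=10.0, fail_limit=3):
        self.idle_limit = idle_limit   # seconds of no input before hinting
        self.fail_limit = fail_limit   # failed attempts before hinting
        self.idle_time = 0.0
        self.failures = 0

    def on_input(self):
        self.idle_time = 0.0           # any action resets the idle clock

    def on_failure(self):
        self.failures += 1

    def on_success(self):
        self.failures = 0              # progress hides the help again

    def update(self, dt):
        """Call once per frame; returns True when a hint should appear."""
        self.idle_time += dt
        return (self.idle_time >= self.idle_limit
                or self.failures >= self.fail_limit)

hints = HintSystem()
hints.on_failure(); hints.on_failure(); hints.on_failure()
print(hints.update(0.016))   # True: three failures trigger the hint
```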
It’s for that very reason that the NES was such a huge hit all over the world, right?
Exactly. A basic way to think about gamenics theory is that it’s about making soft-
ware so that players will be able to predict what they are supposed to do without
feeling any stress about performing the action. This idea is closely related to tra-
ditional Japanese hospitality that is evident in the tea and flower arranging arts.
In Japan, the birthplace of this hospitality culture is Kyoto. Nintendo was born
in Kyoto, and the company takes great pride in this culture. I think that’s why
Nintendo was able to make the NES the way it is.
On the other hand, there are a lot of games made in Europe and the United States
that have great user interfaces. Games like Gears of War and God of War have
enjoyed high reviews from Japanese game developers. I feel like they have analyzed
the good points of Japanese game UIs and applied them well to their own games.
That’s because gamenics know-how, which was developed in Japan, has started to
spread throughout the world. Super Mario Bros. and Pokémon are major world hits,
meaning that user interfaces based on gamenics theory have spread throughout the
world.
© 1985 Nintendo
The know-how of gamenics, born in Japan, has spread throughout the world as Japanese videogames have evolved.
Gamenics theory will expand products and services in the global market as it is applied to more fields outside of the gaming industry, and what is truly important will be revealed in the process. The gaming generation will influence global consumption for years to come, and the key for game development know-how to transcend nation and culture is for game developers to do what they themselves think is fun. I am sure game developers would be very happy if their know-how turns out to be exactly such a key.