6.4 Usability Testing
Usability testing is perhaps the first thing that comes to mind when people talk about usability evaluation. It is certainly the most common method, although it's not the quickest, the easiest or the cheapest way of evaluating usability, and it isn't necessarily always the most appropriate, for reasons that we've talked about elsewhere and that I will also repeat to an extent in the coming slides. Notice, by the way, that we're calling it "usability testing" and not "user testing". The problem with "user testing" is that it implies it's the users who are being tested, which is certainly *not* the case. If you wanted to make that absolutely explicit, you would say "usability testing with users", but that's a little bit long-winded.

This is a fairly typical simple lab setup.
It's quite an old system – or quite an old photograph, I should say – but the basic concepts are the same even today. We have a simple table with a chair for the user, a chair for the moderator, a display, a camera – which, these days, would just be a webcam on top of the display, or we might even have a laptop with a built-in webcam – and a viewing room behind, where part of the design team or the development team can watch first-hand what's going on in the next room.
That certainly makes the whole thing much more engaging from the team's perspective.

This is a much more elaborate setup, presumably for something like a financial system, where we've got the user looking at about four different displays simultaneously, and obviously a much more elaborate viewing room as part of that as well.
These days, you really don't need a lab unless you're trying to address particular issues with viewing, because we can do all of this with special software. This is an example from TechSmith: their product called Morae. There are others around, obviously, and some of them are noticeably cheaper than Morae, which is quite an expensive solution, but it does work moderately well in my experience. One of the things that Morae allows you to do is share the video from your sessions live with the rest of the team. So, you don't need a bespoke lab; you could just use a meeting room and beam the sessions out to a team room somewhere else for the developers or the customer to watch.
Most usability testing focuses on the difficulties users have in achieving their goals, but metrics can also be important. So, you've got things that you can measure. Clearly, one of those is the task success rate – whether people are actually getting through the task at all, typically within a reasonable time. If you don't really have any idea what a *reasonable time* is, one of the most frequent measures that I've certainly used over the years is about *90 seconds*: if something takes longer than 90 seconds, chances are the user is not going to persevere, unless it's a matter of life and death or their job, which is sometimes the case. Actually measuring the time they take is also helpful, although it's very hard to compare task time across users. You can give one user a variety of tasks and look at the relative times there, but you can't compare task times between users, because some users are simply faster than others.
And then we can also look at the number of errors. The definition of an error tends to be fairly simple: if somebody has to navigate backwards, redo something or undo something, then that is an error. Certainly for web-based solutions, backward navigation is an immediate indication of an error. Elsewhere, you might consider that having to cancel a dialogue, which is effectively the same as navigating backwards, counts as an error.

We do still tend to focus on 5–7 users per community, which comes from a paper Jakob Nielsen wrote years ago, based on another paper – I think by Tom Tullis – suggesting that you get much reduced returns, that you stop finding new problems, after around five, six or seven users.
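To make those measures concrete, here is a minimal sketch of how you might tally them, together with a simple count of novel issues per participant for spotting the point of diminishing returns. The session records, the field names and the 90-second threshold are illustrative assumptions of mine, not the output of any particular tool.

```python
# A minimal sketch of tallying the measures discussed above (task success
# within a "reasonable time", task time, error count) and of counting how
# many previously unseen issues each successive participant surfaces.
# Records, field names and the 90-second threshold are illustrative.
from statistics import median

REASONABLE_TIME_S = 90  # rough "will they persevere?" threshold

# One record per participant for one task (hypothetical data).
records = [
    {"participant": "P1", "completed": True, "time_s": 75, "errors": 1,
     "issues": {"confusing delivery options"}},
    {"participant": "P2", "completed": True, "time_s": 140, "errors": 3,
     "issues": {"confusing delivery options", "hidden checkout button"}},
    {"participant": "P3", "completed": False, "time_s": 210, "errors": 5,
     "issues": {"hidden checkout button"}},
]

# Success = completed at all, and within the "reasonable" time.
succeeded = [r for r in records if r["completed"] and r["time_s"] <= REASONABLE_TIME_S]
print("Success rate:", len(succeeded) / len(records))
print("Median time (s):", median(r["time_s"] for r in records))
print("Mean errors:", sum(r["errors"] for r in records) / len(records))

# Saturation check: how many *new* issues did each participant surface?
seen = set()
for r in records:
    novel = r["issues"] - seen
    seen |= r["issues"]
    print(r["participant"], "novel issues:", len(novel))
```

Remember the caveat above, though: task times are best compared across tasks for the same participant, rather than averaged across participants.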
You can certainly measure this yourself just by counting up the number of novel issues that you find. If you're starting to see mostly the same issues with subsequent users, then you have probably reached saturation point. But there are often unexpected surprises: sometimes you get the last participant on two days of testing doing things in a completely different way and pointing out entirely new problems that you never realized were there, because other people were just doing things differently.

Sessions typically tend to be between 60 and 90 minutes. I've often done testing with much shorter sessions: 40–45 minutes. Over 90 minutes, you've got some serious problems with tiredness, stress and all kinds of things that come from being asked to do a lot in a fairly concentrated and intense period, whilst you're being videoed and whilst you're perhaps not feeling terribly comfortable with the setup, especially if you're not a very experienced user of the technology in question. So, we try to keep sessions short, especially if we think users might be stressed by them.

Sometimes you're trying to find out or assess whether you're dealing with experienced users or
novice users. This is something that was talked about in a paper a long time ago, and it's now pretty much standard practice. In the early days of the web, we would just ask people how long they'd been using the web for. That doesn't work anymore – you can't do that. So, instead we ask them how sophisticated they are in their use of the web. Do they bank online? Do they play games online? Do they shop online? Do they participate in chats? Do they participate in social media? – and so on. You could now go as far as asking things like "Have you bought a house online?", "Have you bought a car online?" or "Do you do all your grocery shopping online?". Those are fair measures; finding out how sophisticated users are isn't as simple as it used to be.
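Purely by way of illustration, a screener along those lines might be scored something like this; the questions, weights and novice/experienced cut-off are assumptions of mine, not standard values.

```python
# Hypothetical web-sophistication screener: tally the online activities a
# participant reports, weighting the higher-commitment ones more heavily.
# Questions, weights and the cut-off are illustrative assumptions only.
SCREENER_QUESTIONS = {
    "banks_online": 1,
    "plays_games_online": 1,
    "shops_online": 1,
    "uses_social_media": 1,
    "does_all_grocery_shopping_online": 2,
    "has_bought_a_car_online": 2,
}

def sophistication(answers: dict) -> str:
    """Classify a participant from yes/no screener answers."""
    score = sum(w for q, w in SCREENER_QUESTIONS.items() if answers.get(q))
    return "experienced" if score >= 4 else "novice"  # arbitrary cut-off

print(sophistication({"banks_online": True, "shops_online": True,
                      "does_all_grocery_shopping_online": True}))  # experienced
```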
But these are the kinds of questions that you'd want to include.

When you're preparing for usability testing, you obviously need to have some tasks in mind. Some people, especially when they're doing research, use very general tasks. A friend of mine – Jared Spool at User Interface Engineering – used to invite people in just to spend some money on something they wanted, and he would watch how they got on. When you're doing testing as part of a project, though, you typically have some specific issues that you want to address. You may have some designs that you're not entirely comfortable with, or that you have a feeling may be causing a problem. So, you agree the tasks with your team. You need to be specific: you need to make sure that users understand what it is you want them to do, unless you're deliberately trying to find out how they feel about the product or service in general terms.
But typically we will ask them something very specific, like "Go and buy this thing and have it delivered or arrange for collection", "Go and order this stuff" or "Start the process of conveyancing a house", giving them the details. Don't expect users, by the way, to give over too much in the way of personal information unless you can give them assurances that it will not go outside the room. Even then, you would be better off giving them dummy data, unless it's really important to you that they feel more in tune with the data they're using – if they're using their own data, they feel more attached to it, but there are obviously information privacy concerns in that respect.

Start with easy tasks to build confidence.
You certainly don't want somebody to come in, sit down and immediately be given a very challenging problem, so we tend to build up to more complicated problems. I know that isn't, strictly speaking, how one would do things in a research setting, because by giving them that early experience you're presumably making it slightly easier for them to do the more complicated tasks. But we don't want to alienate our participants. We want them to feel good about the process; we want them to feel positive about it. And that's particularly true if you're going to try to get some post-test views from them: if you've given them difficult tasks in a difficult order, then they're not going to be feeling all that happy about the entire process, and possibly even about your product or service.

Make sure that they understand what successful completion looks like –
so, be clear in what you've asked them to do. Typically, if it's something like ordering, you might tell them that they should go as far as paying for it; if you've got dummy data and a system that allows it, you could even let them go through the checkout process, as an example. Set a time limit for task success, but don't advertise it to users, and don't strictly enforce it either – if somebody's in the middle of doing something and you've gone beyond the one minute or 90 seconds that you set as the time limit, just let them carry on if that's *reasonable*. If they're struggling or getting really upset because they can't do it, then let them off the hook *gently*. Don't say, "Time's up!" and move on to the next one! We want these to be happy people when they leave us. *Do no harm* is really the target that we're aiming for there.
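Pulling those preparation points together, one way of writing a task up for the test plan is sketched below; the fields and the wording are illustrative assumptions of mine, not a prescribed format.

```python
# A sketch of how one task might be written up for the test plan.
# Field names and values are illustrative, not a prescribed format.
task = {
    "id": "T1",
    "scenario": ("You want to buy a kettle and have it delivered to your "
                 "home address. Go as far as paying for it."),  # participant-facing wording
    "success_criteria": "Order placed through checkout using the dummy payment details.",
    "soft_time_limit_s": 90,   # not shown to the participant, not strictly enforced
    "data": "dummy account and payment details, not the participant's own",
    "order": 1,                # easy tasks first to build confidence
}
```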
As far as the tasks themselves are concerned, make sure that you've used *natural language* – that participants will understand the task from their perspective, and that you're not using terminology that's been employed in the user interface if that isn't the *natural language* you would expect to hear from users. *Don't lead users* by – as I mentioned – using terms that appear directly on the menus or screens. You may as well not do usability testing at all if you're going to base it on what's on the screen. You should be able to give users a goal in fairly natural language and have them translate that into the screen terminology. I'm not saying that you should deliberately make the language obscure, but make sure it's something they would be comfortable talking about or doing themselves. So, for example, ask users to change the expiry date on their credit card
rather than to "edit payment details". That is actually an excellent example, because people think of it that way: "My credit card expiry date has changed. I want to change the expiry date on my credit card." They then have to translate that goal – through the whole process that we've talked about in many other places – into actions, and look for those actions in the user interface. That is actually a little more challenging than it ought to be in many, many places, because they have to find their payment details and then probably translate that into a menu item entitled "Edit payment details". But we've asked them the goal in a fairly natural way, and that really is how we should prepare the tasks.

The strategy is that you can't test everything, so focus on things which you've established are critical:
something that really has to work very smoothly or very quickly, or that you've already established, for whatever reason, might be problematic – where you've got evidence that people are having problems. For an existing site or application, you might have data that suggests people are failing to proceed from a particular point in the process, and so you might deliberately set up some usability testing to find out what's going on – what the problem is. As I mentioned, you can use other sources to *guide the focus*: *server logs*, for instance, or data from your support desks saying "We're getting an awful lot of calls about this particular feature" or "People are having problems with this particular aspect of the process." You might have data from an outside expert – a heuristic evaluation – saying that they're not very confident that this particular part of the solution is going to be easy to use.
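As an illustration of using server logs to guide the focus, the sketch below works out where sessions drop out of a multi-step process; the step names and the shape of the log data are assumptions made for the example.

```python
# Illustrative funnel analysis over simplified server-log data: where do
# sessions drop out of a process? Step names and log shape are assumptions.
from collections import Counter

CHECKOUT_STEPS = ["basket", "delivery", "payment", "confirmation"]

# Hypothetical log: the furthest step each session reached.
furthest_step = ["basket", "payment", "delivery", "basket", "confirmation",
                 "delivery", "payment", "basket", "delivery"]

reached = Counter()
for step in furthest_step:
    # A session that reached "payment" also reached every earlier step.
    for s in CHECKOUT_STEPS[: CHECKOUT_STEPS.index(step) + 1]:
        reached[s] += 1

for prev, nxt in zip(CHECKOUT_STEPS, CHECKOUT_STEPS[1:]):
    drop = 1 - reached[nxt] / reached[prev]
    print(f"{prev} -> {nxt}: {drop:.0%} of sessions drop out")
# A large drop at one transition is a good candidate focus for usability testing.
```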
*Don't make the focus too narrow* unless you've already tested, or plan to test, at a higher level. You can cover more complicated solutions or products in many, many different ways, so don't get carried away with just looking at the fine detail. You will have to make sure that users can actually navigate the whole site or the entire product. By way of example, if these were menus, you would want to make sure that people can get down to the bottom left-hand corner, not just that those three menus on the left work in isolation. So, you need a good plan for the whole thing.

As far as actually conducting usability tests is concerned, it's usually
recommended to have somebody who looks after participants – what we call a meeter and greeter. They provide a drink, make people comfortable, and *usually* hand out the pre-test questionnaires, the instructions and the *permission forms*. You really must have a signed permission form if you're going to make recordings, and that is much more easily done before the session starts. They can also give participants a sheet explaining the test procedure, although that can be done once the participant is in the room as well. Confirm with participants that it's not them being tested, and that if they have any problems with the product or service they're interacting with, it's something you want to know about and report back to the team, rather than something they should feel bad or embarrassed about.

Let participants work through each task, encouraging them occasionally to think aloud.
Some people have users thinking aloud all the time. I don't believe that's a very fruitful process, partly because thinking aloud has a cost of its own in terms of the effort people have to put into doing it. So, I tend to use think-aloud occasionally: I'll tell people to let me know when they're having problems, and if I see that somebody is struggling or looking about on the page, I'll ask them a question about it, but I try not to interrupt them otherwise.
Encourage participants to work through the problems rather than telling them what they need to do. Occasionally, I will tell people what to do just so that they don't get too frustrated, particularly if we've already established that lots of people have had this problem; we can just get them to skip over it, which makes them feel a bit happier and also makes the testing go a little quicker than it might otherwise. And, as I was suggesting earlier, allow tasks to come to a natural conclusion rather than stopping the participant as soon as the allowed time has passed. I don't think in all the years I've been doing usability testing that I've ever told anybody to stop because we've run out of time. I might urge them on to the next task, casually, or I might ask them to agree that they're not concluding this task particularly effectively, for reasons that are nothing to do with them, and that perhaps we should move on to the next one.

*Post-test* – well, often we have post-test interviews, and there might even be a post-test questionnaire. The post-test interview is usually around general impressions and, if they were given a variety of different technologies to try out, whether they had any preferences. I have to say the problem with that is that if you've given somebody more than about three different things to try
– these are often done as "A, B and C" site tests of the same process – more than about that and people really just aren't going to remember very effectively which one was best for them or which one they liked the most. When the post-test interview is completed, we return participants to the meeter and greeter, they are given a post-test questionnaire if there is one, and then they are provided with the incentive payment.
It's almost unheard of for people to do this out of the goodness of their heart, I have to say. If you're working with a large organization that has its own customer base, you may find that participants will come along for minor rewards, but often we're talking about moderately substantial chunks of money – maybe 50 pounds, dollars or euros for an average one-hour session.
Then you compile the report – and I say "compile the report" as if it's something that will only take half an hour; it is a time-consuming process. If you've done a day's testing, you can expect to spend at least one to two days compiling a good report on that testing, including screenshots and an analysis of how everyone found individual features and tasks.

Bear in mind that not all problems users have are necessarily usability issues.
A person just tripping up on something can be what we call "a slip". So, I tend not to report something as a usability issue unless at least a couple of users have run into it, or unless I think that particular issue breaks *usability guidelines* – that the user had problems *because* you've done something that isn't recommended in the usability heuristics.
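As a rough sketch, that reporting rule could be written down like this; the formalisation and the two-user threshold are mine, for illustration only.

```python
# Sketch of the reporting rule described above: report an observation as a
# usability issue if at least two participants hit it, or if it breaks a
# known usability guideline or heuristic. The threshold is illustrative.
def worth_reporting(times_observed: int, breaks_guideline: bool) -> bool:
    return times_observed >= 2 or breaks_guideline

print(worth_reporting(1, False))  # a one-off slip: probably not reported
print(worth_reporting(1, True))   # one user, but it violates a heuristic: report it
```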
Look out for known design issues even if users do not have trouble during testing – that's what I was just referring to. You might have issues that you've come across before; I certainly do. A favorite of mine, for example, is *headings that are links* – users often can't decide whether it's a heading or a link, and sometimes they will actually miss it because the lack of underlining makes it look even more like a heading and less like a link.
So, if I see people stumbling in that kind of area, I will often report it even if it's just the one person. I might even report it when nobody had the problem, although I will point out that nobody had that particular issue in the testing. And, of course, we use screenshots to illustrate problems and solutions.