Some Lessons in Stated Choice Survey Design
Stephane Hess
University of Leeds
[email protected]
John M. Rose*
The University of Sydney
[email protected]
Abstract
A growing majority of discrete choice studies are now based on data collected through
stated preference (SP) surveys, primarily in the form of stated choice (SC) questionnaires.
The state-of-the-art in this area has evolved dramatically over recent years, as witnessed in
a burgeoning literature. At the same time however, the state-of-practice has stagnated,
especially in some countries. Additionally, the growing emphasis on theoretical
developments, primarily to do with efficiency, has meant that a number of fundamental
issues are often no longer talked about. In the present paper, we look in detail at the entire
process going from initial survey planning to actual data collection, discuss, often with
examples, a number of common but avoidable mistakes, and provide some guidance for
good practice.
1. Introduction
Over recent years, there has been a flurry of activity in the field of experimental design
for stated preference (SP) surveys, leading to a move away from orthogonal design
techniques towards efficient design techniques. The advantage of the latter in a
practical context is that more robust results can be obtained with smaller sample sizes,
potentially leading to significant financial savings, especially for surveys involving face-to-face
interviews.
Whilst these developments represent theoretical advancements that are gradually
making their way into applied research, the literature as a whole appears to have largely
focused on these advances to the neglect of more fundamental issues. The design and
implementation of surveys for the collection of choice data is an in-depth process that adds
to the already existing complexities of more traditional questionnaire construction and data
collection. It is to these issues that the present paper returns, specifically dealing with the
basic principles of good practice in the field of survey design.
The topics covered in this paper are survey technique, survey context, choice set design,
experimental design, survey testing, and survey administration. For each topic, we discuss
© Association for European Transport and contributors 2009
the basic issues, highlight possible mistakes, often with the help of examples, and provide
some guidance for good practice.
2. Survey technique
The majority of surveys looking at hypothetical scenarios are of the stated choice (SC) type, in
which a respondent is faced with a choice between a finite number of mutually exclusive
alternatives. Figure 1 shows an example of this type of response format for an unlabelled route
choice experiment. It is this type of experiment that the majority of this paper focuses on.
Nevertheless, it should be acknowledged that there exist a number of alternative (or in some cases
complementary) approaches that we will now touch on briefly. Indeed, the choice of SC should not
be an automatic one once the analyst has settled on SP methods rather than RP methods.
Together, ranking and rating type response data combine to form a single SP methodology referred
to in some literature as traditional conjoint analysis or simply conjoint analysis. This differs from choice
type responses (which for historical reasons are referred to as choice based conjoint in the same
literature) in a number of important ways. Firstly, the analysis of such data typically relies on
linear regression models as opposed to the non-linear logit or probit type models often employed for
choice data. Although ordered discrete choice models may also be used on such data, the literature
dealing with traditional conjoint methods typically ignores such models in favour of linear regression,
because it usually seeks to derive individual-specific models as opposed
to a model estimating the population ‘average’ parameters. This has proven somewhat controversial
given that linear regression models assume interval or ratio scaled data for the dependent variable,
with debate raging as to whether ratings or rankings data meet this criterion. Secondly, the response
metric (and in particular ratings scales) has also proven somewhat controversial from a psychological
perspective, with many researchers questioning whether different respondents assign the same
psychological value to the values of the scale (i.e., does a rating of 4 on a 1 to 10 point scale have the
same meaning to two different respondents?). Discrete choices do not suffer from this issue.
Nevertheless, ratings and rankings tasks offer two significant advantages over discrete choice tasks:
firstly, they provide full information on the relative preferences across all alternatives, unlike a choice,
which informs the analyst only which is the most preferred option; secondly, ranking responses may
allow the analyst to rank explode the data, providing more observations per respondent
from which to model.
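The rank-explosion idea can be illustrated with a short sketch (the function name and the example ranking are ours, for illustration only): a ranking of J alternatives is decomposed into J-1 pseudo-choice observations, in the spirit of the Plackett-Luce decomposition.

```python
def rank_explode(ranking):
    """Explode one ranking (best to worst) into pseudo-choice observations.

    Each observation is (chosen_alternative, choice_set): the top-ranked
    alternative is 'chosen' from the full set, the second from the set
    with the first removed, and so on.
    """
    observations = []
    remaining = list(ranking)
    while len(remaining) > 1:
        observations.append((remaining[0], tuple(remaining)))
        remaining = remaining[1:]
    return observations

# A respondent ranks four unlabelled route alternatives best to worst:
obs = rank_explode(["B", "D", "A", "C"])
for chosen, choice_set in obs:
    print(chosen, "chosen from", choice_set)
```

Each exploded observation can then be treated as an additional choice in estimation, which is the source of the extra information per respondent mentioned above.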
Figure 4a: Partial ranking best worst response format
Originally proposed by Finn and Louviere (1992), best worst scaling offers yet another alternative
response mechanism for collecting SP type data. Best worst scaling differs significantly from
the other response methods in that the response mechanism operates not at the level of the
alternatives, but at the level of the attributes (see Figure 5). Rather than present respondents
with a number of alternatives to choose from amongst, the best worst scaling approach presents
respondents with a single alternative and asks them to select the best and worst attribute for that
alternative based on the attribute levels shown. The (log of the) frequency of times a particular pair
of attributes is selected as the best and worst combination is then used as the dependent variable in
a linear regression model to determine the desirability of each attribute and attribute level for
different respondents.
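The tabulation step behind this analysis can be sketched as follows (the attribute names and responses are invented for illustration; the text describes a log-frequency regression, of which this shows only the dependent-variable construction plus a simple best-minus-worst score):

```python
from collections import Counter
from math import log

# Hypothetical best-worst responses: each tuple is the (best, worst)
# attribute a respondent picked for one profile of a single alternative.
responses = [
    ("cost", "crowding"), ("cost", "time"), ("time", "crowding"),
    ("cost", "crowding"), ("time", "crowding"), ("cost", "time"),
]

# Log of the frequency of each (best, worst) pair: the dependent
# variable in the linear regression described in the text.
pair_counts = Counter(responses)
log_freq = {pair: log(n) for pair, n in pair_counts.items()}

# A simple diagnostic: best-minus-worst score per attribute.
score = Counter()
for best, worst in responses:
    score[best] += 1
    score[worst] -= 1
print(dict(score))  # cost most desirable, crowding least
```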
Proponents of best worst scaling point to two significant benefits in its use over more traditional SP
response formats. Firstly, they argue that the traditional pick one choice responses, which represent
the predominant data collection method used to date, are largely inefficient in terms of the amount
of data obtained from the respondent. This criticism, whilst warranted, has been partially addressed
via the other response mechanisms outlined here. The second criticism relates mainly to the inability
of discrete choice type models to untangle the base levels of dummy-coded categorical variables
from any estimated alternative specific constants (ASCs). Whilst effects or orthogonal coding
overcomes this, there remains a problem in interpreting the willingness to pay (WTP) values
obtained for such categorical attributes. Such coding structures allow for a determination of the
WTP values for the non-base levels; however, these values are estimated relative to the base level,
the WTP value of which is not calculable. For example, consider a categorical attribute ‘comfort’ with
levels low, medium and high. Assuming this attribute is dummy coded with low as the base level,
and further assuming the experiment contains a cost attribute, the WTP for the medium and
high attribute levels can be determined. Unfortunately, these WTP values are relative to the base
low level, the WTP for which is not known. Effects and orthogonal coding unconfound the base
levels from the ASCs; however, interpretation of the WTP outputs remains equally problematic. The
best worst scaling response format allows for the calculation of the WTP for all attribute levels, and
hence is argued to be preferred if an experiment has non-numeric attributes (see Marley and
Louviere, 2005).
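The comfort example can be made concrete with some illustrative numbers (the coefficient values below are invented; only the relative-to-base logic comes from the text):

```python
# Hypothetical dummy-coded estimates for a 'comfort' attribute
# (base level: low) plus a cost coefficient.
beta_medium = 0.40   # utility of medium comfort relative to low
beta_high = 0.90     # utility of high comfort relative to low
beta_cost = -0.10    # marginal utility per unit of cost

# WTP is only identified *relative to the base level*:
wtp_medium_vs_low = -beta_medium / beta_cost  # about 4 cost units
wtp_high_vs_low = -beta_high / beta_cost      # about 9 cost units

# The WTP for the base 'low' level itself is confounded with the
# alternative-specific constant and cannot be recovered.
print(round(wtp_medium_vs_low, 2), round(wtp_high_vs_low, 2))
```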
1 In a case dealing with the valuation of music, the Copyright Tribunal of Australia in 2007 ruled that transfer
price methods were inappropriate and that discrete choice modelling was the preferred method for valuation
studies (PPCA (the Nightclubs Matter) CT2/2004 [2007] ACopy1).
in such direct questioning approaches that is one of the motivations behind using multi-attribute
hypothetical choice scenarios, masking at least up to a degree the true aim of the research. Even
though, in some work, the results from the transfer price exercise seem to be largely consistent with
those from the traditional SC work, the authors of the present paper are not convinced as to the
motivation for retaining transfer price as a tool for future studies.
3. Survey context
In some cases, the topic of a study directly determines the context of the SC surveys, such as for
example in the case of mode choice experiments. In some cases however, most notably in the
context of work looking at valuation of travel time (VTT) measures, the analysis is more results-driven
than context-driven, and various possible approaches arise, as discussed in the following subsection. Later
on in this section, we also discuss issues with inappropriate and unrealistic contexts.
Significant effort has also gone into using mode choice experiments in the study of VTT
measures, notably in Switzerland (see e.g., Axhausen et al., 2008). Such experiments can be useful in
producing VTT measures jointly for different modes, while they arguably also have an advantage in
masking the aim of the work. However, mode choice studies often face major issues with mode
allegiance, with many respondents being unwilling to switch mode even in return for large time
savings. This is then not necessarily a reflection of a high or low VTT, but simply of high
modal allegiance.
In an attempt to avoid problems with toll road and mode choice studies, VTT studies in many
countries have made use of abstract choice scenarios, presenting respondents with explicit time-money
trade-offs. Not only are there potential issues with unrealistic time-cost trade-offs, as
addressed in the next section, but such abstract scenarios bear little resemblance to real world
scenarios. This in itself can pose significant problems.
David Hensher, one of the leading advocates for realism in SP design, has put forward the notion
of “experientially meaningful configurations”, i.e., ensuring that respondents are presented with
choices that would be reflective, at least up to a degree, of real life scenarios so as to ensure an
acceptable degree of realism and response quality. This is arguably not the case in such abstract
choice scenarios, and it is not immediately clear whether getting respondents to make such a leap of
faith in completing a SP scenario can be guaranteed to have no influence on results. Crucially, there
is very little evidence on this issue to date, but abstract scenarios continue to be used quite widely in
an applied context.
3.2.2 Example 1: Swissmetro
Swissmetro is a hypothetical underground railway system, using maglev technology and
travelling at speeds of over 400 km/h under the whole of Switzerland, with extensions to other
European cities (see Figure 7). The highly ambitious project is arguably not likely to ever be
completed. Nevertheless, a SC survey was conducted, giving respondents a choice between car, rail
and the Swissmetro (cf. Bierlaire et al., 2001). With a headline figure of Zurich to Berne in 12 minutes
(where a conventional train takes 57 minutes), the advantages of the Swissmetro option are
so substantial that it should come as no surprise that Swissmetro was chosen in 58 percent of
choice sets.
and one of the most frequently omitted parts of many studies, is the use of qualitative research to
refine the questionnaire design.
Consider a bush fire evacuation choice study conducted in Sydney, Australia in 2003. The project
was designed to examine what factors would result in respondents evacuating their residence given
an approaching bush fire. A preliminary examination of the literature resulted in a traditional grid
like choice task design, where respondents were to be asked to select which bushfire they would be
most likely evacuate from given two possible fires. An example of the proposed choice task is shown
in Figure 8.
Fortunately, qualitative research was conducted whereby focus group participants were shown the
above choice task and asked whether it made sense to them and whether they could answer the
question accurately. The focus group participants were unable to understand the task, arguing that
in reality, individuals were unlikely to be faced with having to choose between which of two
different bushfires they would evacuate from. Furthermore, asking questions as to the likely timing
of evacuation, as was proposed, was not realistic given that such decisions are based on quickly
changing circumstances. Finally, focus group participants indicated that the decision to evacuate was
not as simple as choosing to evacuate or not, with many indicating that they may only evacuate
some of the household members, whilst others would remain behind. Given the above as well as
discussions related to the specific attributes and the levels that they could assume, the final version
of the survey used a completely different choice context, as well as relying on graphics and videos
for presentation. An example of the final survey task is shown in Figure 9.
Figure 9: Final bushfire choice task based on qualitative research
4.1 Alternatives and attributes
A choice situation in a SP survey presents a respondent with a fixed number of mutually
exclusive alternatives, each described by a number of attributes. In generating a design for a survey,
the analyst first needs to decide on the number of alternatives and attributes.
Many designs used in applied work still rely on binary choice experiments, i.e., involving only two
alternatives in each choice situation². Here, there is a major gap between theory and practice, with a
large share of applied work relying on simplistic binary choice sets, while work of a more academic
nature regularly presents respondents with choices involving three or more alternatives, sometimes
up to five or six. The main argument in favour of using binary designs has been that of a reduction in
respondent burden. However, work has not only shown that respondents can adequately deal with a
larger number of alternatives, but that unnecessarily restricting the number of alternatives may in
fact make the surveys too simplistic and transparent, while also bearing little resemblance to real life
scenarios (see e.g., Caussade et al., 2005, who recommend four as the optimal number of
alternatives). Additionally, a case can be made for increasing the number of alternatives on the
grounds that this allows for greater variability in each choice set, increasing data richness while also
reducing the overall sample size requirements. Finally, as discussed in the vehicle choice example
² In some applied work, the reliance on paper based surveys plays at least a partial role in the use of binary
experiments.
below, it is not just the number of alternatives but also the type of alternatives that is of great
importance.
Many surveys make use of only the most relevant attributes, typically time and cost. Such
simplistic choice scenarios clearly avoid any risk of overburdening respondents, and this, in
conjunction with the use of paper based surveys, was the main motivation for such an approach.
However, simplistic scenarios can also be criticised, primarily on the grounds that they bear little
resemblance to the more complex real life choices undertaken by travellers.
The incorporation of other attributes into the choice situations, such as departure time,
reliability, or different travel time components, may be advantageous for three reasons. Firstly, they
lead to a higher degree of realism, potentially improving response quality. Secondly, they mask the
aim of the study, arguably reducing the risk of political voting. Finally, they obviously allow for the
study of valuations in a broader context. As mentioned above, the main argument against increasing
the complexity of stated choice scenarios is that of respondent burden. However, it has now been
shown conclusively that not only are respondents able to cope with relatively complex scenarios (see
e.g., Caussade et al., 2005; Chintakayala et al., 2009a), but that making choice sets relevant by
including all important information may in fact improve response quality (see e.g., Hensher, 2006).
In selecting randomly sized vehicles to construct each choice task, many respondents
who took part in a pilot of the survey instrument were inevitably confronted with situations of having
to select between vehicles priced at twice as much as their most recent purchase. The result was that
the current vehicle was selected almost 90% of the time as the most preferred vehicle. For the main
field phase, the experimental design was changed so that at least one hypothetical vehicle would be
the same size as the current vehicle, one would be either smaller, the same size or one size larger,
and the final third alternative would be randomly selected from any size model. This
resolved the issue, with respondents then trading off between the available alternatives.
Around the same time as the Australian study, a study on vehicle type and fuel type choice was
also carried out in California, and similar problems were noted, with respondents being
presented with the choice between, say, a small current vehicle and a very large alternative vehicle
(an example choice task is shown in Figure 11). Similarly, respondents were initially presented with
choices between often very different fuel types. As a result, a weighting approach was used,
ensuring that the more relevant options had a higher probability of being included, whilst
still guaranteeing that all possible combinations had a non-zero probability.
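One simple way to implement such a weighting scheme can be sketched as follows (the size categories, weight function, and function name are illustrative assumptions, not the study's actual rule):

```python
import random

random.seed(7)

SIZES = ["mini", "small", "medium", "large", "luxury"]

def draw_alternative_size(current_size):
    """Draw a hypothetical vehicle size, favouring sizes close to the
    respondent's current vehicle while keeping every size possible."""
    i = SIZES.index(current_size)
    # Weight decays with distance from the current size but never hits zero,
    # so all combinations retain a non-zero probability of appearing.
    weights = [1.0 / (1 + abs(i - j)) for j in range(len(SIZES))]
    return random.choices(SIZES, weights=weights)[0]

draws = [draw_alternative_size("small") for _ in range(1000)]
# Nearby sizes dominate, but every size appears with non-zero frequency.
print({s: draws.count(s) for s in SIZES})
```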
4.2. Presentation of difficult attributes: the case of variability
While attributes such as travel time or cost are generally easily understood by most respondents,
complications may arise with more abstract attributes, such as for example comfort. In this section,
we focus on one such attribute, namely travel time variability. While some studies are purely
dedicated to the study of the WTP for improvements in reliability, and can hence make use of e.g.
graphical representation, the majority of studies simply want to include travel time variability as one
additional attribute. This however raises the important question of how to present it. An example of
where the attribute might possibly have been better chosen or represented is shown in Figure 13.
Figure 13 presents an example choice task from a toll road study conducted in Sydney Australia in
2004. In the study, travel time reliability was considered to be an important attribute influencing
route choice. The attribute was shown as a ± value around the current travel time, a representation
which proved problematic for two reasons. Firstly, the experiment dealt with the travel times and
costs for a specific trip, whereas travel time variability presented in this way represents an
accumulation over many trips. Secondly, the presentation of the attribute as both a plus and a minus
can be confusing to respondents as well as the analyst, as it is not certain whether respondents
are reacting to the possibility of arriving earlier or later than the intended arrival time. In this way the
attribute might be considered somewhat ambiguous, which may well explain why it often
produces random parameter estimates with zero mean but significant standard deviations (see e.g.,
Hess and Rose 2009a or Hensher et al. 2006).
Figure 13: Example of possible poor attribute representation (travel time reliability)
Figure 14 represents a more recent stylisation of travel time reliability from an experiment
conducted in Brisbane Australia in 2008. In it, the travel time reliability attribute is presented as
probabilities or more accurately percentages (as respondents understood the concept of
percentages much better than probabilities) of arriving earlier, on-time or later than expected.
Qualitative research and pilot studies showed that this representation of the attribute was much
more realistic for respondents and far less ambiguous as to its meaning (Li et al. 2009).
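A further advantage of the percentage-outcome presentation is that it maps naturally onto scheduling-based models of reliability. As a minimal sketch (all probabilities, delays, and coefficients below are invented for illustration):

```python
# Hypothetical reliability attribute shown as percentage outcomes,
# in the style of the Brisbane presentation described above.
outcomes = [
    (-5, 0.20),  # 20% chance of arriving 5 minutes early
    (0, 0.60),   # 60% chance of arriving on time
    (10, 0.20),  # 20% chance of arriving 10 minutes late
]

# Under a simple scheduling model, the attribute enters utility through
# expected earliness and lateness (illustrative coefficients).
exp_early = sum(-d * p for d, p in outcomes if d < 0)  # expected minutes early
exp_late = sum(d * p for d, p in outcomes if d > 0)    # expected minutes late
beta_early, beta_late = -0.02, -0.08
reliability_utility = beta_early * exp_early + beta_late * exp_late
print(exp_early, exp_late, round(reliability_utility, 3))
```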
Figure 14: Example of a potentially better attribute representation (travel time reliability)
Rose and Hensher (2006) discuss the generation of experiments that adapt to the reported
availability to respondents of different alternatives. The purpose behind such experiments is to
construct SC experiments in which the alternatives present within the choice tasks are respondent
specific, and hence, reflect individual differences in the choice contexts that are likely to exist within
real markets. In this way, alternatives that would never be available to specific individual
respondents are not shown to them, and hence the preference structure that they reveal in
undertaking the survey is much more likely to mirror that which they would exhibit in real markets.
Figures 12a and b show one such adaptive survey, where the alternatives shown to respondents were
determined by whether the respondent had access to a car for a recent surveyed trip, and by the
origin and destination of that trip.
Figure 12b: Example mode choice experiment with 4 alternatives
A further issue with reference point designs, which is only now starting to become apparent, is
that such designs appear to induce a significant proportion of respondents to exhibit inertia or non-trading
behaviour. Whilst such effects may indeed be the norm in many real markets (and hence
suggest that the SP task is somewhat realistic or, at the very least, produces realistic behaviour),
inertia or non-trading does cause model estimation problems. If respondents always choose an
alternative irrespective of the attribute levels of that and other alternatives, then unrealistic
parameter estimates may result (e.g., the reference alternative may have a higher travel time, which,
if that alternative is always chosen, may produce a positive travel time parameter when modelled). Furthermore,
respondent non-trading between alternatives may not provide any information as to the trade-offs
that respondents are willing to make between the various attributes. If that is the case, then there
might exist a trade-off for researchers between making choice experiments more realistic
and making choice experiments that force trade-offs, which may be more useful in modelling respondents'
preferences. This is one of the arguments used by proponents of abstract experiments.
will likely obtain from using that design) is a function of the choice probabilities. Contrary to what
many might believe, the more attribute levels that are used, the more constrained the design will be
in terms of the possible choice probabilities that it can achieve. For this reason, end point designs
(designs with two levels at the extremes of the attribute levels) have often been found to produce
the most statistically efficient results. However, this is also related to the range used for the
attribute levels, as too wide a range may result in completely dominated alternatives. As such, there
are several trade-offs that need to be considered in selecting what and how many attribute levels to
use in a SC study.
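The dependence of a design's information on the choice probabilities can be illustrated with a small multinomial logit sketch (the prior coefficients and attribute levels below are illustrative assumptions): a balanced task keeps probabilities away from 0 and 1, while too wide an attribute range produces a near-dominated task that carries little statistical information.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for one choice task."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed prior coefficients for time (minutes) and cost (dollars).
b_time, b_cost = -0.05, -0.40

# Narrow attribute range: probabilities stay away from 0 and 1.
narrow = mnl_probabilities([b_time * 30 + b_cost * 4,
                            b_time * 35 + b_cost * 3])
# Too wide a range: the first alternative dominates and is chosen
# almost surely, so the task provides little information.
wide = mnl_probabilities([b_time * 20 + b_cost * 2,
                          b_time * 60 + b_cost * 10])
print([round(p, 3) for p in narrow], [round(p, 3) for p in wide])
```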
such a range of different design types would be understandable if the objectives of these papers
were to explore the impact of different design methodologies, yet not a single one of these studies was
specifically addressing experimental design issues. As such, despite decades of experience with SP
studies, the disparity of design types employed suggests that the practical implications of using one
design type over another are yet to be recognised within the literature.
preference or utility space and hence generating designs with more choice tasks will produce better
model outcomes. Bliemer et al. (2009) compared the results for the same choice problem using
designs created with either 18 or 108 choice tasks. In that study, it was found that an efficient
design with specifically chosen alternatives outperformed an orthogonal design with 108 choice
tasks in terms of producing much smaller standard errors. This finding suggests that using more
choice tasks is not necessarily better. What is important is how much information each choice task
provides in terms of the trade-offs respondents are required to make. This also means that analysts
should strive to produce designs that do not contain choice tasks that provide no additional
information (e.g. dominated choices).
because so called main effect and/or interaction effect designs are generated to produce the
smallest possible standard errors for the parameter estimates as well as reduce to zero the
parameter covariances (i.e., they are designed to produce independent parameter estimates of each
effect). As Rose and Bliemer (in press) show, unlike in linear models, once the parameters of a discrete
choice model are no longer zero, the parameter covariances are no longer zero either,
with the values of the covariance matrix becoming larger for orthogonal designs as the parameters
move further away from zero. In effect, this suggests that if an experiment has the desired outcome,
that being to estimate non-zero parameters (very few researchers, upon suspecting an
attribute will not play an important role in terms of observed choice behaviour, will include that
attribute in the experiment), then the more orthogonal the design, the worse the standard errors
will be. As such, generating a design that has zero correlations for the main effects and/or selected
interaction effects does not necessarily mean that such a design will deliver independent parameter
estimates when discrete choice models are estimated on data collected using the design.
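This result can be reproduced in miniature for a binary logit, where the information matrix over choice tasks is the sum of P(1-P) times the outer product of the attribute-difference vector (the design and the prior parameter values below are illustrative assumptions):

```python
import math

# Attribute-level differences (time, cost) between the two alternatives
# in four choice tasks of a small orthogonal design.
design = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def se_binary_logit(design, beta):
    """Asymptotic standard errors of a binary logit on one design replicate.

    Builds the 2x2 information matrix [[a, b], [b, c]] as the sum over
    tasks of P(1-P) * x x', with x the attribute-difference vector and
    P the choice probability of the first alternative, then inverts it.
    """
    a = b = c = 0.0
    for x1, x2 in design:
        v = beta[0] * x1 + beta[1] * x2
        p = 1.0 / (1.0 + math.exp(-v))
        w = p * (1.0 - p)
        a += w * x1 * x1
        b += w * x1 * x2
        c += w * x2 * x2
    det = a * c - b * b
    return (math.sqrt(c / det), math.sqrt(a / det))  # se(beta1), se(beta2)

# At beta = 0 the orthogonal design gives the smallest standard errors...
print(se_binary_logit(design, (0.0, 0.0)))
# ...but with non-zero parameters the standard errors are larger, and the
# off-diagonal term of the information matrix is no longer zero.
print(se_binary_logit(design, (-0.8, -1.2)))
```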
Notwithstanding recent developments, which we return to below, the majority of SP
questionnaires are still based on orthogonal designs. In an orthogonal design, the different columns
in the design are uncorrelated. However, the use of orthogonal designs also poses a number of
complications, primarily to do with dominance. Especially in simplistic designs, a potentially large
number of choice sets will include dominated alternatives. Many studies largely ignore this issue,
and retain such choice situations in the design, not realising that presenting respondents with such
no brainer choices not only adds nothing to our understanding of the choice processes but
potentially also has detrimental effects on response quality. Other studies take a more aggressive
approach, simply removing these problematic choice situations. A problem with this approach is that
it often leads to a loss of orthogonality, and almost invariably also leads to a loss of attribute level
balance. Whilst not always perfect, efficient design techniques can mostly be adapted to
incorporate constraints and be set up to avoid dominance.
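A dominance check of the kind such constraints rely on can be sketched in a few lines (the attributes and the convention that lower values are better are illustrative assumptions):

```python
def dominates(alt_a, alt_b):
    """True if alt_a is at least as good as alt_b on every attribute and
    strictly better on at least one (here lower time and cost are better)."""
    at_least_as_good = all(a <= b for a, b in zip(alt_a, alt_b))
    strictly_better = any(a < b for a, b in zip(alt_a, alt_b))
    return at_least_as_good and strictly_better

def has_dominated_alternative(choice_set):
    """Flag choice sets containing a dominated ('no brainer') alternative."""
    return any(
        dominates(a, b)
        for a in choice_set for b in choice_set if a is not b
    )

# Two choice sets over (time, cost): only the second is a 'no brainer'.
print(has_dominated_alternative([(30, 4.0), (25, 5.5)]))  # trading required
print(has_dominated_alternative([(30, 4.0), (25, 3.5)]))  # dominated
```

Such a check can be run over all candidate choice sets during design generation, removing or repairing any set that it flags.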
Finally, it should be noted that manual designs, such as the so called Bradley design (see e.g.
AHCG, 1996), and the design proposed by Hess & Adler (2009), also move away from orthogonality
with a view to avoiding dominated choice scenarios and encouraging trading between relevant
attributes.
of the experiment was far more important than the underlying experimental design. In a similar vein,
Bliemer et al. (2009) also empirically examined the impact of blocking, comparing the effects of
maintaining equal representation of blocks within a data set versus allowing uneven sampling
across blocks. In that study, they found that statistically significant differences in the standard
errors of the parameter estimates arose depending on how the sampling over blocks
occurred. As such, it is recommended that formal blocking columns be used in SC surveys and that
random assignment of choice tasks be avoided where possible. Sequential blocking is of course
worse still.
Finally, it should be noted that the use of inappropriate blocking approaches will jeopardise the
characteristics of the data. Indeed, analysts sometimes forget that what matters are the qualities of
the data, not of the base design. Even if the underlying design is perfectly orthogonal, using random
blocking will mean that the final data is not, especially with small sample sizes.
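The loss of orthogonality in the realised data can be shown in a few lines (the design and the unbalanced sample below are invented for illustration):

```python
def correlation(xs, ys):
    """Pearson correlation between two design columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# A 2x2 full factorial repeated twice: the time and cost columns
# of the base design are perfectly orthogonal.
design = [(-1, -1), (-1, 1), (1, -1), (1, 1)] * 2
r_design = correlation([r[0] for r in design], [r[1] for r in design])

# With random assignment of tasks, a small sample may realise an
# unbalanced subset of rows such as this one, and the *data* columns
# are then correlated even though the design columns were not.
sample = [(-1, -1), (-1, -1), (-1, 1), (1, 1), (1, 1), (1, 1)]
r_sample = correlation([r[0] for r in sample], [r[1] for r in sample])
print(r_design, round(r_sample, 3))
```

A formal blocking column, by contrast, guarantees that each block seen by a respondent preserves the balance properties of the base design.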
6. Survey testing
6.1 Inclusion of consistency checks
It is becoming more popular of late to include so called no brainer choices in surveys, generally in
the form of dominated choices, and to eliminate any respondents failing these tests. While we
recognise the importance of such tests, especially in the case of a departure from current methods,
we feel it is important to mention that such tests have to date often been performed in a potentially
inappropriate manner. As an example, the recent Danish VTT study included a dominated choice as
the sixth (out of nine) choice task. The problem with this approach is that, very much in the same
way as retaining dominated choices in standard orthogonal designs, the presence of this choice
scenario may lead to respondents not taking the remainder of the survey seriously. Work by Hess et
al. (2009) shows some evidence of different behaviour before and after this choice scenario. For this
reason, it is our recommendation that if such tests are to be included, this should be done at the end
of the survey, that way avoiding any biasing influence on the remainder of the data.
outcomes associated with discrete choice data. For example, Hess et al. (2009) produced results
suggesting that respondents lacked the ability to adequately distinguish between the alternatives
presented in the Danish VTT survey, where simulations had not revealed any problems.
Figure 15: Example of Pre-test CAPI screen in Excel
7. Survey administration
7.1 What other information should be collected?
SC surveys invariably collect some form of additional information on top of the data relating to
the respondents’ preferences. This includes socio-demographic and attitudinal data, as well as data
from questions that ask respondents to explain the process that led to their
choices, e.g. information processing strategies. Analysts should always attempt to collect whatever
additional data they feel may be useful at the modelling stage, and should use as a warning the large
number of studies that fail, for one reason or another, to collect a vital piece of information and
then need to rely on arbitrary processes such as imputation. However, analysts should also be
mindful of the fact that some of these additional questions can be of a personal nature, such as for
example to do with attitudes and attributes such as income. For this reason, it is crucial that such
data be collected after the actual choice experiments, so as to avoid influencing the actual
behaviour. Analysts also need to make careful trade-offs between asking for enough information
and not to overdo it and risk non-response, for example when using too high a level of
disaggregation for income.
population, Rose and Bliemer (2006) show how optimal sample size requirements can be
determined for each segment. Whilst Bliemer and Rose (2009) suggest that sample sizes derived
from such equations should represent a minimum bound in terms of the actual sample sizes
required for data collection, Bliemer et al. (2009) found that the real sample sizes required were
remarkably close to those suggested by the equations.
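While the exact formulation is given in the papers cited above, the flavour of such sample size calculations can be sketched as follows. Assuming we have, from the design’s AVC matrix under prior parameter values, the asymptotic standard errors for a single respondent completing the full design, standard errors shrink with the square root of the sample size, giving a stylised minimum N per parameter; the numbers below are illustrative only:

```python
import math

def min_sample_size(betas, se_one_respondent, z=1.96):
    """Stylised minimum sample size such that each parameter is expected to
    be statistically significant at the level implied by z, given prior
    parameter values and single-respondent standard errors."""
    sizes = [(z * se / abs(beta)) ** 2
             for beta, se in zip(betas, se_one_respondent)]
    return math.ceil(max(sizes))

# Illustrative priors for time and cost coefficients with their
# single-respondent standard errors.
print(min_sample_size(betas=[-0.05, -0.4], se_one_respondent=[0.12, 0.8]))  # → 23
```

As the text notes, such a figure should be treated as a lower bound; representativeness requirements will typically push the actual sample size well above it.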
Nevertheless, in studies aimed at drawing wide and general conclusions, a certain degree of
representativeness needs to be achieved in the sample. If only a small sample is collected, there is a
risk that sampling bias will leave the sample unrepresentative of the overall population from which
it is drawn. As such, even if smaller sample sizes suffice than are generally collected, as suggested by
Bliemer and Rose (2009), other external requirements may necessitate that larger sample sizes be
collected in practice.
used, which may create problems if the number of separate surveys is low, i.e., the same savings are
used for potentially quite different journey times.
All the above listed problems can to a large degree be avoided by making use of a computer
based survey, either in the form of an interviewer assisted survey or an internet based survey. Such
surveys allow for a high degree of customisation and carry out calculations automatically, avoiding
numerical issues while guaranteeing a correspondence between the values used in the survey and
those used in the models. Furthermore, such surveys can rely on percentage variations, which may
produce more realistic attribute levels, while also increasing data richness.
Finally, a point that is rarely discussed is that in paper based surveys, respondents see all choice
situations at the same time, potentially leading to cross-scenario comparisons of alternatives, an
issue that does not arise in computer based surveys where one screen is used per scenario.
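As an illustration of the percentage-based customisation mentioned above, a computer based survey can pivot the attribute levels shown to each respondent around their own reported reference value. A minimal sketch, with hypothetical shift percentages:

```python
def pivoted_levels(reference, percentages):
    """Build SC attribute levels as percentage shifts around a respondent's
    reported reference value."""
    return [reference * (1 + p) for p in percentages]

# A respondent reports a 40-minute reference journey time; levels are
# pivoted at -25%, -10%, 0%, +10% and +25%, then rounded to whole minutes
# so that the values shown on screen match those stored for modelling.
levels = [round(x) for x in pivoted_levels(40, [-0.25, -0.10, 0.0, 0.10, 0.25])]
print(levels)  # → [30, 36, 40, 44, 50]
```

Storing the rounded values, rather than the raw percentages, guarantees the correspondence between what respondents saw and what enters the estimation data set.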
The cost of interviewer assisted surveys may be prohibitively high, paving the way for internet
based surveys. Here, issues of sample representativeness may be avoided by still sampling
respondents in the same way (e.g., roadside) and handing out login details for the internet based
survey, an approach that is becoming more commonplace, for example in many river crossing
and toll road surveys in the United States.
E^{P_i}_{X_{jk}} = \frac{dP_i}{dX_{jk}} \cdot \frac{X_{jk}}{P_i} = \frac{dV_j}{dX_{jk}} X_{jk} P_j \qquad (2)
This does not mean that the calculation of values such as elasticities (and likewise marginal effects)
should never be contemplated. Indeed, the opposite is true. Where one wishes to compare the
results across different models or data sets, then the generation of elasticities should be considered.
However, unless the model constants have been calibrated, the actual values of the elasticities
should be interpreted with care.
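As a numerical illustration of such elasticity calculations, the sketch below computes the direct point elasticity of a choice probability in a simple binary logit model and verifies it against a finite-difference approximation; the coefficient and attribute values are purely illustrative:

```python
import math

def logit_probs(V):
    """MNL choice probabilities from a list of utilities."""
    expV = [math.exp(v) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

def direct_elasticity(beta_k, X_ik, P_i):
    """Direct point elasticity of P_i with respect to its own attribute
    X_ik in an MNL model: beta_k * X_ik * (1 - P_i)."""
    return beta_k * X_ik * (1 - P_i)

beta_cost = -0.1
cost = [5.0, 8.0]
V = [beta_cost * c for c in cost]
P = logit_probs(V)
analytic = direct_elasticity(beta_cost, cost[0], P[0])

# Finite-difference check of (dP_1/dcost_1) * (cost_1 / P_1).
h = 1e-6
P_h = logit_probs([beta_cost * (cost[0] + h), V[1]])
numeric = (P_h[0] - P[0]) / h * cost[0] / P[0]
print(round(analytic, 4), round(numeric, 4))
```

The same logic extends to cross elasticities, and the point made in the text stands either way: without calibrated constants, the probabilities, and hence the elasticities, may not reflect real market shares.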
Additionally, there is a clear risk that the scale in SP data is different from that in RP data, i.e.
that respondents’ sensitivity to changes in attribute values is higher or lower than it would be in a
real life scenario. Thus, it is not only the ASCs that need recalibrating, but also the scale of the
marginal utility coefficients.
9. Conclusions
In this paper, we have presented examples of previous SP studies to demonstrate some practical
aspects of conducting such studies. In doing so, we have sought to provide practitioners and
academic researchers alike with recommendations that might prove useful in order to avoid
problems and mistakes that have been made in the past. We have also sought to provide discussion
on other aspects of using SP surveys that hopefully will improve the reporting and use of SP survey
results. Our discussions herein have led us to a number of conclusions, chief among which is that
qualitative research is a must and that piloting and pretesting of SP surveys are a necessity.
We wish to conclude, however, by noting that such survey problems do not afflict SP surveys
alone; many of the issues identified here may equally affect RP data collection. To
this end, we offer the following example, taken from Hensher et al. (2005). “Some years ago a
student undertook research into a household’s choice of type of car. The student chose to seek
information on the alternatives in the choice set by asking the household. Taking one household
who owned one vehicle, their chosen vehicle was a Mazda 323. When asked for up to three
alternatives that would have been purchased had they not bought the Mazda 323, the stated
vehicles were Honda Civic, Toyota Corolla and Ford Escort. After collecting the data and undertaking
the model estimation it was found that the vehicle price attribute had a positive sign (and was
marginally significant). After much thought it became clear what the problem was. By limiting the
choice set to vehicles listed by the respondent, we were limiting the analysis to the choice amongst
similarly priced vehicles. Consequently more expensive vehicles (and much less expensive ones)
were not being assessed in the data although some process of rejection had clearly taken place by
the household. The price attribute at best was a proxy for quality differences amongst the vehicles
(subject to whatever other vehicle attributes were included in the observed part of the utility
expression). Price would be better placed in explaining what alternatives were in or out of the choice
set. If the student had simply listed all vehicles on the market (by make, model, vintage) and
considered all eligible, then regardless of which grouping strategy was used (as discussed above) one
would expect price to have a negative parameter; and indeed the model if well specified should have
assigned a very low likelihood of the particular household purchasing a vehicle in a higher and a
lower price range. An important lesson was learnt.”
Acknowledgements
The first author acknowledges the financial support of the Leverhulme Trust in the form of a
Leverhulme Early Career Fellowship.
References
AHCG (1996), Value of Travel Time on UK Roads, report by Hague Consulting Group and Accent
Marketing & Research for the UK Department for Environment, Transport and the Regions.
Axhausen, K.W., Hess, S., König, A., Abay, G., Bates, J.J. & Bierlaire, M. (2008), Income and distance
elasticities of values of travel time savings: New Swiss results, Transport Policy, 15(3), pp. 173-185.
Batley, R., Grant-Muller, S., Nellthorp, J., de Jong, G., Watling, D., Bates, J., Hess, S. & Polak, J.W.
(2008), Multimodal travel time variability, final report for the UK Department for Transport.
Bierlaire, M., Axhausen, K. and Abay, G. (2001). Acceptance of modal innovation: the case of the
Swissmetro, Proceedings of the 1st Swiss Transportation Research Conference, Ascona, Switzerland
Bliemer, M.C. and Rose, J.M. (2009) Efficiency And Sample Size Requirements for Stated Choice
Experiments, Transportation Research Board Annual Meeting, Washington DC January.
Bliemer, M.C., Rose, J.M. and Beelaerts van Blokland, R. (2009) Experimental Design Influences on Stated
Choice Outputs, European Transport Conference, Leeuwenhorst, October 5-7.
Black, I., Efron, A., Anthony, C.I. and Rose, J.M. (2005) Designing and implementing internet
questionnaires using Microsoft Excel, Australasian Marketing Journal, 13(2), 62-73.
Brazell, J.D., Diener, C.G., Karniouchina, E., Moore, W.L., Severin, V. and Uldry, P.F. (2006) The no-
choice option and dual response choice designs, Marketing Letters, 17, 255-268.
Brownstone, D. and Small, K. (2005) Valuing time and reliability: assessing the evidence from road
pricing demonstrations, Transportation Research Part A, 39, 279-293.
Carlsson, F. and Martinsson, P. (2001) Do hypothetical and actual marginal willingness to pay differ in
choice experiments? Journal of Environmental Economics and Management, 41, 179-192.
Chintakayala, P.K., Hess, S., Rose, J.M. & Wardman, M.R. (2009a), Effects of stated choice design
dimensions on estimates, paper presented at the inaugural International Choice Modelling
Conference, Harrogate.
Chintakayala, P.K., Hess, S., & Rose, J.M. (2009b), Using second preference choices in pivot surveys
as a means of dealing with inertia, paper presented at the European Transport Conference,
Noordwijkerhout, The Netherlands.
Dhar, R. and Simonson, I. (2003) The effect of forced choice on choice, Journal of Marketing
Research, 40(2), 146-160.
Efron A., Rose, J.M., and Roquero D. (2003) Truck or Train? A Stated Choice Study on Intermodalism
in Argentina, presented at XVII Congresso de Pesquisa e Ensino em Transportes, Rio de Janeiro,
Brazil, November 10th -14th.
Garrod, G.D., Scarpa, R. and Willis, K.G. (2002). Estimating the Benefits of Traffic Calming on Through
Routes: A Choice Experiment Approach, Journal of Transport Economics and Policy, 36(2), 211-232.
Harrison, G.W. and Rutström, E.E. (2006) Experimental evidence on the existence of hypothetical
bias in value elicitation methods, In: Handbook of Experimental Economics Results, C.R. Plott and
V.L.Smith, Eds., Amsterdam: North-Holland.
Hensher, D.A. (2008) Hypothetical bias and stated choice studies, submitted to Transportation
Research Part B.
Hensher, D.A. (2001a) The valuation of commuter travel time savings for car drivers: evaluating
alternative model specifications, Transportation, 28(2), 101-118.
Hensher, D.A. (2001b) Measurement of the Valuation of Travel Time Savings, Journal of Transport
Economics and Policy, 35(1), 71-98.
Hensher, D.A., Greene, W.H. and Rose, J.M. (2006) Deriving willingness to pay estimates of travel
time savings from individual-based parameters, Environment and Planning A, 38, 2365-2376.
Hensher, D.A. and King, J. (2001) Parking demand and responsiveness to supply, pricing and location
in the Sydney central business district, Transportation Research Part A, 35(3), 177-196.
Hensher, D.A., Rose, J.M. and Greene, W.H. (2005) Applied Choice Analysis: A Primer, Cambridge
University Press, Cambridge.
Hess, S. & Adler, T. (2009), Experimental designs for the real world, ITS working paper, Institute for
Transport Studies, University of Leeds.
Hess, S. and Rose, J.M. (2009a) Allowing for intra-respondent variations in coefficients estimated on
stated preference data, Transportation Research Part B, 43(6), 708-719.
Hess, S. and Rose, J.M. (2009b) Should reference alternatives in pivot design SC surveys be treated
differently?, Environment and Planning A, 42(3), 297-317.
Hess, S., Smith, C., Falzarano, S. & Stubits, J. (2008) Measuring the effects of different experimental
designs and survey administration methods using an Atlanta Managed Lanes Stated Preference
survey, Transportation Research Record, 2049, 144-152.
Hess, S., Rose, J.M. & Polak, J.W. (2009), Non-trading, lexicographic and inconsistent behaviour in
stated choice data, Transportation Research Part D, accepted for publication, January 2009.
Kanninen, B.J. (2002) Optimal Design for Multinomial Choice Experiments, Journal of Marketing
Research, 39, 214-217.
Li, Z., Hensher, D.A. and Rose, J.M. (2009) Willingness to Pay for Travel Time Reliability for Passenger
Transport: A Review and some New Empirical Evidence, submitted to Transportation Research Part
E.
List, J. and Gallet, C.A. (2001) What experimental protocol influence disparities between actual and
hypothetical stated values? Environmental and Resource Economics, 20, 241-254.
Louviere, J.J., Street, D., Burgess, L., Wasi, N., Islam, T. and Marley A.A.J. (2008) Modeling the
choices of individual decision-makers by combining efficient choice experiment designs with extra
preference information, Journal of Choice Modelling, 1(1), 128-163.
Lusk, J. and Schroeder, T. (2004) Are choice experiments incentive compatible? A test with quality
differentiated beef steaks, American Journal of Agricultural Economics, 86(2), 467-482.
Marley, A.A.J. and Louviere, J.J. (2005) Some probabilistic models of best, worst, and best–worst
choices, Journal of Mathematical Psychology, 49, 464–480.
Murphy, J., Allen, P., Stevens, T. and Weatherhead, D. (2004) A meta-analysis of hypothetical bias in
stated preference valuation, Department of Resource Economics, University of Massachusetts,
Amherst, January.
Orme, B. (1998) Sample Size Issues for Conjoint Analysis Studies, Sawtooth Software Technical
Paper, http://www.sawtoothsoftware.com/technicaldownloads.shtml#ssize.
Ortúzar, J. de D., Iacobelli, A. and Valeze, C. (2000) Estimating demand for a cycle-way network,
Transportation Research Part A, 34(5), 353-373.
Rose, J.M. and Bliemer, M.C.J. (in press) Constructing Efficient Stated Choice Experimental Designs,
Transport Reviews.
Rose, J.M. and Bliemer, M.C. (2006) Designing Efficient Data for Stated Choice Experiments,
presented at 11th International Conference on Travel Behaviour Research, Kyoto, August 16-20, 2006,
Japan.
Rose, J.M., Bliemer, M.C., Hensher, D.A. and Collins, A.T. (2008) Designing efficient stated choice
experiments in the presence of reference alternatives, Transportation Research Part B, 42(4), 395-
406.
Rose, J.M. and Hensher, D.A. (2006) Handling individual specific non-availability of alternatives in
respondent's choice sets in the construction of stated choice experiments, Stopher, P.R. and Stecher
C. (eds.) Survey Methods, Elsevier Science, Oxford, pp347-371.
Rose, J.M. and Hess, S. (2009) Dual Response Choices In Reference Alternative Related Stated Choice
Experiments, Transportation Research Board Annual Meeting, Washington DC January.
Shiftan, Y. and Burd-Eden, R. (2001) Modeling Response to Parking Policy, Transportation Research
Record, 1765, 27-34.
van der Waerden, P., Timmermans, H. and Borgers, A. (2002) PAMELA: Parking Analysis Model for
Predicting Effects in Local Areas, Transportation Research Record, 1781, 10-18.
Wardman, M. (2001) A review of British evidence on time and service quality valuations,
Transportation Research Part E, 37, 91-106.
Wittink, D.R., Huber, J., Zandan, P. and Johnson, R.M. (1992) The Number of Levels Effect in
Conjoint: Where Does It Come From and Can It Be Eliminated?, Sawtooth Software Conference
Proceedings.
Wittink, D.R., Krishnamurthi, L. and Nutter, J.B. (1982) Comparing Derived Importance Weights
Across Attributes, Journal of Consumer Research, 8, 471-474.
Wittink, D.R., Krishnamurthi, L. and Reibstein, D.J. (1989) The Effects of Differences in the Number of
Attribute Levels on Conjoint Results, Marketing Letters, (2), 113-123.