Conjoint Analysis to Measure the Perceived Quality in Volume Rendering
Joachim Giesen, Klaus Mueller, Senior Member, IEEE, Eva Schuberth, Lujin Wang, and Peter Zolliker, Member, IEEE
Abstract: Visualization algorithms can have a large number of parameters, making the space of possible rendering results rather high-dimensional. Only a systematic analysis of the perceived quality can truly reveal the optimal setting for each such parameter. However, an exhaustive search in which all possible parameter permutations are presented to each user within a study group would be infeasible to conduct. Additional complications may result from possible parameter co-dependencies. Here, we will introduce an efficient user study design and analysis strategy that is geared to cope with this problem. The user feedback is fast and easy to obtain and does not require exhaustive parameter testing. To enable such a framework we have modified a preference measuring methodology, conjoint analysis, that originated in psychology and is now also widely used in market research. We demonstrate our framework by a study that measures the perceived quality in volume rendering within the context of large parameter spaces.

Index Terms: Conjoint Analysis, Parameterized Algorithms, Volume Visualization
1 INTRODUCTION
The main purpose of visualization is to produce images that allow users to gain more insight into the illustrated data. This is a complex issue, depending on many factors of the visualization system, ranging from human-computer interaction, to rendering speed, to rendering style and algorithm, and finally human perception and cognition. With the exception of the last component all of these factors have been designed by humans, and many diverse technologies have emerged, and are still emerging, over the years. But in the end, human perception is the ultimate judge that determines which of these are the most effective. A popular focus of the field of visualization is the modeling and optimization via engineering and mathematics tools and frameworks, and often the designer/engineer him/herself judges the success of the method. Here, the easiest parameters to measure are rendering speed, memory consumption, and others, which are all engineering quantities. However, in light of the importance of the last element in the chain, the human observer, a more recent focus has become to also conduct adequate user studies to measure the success of a proposed method. This practice is already commonplace in the field of human-computer interaction, and to a more limited extent also in information visualization, but less so in scientific and medical visualization. In essence, user studies are always considered burdensome since in many cases there are a large number of parameters and algorithmic alternatives, requiring many trials, that is, human subjects and experiments, to produce statistically significant results. This has been a major obstacle in assessing a method's success in terms of the human perceptive and cognitive system. The pressing question is: can we make this task easier by introducing a more methodical and organized approach? For this it pays to look at other fields, especially those driven by heavy monetary investments. One then finds that user studies play a major and dominant role in product marketing, where it is important to tune the various parameters of a product before it
is launched to market (or to determine its launch at all).
5.2 Computing scale values
We use Thurstone's method of comparative judgement [14]: the quality of item $i$ as perceived by a respondent is modeled as a random variable $S_i$, normally distributed with expectation $\mu_i$ and variance $\sigma_i^2 = \sigma^2$, i.e., all the variances are the same.
The idea is to assign to each item $i$ the value $\mu_i$, on an interval scale, which still has to be defined. To do so we need to estimate all the $\mu_i$'s from the paired comparison data that we have available. It turns out that it is easier to estimate the differences $\mu_i - \mu_j$. We use the latter and assign to each item $i$ the value
$$\frac{1}{n}\sum_{j=1}^{n}(\mu_i - \mu_j) \;=\; \mu_i - \frac{1}{n}\sum_{j=1}^{n}\mu_j \;=:\; \mu_i - \bar{\mu}.$$
That is, we only shift the scale that assigns $\mu_i$ to item $i$ by the value $\bar{\mu}$, i.e., as interval scales both scales are the same. Note that by the properties of normal distributions, the differences $S_i - S_j$ are normally distributed with expectations $\mu_i - \mu_j$ and variance $2\sigma^2$. This yields
$$P[S_i - S_j > 0] \;=\; \frac{1}{\sqrt{4\pi\sigma^2}}\int_0^{\infty} e^{-\frac{(x-(\mu_i-\mu_j))^2}{4\sigma^2}}\,dx \;=\; \Phi\!\left(\frac{\mu_i-\mu_j}{\sqrt{2}\,\sigma}\right),$$
where $\Phi$ is the cumulative distribution function
$$\Phi(x) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy$$
of the standard normal distribution. Hence,
$$\mu_i - \mu_j \;=\; \sqrt{2}\,\sigma\,\Phi^{-1}\big(P[S_i - S_j > 0]\big).$$
We can estimate $P[S_i - S_j > 0]$ by the observed quantity $F_{ij}$, i.e., the relative frequency that item $i$ was preferred over $j$, and thus estimate $\mu_i - \mu_j$ by $\sqrt{2}\,\sigma\,\Phi^{-1}(F_{ij})$ and the scale value of item $i$ by
$$s_i \;=\; \frac{\sqrt{2}\,\sigma}{n}\sum_{j \neq i}\Phi^{-1}(F_{ij}).$$
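To make this step concrete, the following is a minimal sketch (not the authors' implementation) of Thurstone's scaling step, assuming a hypothetical matrix F of observed preference frequencies; in practice, frequencies of exactly 0 or 1 would have to be clamped away from the boundary before applying $\Phi^{-1}$.

```python
# A minimal sketch of Thurstone's scaling step; not the authors' code.
# F is a hypothetical n x n matrix with F[i, j] the relative frequency that
# item i was preferred over item j (so F[j, i] = 1 - F[i, j]).
import numpy as np
from scipy.stats import norm

def thurstone_scale(F, sigma=1.0):
    """Return s_i = (sqrt(2) * sigma / n) * sum_{j != i} Phi^{-1}(F_ij)."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    Z = norm.ppf(F)           # Phi^{-1}(F_ij); clamp F away from 0 and 1 first
    np.fill_diagonal(Z, 0.0)  # items are never compared with themselves
    return np.sqrt(2.0) * sigma * Z.sum(axis=1) / n

# Toy example with three items; the resulting values sum to (roughly) zero,
# reflecting the shift by the mean in the derivation above.
F = np.array([[0.5, 0.75, 0.9],
              [0.25, 0.5, 0.7],
              [0.1, 0.3, 0.5]])
print(thurstone_scale(F))
```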
In the conjoint setting an item is a combination of attribute levels, and Thurstone's method is applied to each attribute $A_i$ separately, yielding scale values $s_{ij}$ for its $k_i$ levels (computed with variance parameter $\sigma_{i1}$); by $\sigma_{i2}$ we denote the standard deviation of the rescaled level scale values. To make the scales of the different attributes comparable we require
$$\sigma_{i1}^2 + \sigma_{i2}^2 \;=\; \sigma_{j1}^2 + \sigma_{j2}^2 \;=\; 1 \quad \text{for all attributes } A_i \text{ and } A_j;$$
here the value 1 is arbitrary (we just need to choose one fixed value).
Note that if we knew the values $\sigma_{i2}$, then these equalities would determine the values for the $\sigma_{i1}$ (that we kept variable so far) and by that make the scales for all the attributes comparable. We can estimate the $\sigma_{i2}$ from the scaled observed scale values $\sigma_{i1} s_{ij}$ by taking the (biased) estimator of the standard deviation, i.e., by
$$\sigma_{i2}^2 \;\approx\; \sigma_{i1}^2\,\frac{1}{k_i}\sum_{j=1}^{k_i} s_{ij}^2; \qquad \text{remember that } \frac{1}{k_i}\sum_{j=1}^{k_i} s_{ij} = 0.$$
Using this estimate we can solve $\sigma_{i1}^2 + \sigma_{i2}^2 = 1$ for $\sigma_{i1}$ to estimate $\sigma_{i1}$ as
$$\sigma_{i1} \;=\; \frac{1}{\sqrt{1 + \frac{1}{k_i}\sum_{j=1}^{k_i} s_{ij}^2}}.$$
That is, in order to make the scales for the different attributes comparable we need to rescale the $s_{ij}$ that we computed with Thurstone's method (with constant variance 1) by this estimate of $\sigma_{i1}$.
Since now the scales of all the attributes are comparable we can get the scale value for an item just as the sum of the scale values of the attribute levels involved, i.e., the scale value of $(a_{1j_1}, \ldots, a_{nj_n})$ with $a_{ij_i} \in A_i$ is given as $\sum_{i=1}^{n} \sigma_{i1} s_{ij_i}$. Note also that on comparable scales each value $\sigma_{i2}$ can be interpreted as a measure of how important attribute $A_i$ is (a more important attribute contributes larger values to the sum). But we have to be careful: additivity only holds when the attributes are independent. For example, a black foreground and a black background might independently contribute a lot to the perceived quality in visualization, but their combined contribution is negative.
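As an illustration, here is a hedged sketch of this normalization and the additive combination; the attribute names and level values below are made up for the example.

```python
# A sketch (not the authors' code) of making per-attribute scales comparable
# and summing them to an item scale value. Names and numbers are illustrative.
import numpy as np

def rescale(level_scales):
    """Rescale Thurstone scale values (mean 0) of one attribute's levels by
    sigma_i1 = 1/sqrt(1 + mean(s_ij^2)), which solves sigma_i1^2 + sigma_i2^2 = 1."""
    s = np.asarray(level_scales, dtype=float)
    sigma_i1 = 1.0 / np.sqrt(1.0 + np.mean(s**2))
    return sigma_i1 * s

# Hypothetical per-attribute Thurstone values (each list has mean zero).
attribute_scales = {
    "RENDERING": rescale([0.8, 0.1, -0.9]),
    "BACKGROUND": rescale([0.4, -0.4]),
}
# An item picks one level per attribute; its value is the sum of level values.
item = {"RENDERING": 0, "BACKGROUND": 1}
print(sum(attribute_scales[a][lvl] for a, lvl in item.items()))
```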
5.3 Error analysis
Analytic error estimate. Let us briefly describe our error analysis. Our observed quantities are the relative frequencies $F_{ij}$. We assume that any comparison of items $i$ and $j$ is an independent Bernoulli trial with success probability $p$ (here success means that $i$ is preferred over $j$). We want to estimate $p$ by $F_{ij}$. For Bernoulli trials, $F_{ij}$ converges to $p$ when the number of trials goes to infinity, but here we make only a finite number $m_{ij}$ of comparisons, which incurs some error. This error can be estimated by the standard deviation
$$\sqrt{\frac{F_{ij}\,(1 - F_{ij})}{m_{ij}}}.$$
To compute errors of our scale values we use error propagation.
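A small sketch of this analytic estimate follows, with the propagation through $\Phi^{-1}$ done to first order; the function names are illustrative, not from the paper.

```python
# A sketch of the analytic (sample size) error and its first-order propagation
# through Phi^{-1}; not the authors' code.
import numpy as np
from scipy.stats import norm

def frequency_stderr(F, m):
    """Standard error of an observed preference frequency F over m trials."""
    F, m = np.asarray(F, dtype=float), np.asarray(m, dtype=float)
    return np.sqrt(F * (1.0 - F) / m)

def z_stderr(F, m):
    """Error of Phi^{-1}(F) to first order: d/dF Phi^{-1}(F) = 1/phi(Phi^{-1}(F))."""
    return frequency_stderr(F, m) / norm.pdf(norm.ppf(F))

print(frequency_stderr(0.75, 40))  # roughly 0.068
print(z_stderr(0.75, 40))          # roughly 0.215
```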
Resampling error estimates. We also simulate errors by randomly dividing the respondents into two groups. For each group we can compute the scale values for all attribute levels (on comparable scales) as described above. So we get for each attribute level a scale value from each group. Averaging the absolute difference of these two scale values over several random groupings of the respondents provides us with an experimental error for the scale value of this attribute level.
Similarly, we also compute experimental errors by randomly dividing the paired comparisons into two groups.
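A sketch of this split-half procedure; here `scale_from_respondents` is a hypothetical stand-in for the full scaling pipeline described above.

```python
# A sketch of the split-half resampling error; not the authors' code.
# `scale_from_respondents` is a hypothetical function mapping a list of
# respondents to an array of scale values for all attribute levels.
import numpy as np

def split_half_error(respondents, scale_from_respondents, rounds=50, seed=0):
    rng = np.random.default_rng(seed)
    ids = np.arange(len(respondents))
    diffs = []
    for _ in range(rounds):
        rng.shuffle(ids)
        half = len(ids) // 2
        s1 = scale_from_respondents([respondents[i] for i in ids[:half]])
        s2 = scale_from_respondents([respondents[i] for i in ids[half:]])
        diffs.append(np.abs(np.asarray(s1) - np.asarray(s2)))
    return np.mean(diffs, axis=0)  # experimental error per attribute level
```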
5.4 Testing the model
In our model of scale values we made two assumptions, one on the attribute level and one on the item level. The assumption on the attribute level is that the scale values for all levels of a given attribute are uncorrelated and have the same variance; the assumption on the item level is that the scale value of an item is the sum of the scale values of its attribute levels. The latter is essentially the assumption that the attributes are preferentially independent.
Preferential independence (additivity). Here we want to describe how to test the second assumption of our model, i.e., the additivity (or preferential independence) assumption. Let $A_1$ and $A_2$ be two attributes, let $C = A_1 \times A_2$ be the new attribute that results from combining $A_1$ and $A_2$, and let $c_1, \ldots, c_k$ be its levels. We compute scale values for the levels of $C$ in two different ways. First, for every level $c_i = (a_{i1}, a_{i2})$ with $a_{i1} \in A_1$ and $a_{i2} \in A_2$ we add up the comparable scale values for $a_{i1}$ and $a_{i2}$ that we compute as described before. Let $s_1, \ldots, s_k$ be the resulting scale values. Second, we apply Thurstone's method directly to the combined attribute $C$ and make the resulting scale values comparable with the scale values of all levels of attributes different from $A_1$ and $A_2$. This results in scale values $s'_1, \ldots, s'_k$.
If additivity holds, then we expect that $s_i \approx s'_i$. Thus, our hypothesis is that $s_i = s'_i$ for all $1 \leq i \leq k$. As test statistic we use
$$\chi^2 \;=\; \sum_{i=1}^{k} \frac{(s_i - s'_i)^2}{\sigma_i^2 + \sigma_i'^2},$$
where $\sigma_i$ and $\sigma'_i$ are computed by error propagation from the errors of the observed frequencies. If the hypothesis is true, then the test statistic $\chi^2$ is approximately $\chi^2$-distributed with $k-1$ degrees of freedom. The hypothesis is rejected at a significance level of $\alpha$ if $\chi^2 > \chi^2_{1-\alpha,\,k-1}$, where $\chi^2_{1-\alpha,\,k-1}$ is the $1-\alpha$ quantile of the $\chi^2$-distribution with $k-1$ degrees of freedom.
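A compact sketch of this test; the inputs (the two sets of scale values and their propagated errors) are assumed to be available from the steps above.

```python
# A sketch of the additivity (preferential independence) test; not the
# authors' code. s and s_prime are the two sets of scale values for the
# levels of the combined attribute C, err_* their propagated standard errors.
import numpy as np
from scipy.stats import chi2

def additivity_test(s, s_prime, err_s, err_s_prime, alpha=0.01):
    s, sp = np.asarray(s, dtype=float), np.asarray(s_prime, dtype=float)
    var = np.asarray(err_s, dtype=float)**2 + np.asarray(err_s_prime, dtype=float)**2
    stat = np.sum((s - sp)**2 / var)               # chi-square test statistic
    critical = chi2.ppf(1.0 - alpha, df=len(s) - 1)
    return stat, critical, stat > critical         # True -> reject additivity
```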
Mosteller's test. On the attribute level we make the assumptions that the scale values are uncorrelated and normally distributed with equal variances. A test for this assumption was devised by Mosteller [11] and is also described in [4]. Here we only briefly review Mosteller's test, which boils down to testing whether our model can explain the observed frequencies $F_{ij}$. To this end we compute
$$p_{ij} \;=\; \frac{1}{\sqrt{4\pi}}\int_0^{\infty} e^{-\frac{(x-(s_i-s_j))^2}{4}}\,dx \;=\; \Phi\!\left(\frac{s_i - s_j}{\sqrt{2}}\right),$$
where we use $s_i$ and $s_j$ as computed by Thurstone's method with $\sigma = 1$. Then we transform both $F_{ij}$ and $p_{ij}$ into angles $\theta_{ij}$ and $\phi_{ij}$, respectively, using the arcsine transformation given by
$$\theta_{ij} = \arcsin(2F_{ij} - 1) \quad \text{and} \quad \phi_{ij} = \arcsin(2p_{ij} - 1).$$
If the model assumptions hold, then $\theta_{ij}$ is approximately normally distributed with expectation $\phi_{ij}$ and variance $1/m_{ij}$ for all $i < j$. As test statistic we use
$$\chi^2 \;=\; \sum_{i<j} m_{ij}\,(\theta_{ij} - \phi_{ij})^2.$$
If the hypothesis is true, then the test statistic $\chi^2$ is approximately $\chi^2$-distributed with $\binom{n-1}{2}$ degrees of freedom.
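A sketch of Mosteller's test as reviewed here, with hypothetical inputs F (observed frequencies), m (comparison counts), and s (Thurstone scale values computed with $\sigma = 1$).

```python
# A sketch of Mosteller's test; not the authors' code, inputs hypothetical.
import numpy as np
from scipy.stats import norm, chi2

def mosteller_test(F, m, s, alpha=0.01):
    n = len(s)
    stat = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = norm.cdf((s[i] - s[j]) / np.sqrt(2.0))  # model frequency p_ij
            theta = np.arcsin(2.0 * F[i][j] - 1.0)      # arcsine-transformed F_ij
            phi = np.arcsin(2.0 * p - 1.0)              # arcsine-transformed p_ij
            stat += m[i][j] * (theta - phi)**2
    df = (n - 1) * (n - 2) // 2                         # binom(n-1, 2)
    return stat, chi2.ppf(1.0 - alpha, df=df), df       # reject if stat > quantile
```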
6 DATA ANALYSIS: RESULTS
In this section we report on how we applied the data analysis method described in Section 5 to obtain meaningful scale values for our four conjoint studies. All subsequent results refer to respondents that are more than 10 years old¹ and have passed the Ishihara test for color blindness. Among all respondents fulfilling these two criteria, 317 respondents participated in the two studies with test question DETAIL and 366 respondents participated in the other two studies with test question AESTHETICS.
In a first step we computed scale values using the method described in Section 5.2. These scale values need not be meaningful since the model assumptions that underlie these computations might not be met in our studies. Hence we discuss in the following how to obtain meaningful scale values from the initially computed ones.
6.1 Testing preferential independence
As pointed out earlier, if the parameters are preferentially independent, then the scale values for different parameters are comparable and we can determine the scale value of an image (rendering for a specific choice of parameter values) by adding up the scale values for the parameter values used to render the image. The top ranked image that we get this way for the FOOT data set and AESTHETICS question does not look like a reasonable first choice, see Figure 1. The reason is not surprising: the parameters COLOR and BACKGROUND are not preferentially independent for this study.
We tested all pairs of parameters for interdependencies in all four studies using the additivity test described in Section 5.4. Table 1 summarizes the result of this test for all combinations of parameters.
Based on the outcome of the additivity test we decided to combine the parameters RENDERING and STEPSIZE into a single new parameter RENDERING-STEPSIZE for all four studies. For the FOOT data set and both questions we also combined the parameters COLORMAP and BACKGROUND into the new parameter COLORMAP-BACKGROUND².
¹We found no significant differences between respondents younger and older than 17, respectively. See also Section 7.
²Note that though the top ranked image for the study [FOOT, DETAIL] looks reasonable, see Figure 1, it turns out that we have to combine the two color parameters also for this study.
Fig. 1. The images with the highest scale values for the studies [FOOT, AESTHETICS] (left) and [FOOT, DETAILS] (right) before taking care of parameter dependencies.
FOOT
                C    R1   V    R2   S    B
Colormap    C   *    4                   1
Rendering   R1  2    *    5         3    2
Viewpoint   V             *
Resolution  R2                 *
StepSize    S        4              *    6
Background  B   1    3                   *

ENGINE
                C    R1   V    R2   S    B
Colormap    C   *
Rendering   R1       *              1
Viewpoint   V             *
Resolution  R2                 *
StepSize    S        1              *
Background  B        2                   *
Table 1. Test for pairwise preferential independence of the parameters with significance level α = 0.01. The shown numbers denote the rank order of relevance (only for significant dependencies), i.e., smaller values indicate more relevant dependencies. The values below the diagonal are for the AESTHETICS question and the values above the diagonal are for the DETAILS question.
That is, we compute new scale values for the combined parameters and use them to replace the scale values for the original parameters. This already gives our final scale values, which we summarize in Table 3 and Figure 2. The figure shows the best ten and the worst ten renderings for each of the four studies.
6.2 Mosteller's test
We also tested our model assumptions on the parameter (attribute) level using Mosteller's test, see Section 5.4, on all parameters (including the combined ones). With a few exceptions all parameters passed the test at the α = 0.01 significance level. All exceptions concerned the RENDERING parameter. Possible reasons are unequal variances of the distributions of the scale values for different levels, inappropriateness of a one-dimensional scale, or an underestimation of the error.
To further investigate the last point, underestimation of the error, we compared the computed sample size error with the two experimental errors described in Section 5.3. All computed theoretical sample size errors are within 15% of the experimental errors, except for the parameter RENDERING, which shows an underestimation of up to 40%. This finding also puts the results of the preferential independence tests involving the RENDERING parameter into a new perspective. Some of the detected interdependencies in Table 1 are no longer significant if the error estimates for RENDERING are adjusted.
7 RESULTS
In this section we discuss the results obtained for our four visualization
case studies.
Relative importance of parameters. As we pointed out at the end of Section 5.2, the standard deviation $\sigma_{i2}$ for attribute $A_i$ can be interpreted as the relative importance of attribute $A_i$. In our setting the attributes are the parameters of the visualization algorithm. Using the estimated standard deviations we get the rank ordering of the parameters shown in Table 2. From these results it is safe to conclude that overall the rendering mode (combined parameter RENDERING-STEPSIZE) is the most important parameter. The importance of this parameter is relatively higher for the DETAIL than for the AESTHETICS question. A second important parameter is the color scheme used (or the background), although this finding is not as pronounced. The viewpoint is somewhat important (mostly for the FOOT), while the resolution is somewhat important for the ENGINE. The other parameters are relatively unimportant, at least at the levels we have measured.
Most preferred levels. The results of Tables 2 and 3 as well as Figure 2 reveal a good deal of useful information. We observe that the algorithms XRAY and MIP are not considered useful by our respondents (but note that these were non-expert viewers; doctors can see a lot more in those renderings). The DVRGM algorithm performs (slightly) better than DVR, which performs better than DVRNS. This ranking shows that the more structure enhancement, the better.
There is also a clear preference for achromatic backgrounds. Only blue is also found to be somewhat useful, possibly because blue is a monocular depth cue in that colors very far away shift to the blue spectrum, or because of the contrast between the background shade of blue and the object. Highly saturated backgrounds are generally disliked. Interestingly, there are also differences between the two achromatic backgrounds: a black background is considered more aesthetic, whereas white seems to show detail better. This is particularly true for the ENGINE, which is overall a more complex dataset. It is most likely also an object that is less familiar to the respondents. Therefore they require more detail; higher resolution is also more important (than for the less complex FOOT).
For the ENGINE, the color map applied does not seem to matter as much, but for the DETAIL question, the FOOT (bone) is strongly preferred to be seen in a color resembling that of bright bone (skin grey). This indicates that for object inspection, viewers like to see objects in colors that are most natural and at the same time bright (when such a color is generally agreed on), but for objects less defined in that respect the color choice is a matter of taste (as is the case for the ENGINE), as long as the colors are bright and define contrast well. In the AESTHETICS category viewers still preferred a natural color (for the FOOT), but the brightness condition was no longer as important (by definition of the task criterion).
An interesting observation can also be made with respect to the viewpoint. A common feature is that viewers prefer to see objects at oblique angles, which generally gives objects a more three-dimensional appearance and also reveals more features (such views are also used for product advertisements). But the engine was in general preferred to be situated as if standing on a surface; the views where the engine was rotated at an arbitrary angle (and appeared as if it were flying towards the viewer) were rated low. On the other hand, the foot was acceptable at most orientations. We believe that the flying engine was deemed unrealistic, and perhaps even dangerous and therefore unappealing, while a foot is commonly seen at general orientations in real life (just not as a bone).
Dependency on the respondent. We observed that the experimental error, see Section 5.3, was larger when dividing respondents into different sets than when dividing choice tasks into different sets. This indicates that although the respondents answered only 20 choice tasks for each data set, we can already detect a dependency on the individual's preferences, i.e., preferences are not homogeneous over the population.
We also analyzed preferential differences between different subgroups (male vs. female and young vs. old, respectively) of our population of respondents³:
³We also collected preference data from 37 persons showing color deficiencies, but the sample size was not sufficient to detect significant differences to the rest of the population.
                AESTHETICS                       DETAIL
FOOT     1.  RENDERING-STEPSIZE (0.31)       RENDERING-STEPSIZE (0.52)
         2.  COLORMAP-BACKGROUND (0.3)       COLORMAP-BACKGROUND (0.35)
         3.  VIEWPOINT (0.14)                VIEWPOINT (0.12)
         4.  RESOLUTION (0.05)               RESOLUTION (0.08)
ENGINE   1.  RENDERING-STEPSIZE (0.56)       RENDERING-STEPSIZE (0.77)
         2.  BACKGROUND (0.19)               RESOLUTION (0.09)
         3.  RESOLUTION (0.12)               VIEWPOINT (0.08)
         4.  VIEWPOINT (0.09)                BACKGROUND (0.05)
         5.  COLORMAP (0.05)                 COLORMAP (0.01)

Table 2. Rank order of the parameters used in our four studies. The rank order is derived from the estimated variances (shown in brackets).
We only found significant differences between male and female respondents for the COLORMAP parameter in the [FOOT, AESTHETICS] study: female respondents mostly prefer BLUECYAN (scale value: 0.07(3))⁴, which is also liked by the male respondents (0.07(2)) but not as much as SKINGRAY (0.09(2)), which is the least preferred color of the females (-0.04(3)). MAGENTA is least preferred by the males (-0.12(2)), whereas females (-0.03(3)) prefer it over SKINGRAY.
In general we found no significant differences between the two age classes of 17 years or younger (teenagers) and older than 17 years (adults). We only found two exceptions, both concerning the AESTHETICS question. For adults the preferences within the RENDERING parameter are more pronounced than for teenagers, though the rank order of the individual levels is the same. On the other hand, teenagers tend to have more pronounced preferences concerning the background color, again with basically the same order on the individual levels as for the adults.
Altogether these findings have interesting consequences if one wants to personalize visualization systems: it seems hard to do so based on socio-demographic data (such as age and gender) only.
Dependency on the data set. Preferences obtained for the FOOT differ significantly from preferences for the ENGINE dataset. This difference is most pronounced for the combined parameter RENDERING-STEPSIZE⁵, which is much more important for the ENGINE dataset for both questions.
Dependency on the question. The observed preferences in the DETAIL studies are significantly different from the preferences in the AESTHETICS studies. The question about detail separates the preferences for different parameter values better. This means that there is more mutual consent in the test population about detail. We believe this is due to the fact that the question about detail is more specific and less subject to personal taste. The question about details separates the preferences on the ENGINE data set into two distinct preference classes (DVRXX against XRAY/MIP). This separation does not show in the [ENGINE, AESTHETICS] study.
Parameter interdependence. As discussed earlier, our additivity test shows that the independence assumption is not fulfilled for the parameters COLORMAP and BACKGROUND for the FOOT data set. This finding seems very reasonable since similar object and background colors certainly should have a negative impact on the perceived image appearance. Furthermore, details are better visible if the contrast between foreground and background color is high.
The additivity test also shows that the parameters RENDERING and STEPSIZE are not independent. The observed interdependency is less intuitive than the one between COLORMAP and BACKGROUND, but it can also be explained. The scale values for the combined parameter show that the changes in STEPSIZE do not induce the same magnitude of change for the scale values of the different RENDERING levels. In particular, for the XRAY and MIP levels the changes in STEPSIZE seem to have no or only marginal influence. This can be due to the fact that the MIP and XRAY algorithms lack coherency in structure and are mostly used for quick survey modalities, but not for careful diagnosis. Our study indicates that the visual system cannot detect all errors or even inconsistencies, and thus viewers do not become aware of possible errors.

⁴Numbers in parentheses show the estimated standard deviation in units of the last shown digit.
⁵The parameters VIEWPOINT and COLOR cannot be compared directly for the two datasets, because different colors and viewpoints were used as parameter levels.
8 DISCUSSION
We took first steps to demonstrate that conjoint analysis can be a useful and efficient tool to gauge influences of a rich set of rendering parameters on human perception in visualization tasks. We believe that the data analysis technique that we have developed here can even be used to analyze data gathered in the first phase of the human-in-the-loop method of House, Bair and Ware [6]. Note that our analysis method only needs paired comparisons between renderings, which can even be obtained from measurements of how well a test person performs a task on different renderings.
We have tested the framework within a familiar visualization environment, a parameterized volume renderer, where we have taken great care to reduce the effects of competing adverse parameters, such as image size and occlusion, without reducing the effects of the relevant tested parameters, such as color schemes and rendering precision and algorithm. In this process we verified a few known results, such as the effect of rendering fidelity, but we also teased out some lesser-known but important results, such as preferred object orientations, color schemes, and the relationship of step size and rendering modality. Another interesting finding is that our conjoint analysis method can help to resolve tradeoff decisions. In particular, for the DVRGM algorithm it is not necessary to go down to step size 0.2; step size 0.5 even gives perceptually better results. That is, it is often not worthwhile to spend the extra computing time required by a smaller step size (time-quality tradeoff). A second tradeoff concerns perceived quality and file size, which is to a large extent determined by the resolution. Our methods allow us to quantify this tradeoff, i.e., to answer the question of how much quality gets sacrificed when the file size (resolution) decreases.
With our careful error analysis we obtained insights beyond the gauging of preferences by scale values: we were able to conclude from the computed experimental errors that preferences depend on the individual, which in itself is not so surprising, but we also found that one cannot predict an individual's preferences from the socio-demographic data available to us (age and gender).
In future work we want to investigate limitations of the applicability of conjoint analysis to visualization. Possible concerns are: the large number of respondents needed (though the burden on each respondent is low); the need for more systematic ways to estimate the number of required respondents; important parameters may overshadow the results for less important ones (rendering statements about the latter dubious); the restrictiveness of the distribution assumptions; and the influence of framing effects or the surroundings in general (we conducted our study in a controlled environment and tried to control for framing effects by alternating questions from two studies).
[Figure 2 panels: DETAIL and AESTHETICS columns for each of the four studies.]
Fig. 2. On top: best ten renderings (ranking decreasing from left to right). On bottom: worst ten renderings (ranking increasing from left to right) for our four conjoint studies.
Our vision is to create a (web-based) user study analysis suite that can be used by researchers to conduct and analyze multi-parameter user studies. Conjoint analysis should be an integral component of such a suite.
ACKNOWLEDGEMENTS
Joachim Giesen and Eva Schuberth are partially supported by the Swiss National Science Foundation in the project "Robust Algorithms for Conjoint Analysis". Klaus Mueller and Lujin Wang are partially supported by NSF CAREER grant ACI-0093157 and NIH grant 5R21EB004099-02.
REFERENCES
[1] www.sawtoothsoftware.com.
[2] S. Bergner, T. Möller, D. Weiskopf, and D. J. Muraki. A spectral analysis of function composition and its implications for sampling in direct volume visualization. IEEE Transactions on Visualization and Computer Graphics, 12(5):1353–1360, 2006.
[3] S. Bruckner, S. Grimm, A. Kanitsar, and M. Gröller. Illustrative context-preserving exploration of volume data. IEEE Transactions on Visualization and Computer Graphics, 12(6):1559–1569, 2006.
[4] P. G. Engeldrum. Psychometric Scaling: A Toolkit for Imaging Systems Development. Imcotek Press, Winchester MA, USA, 2000.
[5] A. Gustafsson, A. Herrmann, and F. Huber. Conjoint analysis as an instrument of market research practice. In A. Gustafsson, A. Herrmann, and F. Huber, editors, Conjoint Measurement: Methods and Applications, pages 5–45, 2000.
[6] D. H. House, A. Bair, and C. Ware. An approach to the perceptual optimization of complex visualizations. IEEE Transactions on Visualization and Computer Graphics, 12(4):509–521, 2006.
[7] S. Ishihara. Ishihara's Tests for Colour Blindness. Isshinkai, 1962.
[8] G. Ji and H. Shen. Dynamic view selection for time-varying volumes. IEEE Transactions on Visualization and Computer Graphics, 12(5):1109–1116, 2006.
[9] Y. Kim and A. Varshney. Saliency-guided enhancement for volume visualization. IEEE Transactions on Visualization and Computer Graphics, 12(5):925–932, 2006.
[10] J. Kniss, S. Premoze, M. Ikits, A. Lefohn, C. Hansen, and E. Praun. Gaussian transfer functions for multi-field volume visualization. In Proceedings of IEEE Visualization 2003, pages 497–504, 2003.
[11] F. Mosteller. Remarks on the method of paired comparisons. Psychometrika, 16:207–218, 1951.
[12] P. Shanbhag, P. Rheingans, and M. desJardins. Temporal visualization of planning polygons for efficient partitioning of geo-spatial data. In Proceedings of the 2005 IEEE Symposium on Information Visualization, page 28, 2005.
[13] S. Takahashi, I. Fujishiro, Y. Takeshima, and T. Nishita. A feature-driven approach to locating optimal viewpoints for volume visualization. In Proceedings of IEEE Visualization 2005, pages 495–502, 2005.
[14] L. L. Thurstone. A law of comparative judgement. Psychological Review, 34:273–286, 1927.
[15] H.-C. Wong, H. Qu, U.-H. Wong, Z. Tang, and K. Mueller. A perceptual framework for comparisons of direct volume rendered images. In IEEE Pacific-Rim Symposium on Image and Video Technology, 2006.
                       DATA SET: ENGINE                          DATA SET: FOOT
PARAMETER      PARAM. VALUE   AESTHETICS   DETAILS       PARAM. VALUE   AESTHETICS   DETAILS
COLORMAP MagentaBlue -0.061(17) -0.006(18) SkinGrey 0.039(17) 0.146(18)
RedYellow 0.065(17) -0.001(18) BlueCyan 0.070(17) -0.079(18)
BlueGreen -0.004(17) 0.007(18) Magenta -0.109(17) -0.067(18)
BACKGROUND Black 0.378(26) 0.049(28) Black 0.419(27) 0.246(28)
White -0.034(26) 0.078(28) White -0.063(26) 0.047(27)
Green -0.162(26) -0.018(28) Green -0.105(26) -0.097(28)
Blue -0.063(26) -0.062(28) Blue 0.007(26) -0.103(28)
Yellow -0.120(26) -0.046(28) Yellow -0.258(26) -0.093(28)
RENDERING DVR 0.514(25) 0.719(28) DVR 0.095(26) 0.361(27)
DVRNS 0.353(24) 0.530(25) DVRNS -0.001(25) 0.058(26)
DVRGM 0.385(24) 0.629(26) DVRGM 0.484(26) 0.561(27)
XRAY -0.305(23) -1.005(31) XRAY -0.308(26) -0.758(28)
MIP -0.947(28) -0.872(28) MIP -0.270(25) -0.223(26)
STEPSIZE 0.5 0.028(17) 0.026(18) 0.5 0.035(17) 0.065(18)
0.2 0.051(17) 0.066(18) 0.2 0.061(17) 0.038(18)
1.0 -0.078(17) -0.093(18) 1.0 -0.096(17) -0.103(18)
VIEWPOINT side-front 0.132(30) 0.118(32) side-60 0.126(26) 0.174(28)
side-back 0.052(30) 0.071(33) top-90 -0.158(26) -0.118(28)
side-top 0.060(30) 0.027(32) top-0 -0.133(26) -0.151(28)
side-down -0.120(30) -0.041(32) side-30 0.208(27) 0.098(28)
front -0.007(30) -0.073(32) top-135 -0.044(26) -0.003(28)
side -0.117(29) -0.101(32)
RESOLUTION high 0.115(10) 0.091(11) high 0.045(10) 0.080(11)
low -0.115(10) -0.091(11) low -0.045(10) -0.080(11)
RENDERING DVR, 0.5 0.60(5) 0.81(7) DVR, 0.5 0.00(5) 0.17(5)
-STEPSIZE DVR, 0.2 0.51(5) 0.86(6) DVR, 0.2 0.29(5) 0.64(5)
DVR, 1.0 0.41(5) 0.49(5) DVR, 1.0 0.02(5) 0.26(5)
DVRNS, 0.5 0.18(4) 0.41(5) DVRNS, 0.5 0.08(5) 0.17(5)
DVRNS, 0.2 0.44(4) 0.64(5) DVRNS, 0.2 -0.01(5) -0.03(5)
DVRNS, 1.0 0.40(4) 0.49(5) DVRNS, 1.0 -0.09(5) 0.02(5)
DVRGM, 0.5 0.63(5) 0.85(6) DVRGM, 0.5 0.67(5) 1.07(6)
DVRGM, 0.2 0.48(5) 0.71(5) DVRGM, 0.2 0.60(5) 0.64(5)
DVRGM, 1.0 0.07(4) 0.32(4) DVRGM, 1.0 0.16(5) 0.04(5)
XRAY, 0.5 -0.29(4) -0.95(5) XRAY, 0.5 -0.31(5) -0.80(5)
XRAY, 0.2 -0.32(4) -1.00(6) XRAY, 0.2 -0.36(5) -0.77(5)
XRAY, 1.0 -0.29(4) -1.05(8) XRAY, 1.0 -0.25(5) -0.73(6)
MIP, 0.5 -0.89(5) -0.89(5) MIP, 0.5 -0.24(5) -0.26(5)
MIP, 0.2 -0.93(5) -0.86(5) MIP, 0.2 -0.26(5) -0.23(5)
MIP, 1.0 -1.03(6) -0.83(6) MIP, 1.0 -0.31(5) -0.19(5)
COLORMAP MagBlu-BBlk 0.29(5) -0.05(5) SkinGray, Blk 0.73(5) 0.74(6)
-BACKGROUND MagBlu-BWht -0.12(5) 0.05(5) SkinGray, Wht -0.29(5) -0.30(5)
MagBlu-BGrn -0.22(5) 0.07(5) SkinGray, Grn -0.11(5) 0.11(5)
MagBlu-BBlu -0.10(5) -0.11(5) SkinGray, Blu 0.24(5) 0.47(5)
MagBlu-BYel -0.17(5) 0.01(5) SkinGray, Yel -0.40(5) -0.24(5)
RedYel-BBlk 0.44(5) 0.20(5) BluCya, Blk 0.30(5) -0.13(5)
RedYel-BWht -0.08(5) 0.11(5) BluCya, Wht 0.26(5) 0.36(5)
RedYel-BGrn -0.06(5) -0.13(5) BluCya, Grn 0.02(5) -0.11(5)
RedYel-BBlu 0.07(5) -0.01(5) BluCya, Blu -0.26(5) -0.75(6)
RedYel-BYel -0.04(5) -0.17(5) BluCya, Yel 0.04(5) 0.17(5)
BluGrn-BBlk 0.40(5) -0.00(5) Mag, Blk 0.20(5) 0.16(5)
BluGrn-BWht 0.10(5) 0.06(5) Mag, Wht -0.14(5) 0.08(5)
BluGrn-BGrn -0.20(5) 0.01(5) Mag, Grn -0.19(5) -0.29(5)
BluGrn-BBlu -0.17(5) -0.06(5) Mag, Blu 0.03(5) -0.07(5)
BluGrn-BYel -0.15(5) 0.02(5) Mag, Yel -0.42(5) -0.19(5)
Table 3. Scale values for all parameter levels (also combined ones) of the four conjoint studies (ENGINE studies on the left and FOOT studies on the right). Numbers in parentheses show the estimated standard deviation in units of the last shown digit.