
JNCI Cancer Spectrum (2019) 3(3): pkz037

doi: 10.1093/jncics/pkz037
First published online June 4, 2019
ARTICLE

Minimally Important Differences for Interpreting EORTC QLQ-C30 Scores in Patients With Advanced Breast Cancer

Downloaded from https://academic.oup.com/jncics/article/3/3/pkz037/5511405 by guest on 06 April 2022

Jammbe Z. Musoro*, Corneel Coens, Frederic Fiteni, Pogoda Katarzyna,
Fatima Cardoso, Nicola S. Russell, Madeleine T. King, Kim Cocks,
Mirjam Ag Sprangers, Mogens Groenvold, Galina Velikova,
Hans-Henning Flechtner, Andrew Bottomley; on behalf of the EORTC Breast
and Quality of Life Groups
See the Notes section for the full list of authors’ affiliations.
*Correspondence to: Jammbe Musoro, PhD, European Organization for Research and Treatment of Cancer, EORTC Headquarters, 83/11 Avenue E. Mounier, 1200
Brussels, Belgium (e-mail: [email protected]).

Abstract
Background: We aimed to estimate the minimally important difference (MID) for interpreting group-level change over time,
both within a group and between groups, for the European Organisation for Research and Treatment of Cancer Quality of Life
Questionnaire core 30 (EORTC QLQ-C30) scores in patients with advanced breast cancer.
Methods: Data were derived from two published EORTC trials. Clinical anchors (eg, performance status [PS]) were selected
using correlation strength and clinical plausibility of their association with a particular QLQ-C30 scale. Three change status
groups were formed: deteriorated by one anchor category, improved by one anchor category, and no change. Patients with
greater anchor changes were excluded. The mean change method was used to estimate MIDs for within-group change, and
linear regression was used to estimate MIDs for between-group differences in change over time. For a given QLQ-C30 scale,
MID estimates from multiple anchors were triangulated to a single value via a correlation-based weighted average.
Results: MIDs varied by QLQ-C30 scale, direction (improvement vs deterioration), and anchor. MIDs for within-group change
ranged from 5 to 14 points (improvement) and −14 to −4 points (deterioration), and MIDs for between-group change over
time ranged from 4 to 11 points and from −18 to −4 points. Correlation-weighted MIDs for most QLQ-C30 scales ranged from
4 to 10 points in absolute values.
Conclusions: Our findings aid interpretation of changes in EORTC QLQ-C30 scores over time, both within and between groups,
and support more accurate sample size calculations for clinical trials in advanced breast cancer.

Received: March 5, 2019; Revised: April 24, 2019; Accepted: May 20, 2019
© The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected].

Patient-reported outcomes such as health-related quality of life (HRQOL) are increasingly assessed as important endpoints in cancer clinical trials. As a result, there is growing interest in improving the interpretation of HRQOL data in cancer clinical trials (1). It is recognized that interpreting HRQOL scores merely via statistical significance might be misleading because small differences in mean scores can be statistically significant, even when clinical relevance is absent. The minimally important difference (MID) approach aids interpreting differences and changes in HRQOL scores as clinically meaningful (2–7). MID can be defined as the smallest change in a HRQOL score that is perceived as “important” by a patient or by a third party (eg, a clinician), which may indicate a change in the patient’s management (2).

MIDs are commonly estimated using anchor-based and distribution-based methods (7). Anchor-based methods relate differences or changes in HRQOL scores to other familiar variables that have clinical relevance (3, 7–9) or to patient- and/or physician-derived ratings of change in the specific domain (4–6). Distribution-based methods use the statistical distribution of HRQOL scores (eg, SD criteria or SEM) and are considered supportive evidence for anchor-based methods (10).

This study focused on interpreting the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30 (EORTC QLQ-C30) in patients with advanced breast cancer. Guidelines for interpreting the QLQ-C30 were initially published by King (3) and Osoba et al. (4). King (3) evaluated published evidence about differences in QLQ-C30 scores between groups for multiple cancer sites and clinical anchors and found that the score range for small, moderate, and large effects differed between the scales of the QLQ-C30. Osoba et al. (4) provided thresholds for interpreting small (5 to 10 points), moderate (10 to 20 points), and large changes (>20 points) in QLQ-C30 scores using a global rating of change in metastatic breast and small-cell lung cancer patients. Based on King (3) and Osoba et al. (4), mean differences of at least 10 points are widely regarded as clinically meaningful for the QLQ-C30 in randomized clinical trials (11). However, recent guidelines revealed that MIDs can differ by QLQ-C30 scale, direction of change (improvement vs deterioration), and settings (5, 6), rendering a widely applicable rule for MIDs highly unlikely. We therefore need to gather further empirical evidence on patterns of MIDs across QLQ-C30 scales and disease sites (12).

This study examined MIDs for group-level change in HRQOL scores over time. In contrast to Osoba et al. (4), we used available clinical anchors in the database. Furthermore, the guidelines of King (3) and Cocks et al. (5, 6) were based on meta-analyses of published studies, pooling across cancer sites, whereas this study used individual patient data from archived trials.

Methods

Data Description

Data were derived from two published phase III EORTC trials. Trial 1 assessed the clinical benefit of a dose-intensive anthracycline-based regimen compared with standard treatment in women with locally advanced breast cancer and enrolled 448 patients (13). Trial 2 compared a combination of doxorubicin and paclitaxel vs doxorubicin and cyclophosphamide as first-line chemotherapy in advanced (metastatic) breast cancer and enrolled 275 patients (14). Both trials assessed HRQOL using the QLQ-C30 at baseline, during treatment, and at several follow-up time points after the end of treatment.

The EORTC QLQ-C30

The EORTC QLQ-C30 comprises 30 items, 24 of which are aggregated into 9 multi-item scales: 5 functioning scales (physical, role, cognitive [CF], emotional, and social); 3 symptom scales (fatigue, pain, and nausea and/or vomiting); and 1 global health-status scale. The remaining six single items assess symptoms of dyspnea, appetite loss (AP), sleep disturbance, constipation, diarrhea, and financial impact. Both trials used version 2 of the QLQ-C30, with standard scoring applied to the scales (15). For consistency in signs, all scales were scored such that 0 represents the worst possible score and 100, the best possible score. Financial impact was omitted from the analysis because suitable anchors were not available.

Clinical Anchor

Anchors were selected from variables that were available in the trial datasets (eg, physician examinations and common terminology criteria for adverse events [CTCAE]). Anchors were selected for each HRQOL scale based on correlation strength. Spearman rank, polyserial, or polychoric correlation was used, depending on the distribution of the pair of variables. Anchors with correlations of at least 0.30 were prioritized (10), and where achievable, anchors with much stronger correlations were targeted. The retained anchors were further verified for clinical plausibility by a panel of six breast cancer and/or HRQOL experts to avoid spurious findings. Multiple anchors could be selected for each HRQOL scale (12).

For trial 1, the retained anchors comprised 1) World Health Organization PS, scored between 0 (completely active with no limitations) and 4 (bedbound); and 2) four CTCAEs (nausea, vomiting, fatigue, and alopecia), graded from 0 (no toxicity) to 4 (life-threatening). The only anchor retained for trial 2 was the PS.

Definition of Clinical Change Groups

Three clinical change status groups (CCG) were defined: deterioration (worsened by 1 anchor category), stable (no change in anchor category), and improvement (improved by 1 anchor category). In order not to overestimate the MIDs, change scores associated with a change of 2 or more anchor categories were excluded from datasets used to estimate MIDs because they were considered to be above the “minimal” expected change.

Statistical Analysis

Anchor-Based Methods

Change scores of HRQOL scale and anchor pairs were computed across all pairwise time points and combined to provide sufficient data for examining clinically important changes. For example, for a subject measured at time points ta, tb, and tc, change scores were computed between ta and tb; ta and tc; and tb and tc. Hence, a subject can contribute multiple change scores, and given their change scores, subjects can contribute to multiple CCGs. Only subjects with HRQOL and anchor data for a given pair of time points contributed to the calculation of change scores. Data from the two trials were pooled to estimate MIDs.

The mean change method was used to estimate MIDs for within-group change over time. MIDs for improvement and deterioration were computed as the mean HRQOL change scores for the improvement and deterioration CCGs, respectively. This is relevant for interpreting change within a single group of patients, and it is similar to the mean HRQOL change score over time for a treatment group in a trial. Effect sizes (ESs) were computed within each CCG by dividing the mean of the HRQOL change scores (derived from all the pairwise time point differences) by the SD of the HRQOL change scores over all time points. Only mean changes with an ES of at least 0.2 and less than 0.8 were considered appropriate for inclusion as MIDs. This was based on Cohen’s (16) recommendations that an ES of 0.2 is small, 0.5 is moderate, and 0.8 or more is large.

Table 1. Baseline demographic and clinical characteristics of the patients by study (all patients had advanced breast cancer)

Characteristic Study 10921 No. (%) (N = 448) Study 10961 No. (%) (N = 275) Total (N = 723)

Performance status
0 394 (87.9) 119 (43.3) 513 (71.0)
1 54 (12.1) 133 (48.4) 187 (25.9)
2 0 (0.0) 22 (8.0) 22 (3.0)
Unknown 0 (0.0) 1 (0.4) 1 (0.1)
Number of positive nodes
N0–N1 250 (55.8) 144 (52.4) 394 (54.5)
N2 176 (39.3) 26 (9.5) 202 (27.9)
N4+ 0 (0.0) 51 (18.5) 51 (7.1)
Nx 9 (2.0) 41 (14.9) 50 (6.9)
N3 13 (2.9) 13 (4.7) 26 (3.6)
Country



France 97 (21.7) 41 (14.9) 138 (19.1)
Netherlands 41 (9.2) 42 (15.3) 83 (11.5)
United Kingdom 11 (2.5) 68 (24.7) 79 (10.9)
Poland 78 (17.4) 0 (0.0) 78 (10.8)
Belgium 48 (10.7) 29 (10.5) 77 (10.7)
Canada 68 (15.2) 0 (0.0) 68 (9.4)
Slovenia 22 (4.9) 26 (9.5) 48 (6.6)
Switzerland 28 (6.3) 8 (2.9) 36 (5.0)
Russia 27 (6.0) 0 (0.0) 27 (3.7)
Italy 0 (0.0) 18 (6.5) 18 (2.5)
Israel 0 (0.0) 16 (5.8) 16 (2.2)
South Africa 3 (0.7) 12 (4.4) 15 (2.1)
Portugal 13 (2.9) 0 (0.0) 13 (1.8)
Czech Republic 12 (2.7) 0 (0.0) 12 (1.7)
Spain 0 (0.0) 9 (3.3) 9 (1.2)
Austria 0 (0.0) 6 (2.2) 6 (0.8)
Age, y
Mean (SD) 50.07 (9.68) 52.27 (9.61) —
Range 26.0–79.0 28.0–70.0 —

The rationale was that ESs less than 0.2 reflect changes that are clinically unimportant, and those of 0.8 or more are obviously more than minimally important. The difference in change scores between the improvement (or deterioration) CCG and no change CCG was compared using analysis of variance (ANOVA).

A linear regression was used to estimate MIDs for differences between groups in change over time. For a given HRQOL scale and anchor pair, the outcome variable was the HRQOL change score, and the covariate was a binary anchor variable (coded as stable = 0 and improvement = 1 when modeling improvement [deteriorated observations were excluded], and stable = 0 and deterioration = 1 when modeling deterioration [improved observations were excluded]). Because change scores were computed across all pairwise time points, some patients contributed change scores to more than one CCG and more than one change score to a particular CCG. We corrected for the association between multiple change scores contributed by the same patients by specifying a suitable covariance structure using the generalized estimating equations (17). The slope parameters for the “improved” and “deteriorated” covariates correspond to the MID for improvement and deterioration, respectively. This approach is similar to comparing the mean HRQOL change score over time in a treatment group to a control group in a trial, which is why these MIDs are useful for interpreting changes over time between two distinct groups of patients. Furthermore, we compared the two trials by adding a “trial” effect in a linear regression model, separately for improving and deteriorating HRQOL scores. This was based on the data with PS as the anchor.

Both within-group and between-group MID estimates for a given HRQOL scale, from multiple anchors, were triangulated to a single value via a correlation-based weighted average.

Distribution-Based Methods

The SEM, 0.2 SD, 0.3 SD, and 0.5 SD were applied to HRQOL scores at two time points common to both trials: 1) start of treatment (t1), time point before or on the first day of treatment, and 2) end of treatment (t2), last day of protocol treatment. Test–retest reliability estimates to compute SEM for the QLQ-C30 were based on Hjermstad et al. (18). All analyses were performed using the SAS software (19).

Results

Table 1 summarizes the demographic and clinical characteristics of patients at baseline. The median follow-up time (in months) for HRQOL was 5.3 (16.9) for trial 1 and 1.6 (2.8) for trial 2. An overview of the flow of patients through this study is presented in Supplementary Figure 1 (available online). Cross-sectional correlations ranged from 0.20 to 0.62 in absolute value, with a majority of the correlation coefficients being above the 0.30 threshold (7) (Table 2). Correlations between the change scores ranged from 0.14 to 0.51. At least one suitable anchor was constructed for 8 of the 14 QLQ-C30 scales that were considered for this study. The distribution of patients and the number of change observations across the categories of suitable anchors are summarized in Supplementary Table 1 (available online).
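As a rough illustration of the between-group regression described in the Methods, the sketch below fits the model with a single binary covariate (stable = 0, changed = 1) and computes a patient-clustered robust standard error. It is a deliberate simplification and not the authors' analysis: the paper used generalized estimating equations in SAS with an unspecified covariance structure, whereas this sketch assumes an independence working correlation, for which the slope reduces to the difference in mean change scores and the sandwich variance can be written out by hand. The function name and the row layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def between_group_mid(rows):
    """Estimate a between-group MID from change scores.

    rows: iterable of (patient_id, changed, hrqol_change), where changed is
    0 for the stable CCG and 1 for the improved (or deteriorated) CCG.
    Returns (slope, robust_se). With one binary covariate the least-squares
    slope equals the difference in group means.
    """
    rows = list(rows)
    stable = [y for _, x, y in rows if x == 0]
    changed = [y for _, x, y in rows if x == 1]
    mean_stable, mean_changed = mean(stable), mean(changed)
    slope = mean_changed - mean_stable

    # Cluster-robust (sandwich) variance: add up each patient's residual
    # contributions to the slope, then sum the squared per-patient totals.
    # No small-sample correction is applied.
    influence = defaultdict(float)
    for patient_id, x, y in rows:
        if x == 1:
            influence[patient_id] += (y - mean_changed) / len(changed)
        else:
            influence[patient_id] -= (y - mean_stable) / len(stable)
    robust_se = sqrt(sum(value ** 2 for value in influence.values()))
    return slope, robust_se
```

Grouping the residuals by patient before squaring is what accounts for a patient contributing several change scores; treating every change score as independent would typically understate the standard error.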

Table 2. Correlations over all time points of the EORTC QLQ-C30 scale scores with suitable anchors, and correlations between change scores of
the EORTC QLQ-C30 scales and anchors

Scores Change scores

Scale Anchor n1 (n1R)* Correlation n2 (n2R)* Correlation

PF Performance status 587 (2922) 0.52 548 (8508) 0.30


CTCAE fatigue 355 (2658) 0.30 343 (11102) 0.20
CTCAE vomiting 355 (2656) 0.30 343 (11077) 0.25
RF Performance status 587 (2922) 0.54 547 (8520) 0.20
SF Performance status 594 (2890) 0.34 545 (8390) 0.20
CTCAE fatigue 355 (2630) 0.21 340 (10984) 0.15
CTCAE vomiting 355 (2628) 0.25 340 (10959) 0.20
CF CTCAE fatigue 355 (2638) 0.20 342 (11032) 0.14
QL CTCAE vomiting 355 (2628) 0.39 341 (10892) 0.30



CTCAE nausea 355 (2628) 0.39 341 (10892) 0.30
CTCAE alopecia 355 (2629) 0.39 341 (10914) 0.35
Performance status 585 (2893) 0.32 547 (8351) 0.25
FA Performance status 587 (2915) 0.40 546 (8476) 0.23
CTCAE nausea 355 (2644) 0.21 341 (11014) 0.15
CTCAE vomiting 355 (2644) 0.22 341 (11014) 0.16
NV CTCAE nausea 355 (2654) 0.60 343 (11050) 0.51
CTCAE vomiting 355 (2654) 0.62 343 (11050) 0.48
AP CTCAE nausea 355 (2621) 0.58 343 (10816) 0.44
CTCAE vomiting 355 (2621) 0.59 343 (10816) 0.48

*n1 (n1R) and n2 (n2R) can vary by anchor and EORTC QLQ-C30 scale. AP = appetite loss; CF = cognitive functioning; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30; FA = fatigue; n1 = number of patients with at least 1 matched EORTC QLQ-C30 and an anchor form; n1R = number of repeated anchor and HRQOL matched forms across all subjects; n2 = number of patients with at least 2 matched EORTC QLQ-C30 and an anchor form (at least 2 forms are needed to compute change scores); n2R = number of repeated EORTC QLQ-C30 scale and anchor change scores across all subjects; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.
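The correlation-based weighted average used to triangulate several anchor-specific MIDs into one value per scale can be sketched as follows. The article does not spell out the exact weighting formula, so this sketch assumes each anchor's MID is weighted by the absolute value of its correlation with the scale. The correlations in the example are the physical functioning change-score correlations from Table 2, but the per-anchor MID values are hypothetical placeholders chosen inside the range reported for that scale, so the result is not a reproduction of the published estimate.

```python
def correlation_weighted_mid(estimates):
    """Combine anchor-specific MIDs into a single value.

    estimates: list of (mid, correlation) pairs, one per anchor.
    Assumption: weights are the absolute scale-anchor correlations.
    """
    weights = [abs(correlation) for _, correlation in estimates]
    weighted_sum = sum(mid * w for (mid, _), w in zip(estimates, weights))
    return weighted_sum / sum(weights)


# Hypothetical per-anchor MIDs paired with the PF change-score correlations
# from Table 2 (performance status, CTCAE fatigue, CTCAE vomiting).
pf_improvement = [(10.0, 0.30), (7.0, 0.20), (9.0, 0.25)]
pf_single_value = correlation_weighted_mid(pf_improvement)
```

Anchors that track the scale more closely pull the combined value toward their own estimate, which is the design intent of weighting by correlation rather than taking a plain mean.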

Table 3. Range of anchor-based MID estimates from the mean change method and linear regression

         Mean change method*              Linear regression†
Scale    Improvement    Deterioration    Improvement    Deterioration
PF       7 to 10        −11 to −10       7 to 9         −10 to −8
RF       No MID         −6               No MID         −4
SF       7 to 9         −9 to −5         6 to 7         −11 to −5
CF       5              −4               4              −4
QL       10 to 14       −11 to −5        8 to 11        −13 to −6
FA       8              −9 to −7         8              −8 to −6
NV       No MID         −12              No MID         −14
AP       No MID         −14              No MID         −18

*The mean change method is useful for interpreting within-group change over time. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100 the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. All of the ESs for the no change group were <0.2. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.
†The linear regression is useful for interpreting between-group differences in change over time.

Table 3 shows the range of MIDs from the mean change method (useful for interpreting within-group change over time) and the linear regression (useful for interpreting between-group differences in change over time) for each HRQOL scale, across multiple anchors. Detailed results are presented in Supplementary Table 2 (available online).

The MIDs varied according to the scale, direction of change scores (improvement vs deterioration), and anchor (Figure 1). Estimates were always in the expected direction according to the anchor (ie, positive vs negative change scores within the improvement vs deterioration CCG, respectively). Statistically significant differences (ANOVA P < .05) were observed between the HRQOL change scores for all improvement and deterioration CCGs vs no change CCG.

MIDs for within-group change (based on the mean-change method) ranged from 5 to 14 points (improvement) and −14 to −4 points (deterioration), and MIDs for between-group change (based on the linear regression) ranged from 4 to 11 points and from −18 to −4 points (Table 3). For the majority of the QLQ-C30 scales, the estimated MIDs ranged from 4 to 10 points in absolute values. Adding a trial effect to the regression models showed no statistically significant differences in change scores between the two trials, hence supporting the combination of the two trials.

The MIDs in Table 3 are summarized to single MID values per scale in Table 4 and ranged from 4 to 10 points in absolute values for most HRQOL scales. Table 4 also compares the anchor-based estimates to the distribution-based estimates at t1. The distribution-based estimates at t2 for each HRQOL scale were similar to t1, mostly within a less-than-1-point range. All anchor-based estimates were at least 0.2 SD, with most estimates being less than 0.5 SD. The anchor-based estimates tended to be closer to both the 0.3 SD and the 1 SEM.

Discussion

This study examined MIDs for interpreting group-level change of EORTC QLQ-C30 scores over time in patients with advanced breast cancer. Anchors for each HRQOL scale were selected



Figure 1. Mean change and 95% confidence interval for improvement and deterioration EORTC QLQ-C30 scales, across multiple anchors and averaged across different time periods. Estimates are available only for scales with at least 1 suitable anchor and with effect size ≥0.2 and <0.8 within the “deteriorate” and “improve” groups, respectively. These mean change scores are useful for interpreting within-group change over time. AP = appetite loss; CF = cognitive functioning; CTCAE = common terminology criteria for adverse events; deteriorate = worsened by 1 anchor category; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30; FA = fatigue; Improve = improved by 1 category; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.

Table 4. Summary of anchor-based MIDs for within- and between-group changes compared with distribution-based estimates

         Anchor-based MID for             Anchor-based MID for between-groups    Distribution-based
         within-group change*             difference in change*                  QL scores at t1 (n = 415–425)
Scale    Improvement    Deterioration    Improvement    Deterioration           0.2 SD    0.3 SD    0.5 SD    1 SEM
PF       9              −10              8              −9                      4.7       7.0       11.7      7.0
RF       No MID         −6               No MID         −4                      5.1       7.6       12.7      10.7
SF       8              −7               7              −8                      5.3       7.9       13.1      9.5
CF       5              −4               4              −4                      4.1       6.2       10.3      8.8
QL       12             −8               10             −10                     4.9       7.3       12.2      10.3
FA       8              −8               8              −7                      4.9       7.3       12.2      10.0
NV       No MID         −11              No MID         −14                     3.4       5.1       8.5       10.3
AP       No MID         −14              No MID         −18                     5.2       7.8       13.1      12.0

*The within-group MIDs (from the mean change method) and the between-group MIDs (from the linear regression) were summarized via weighted averages based on scale and anchor pair correlation. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100, the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; n = number of patients; NV = nausea/vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning; t1 = time points for the start of treatment.
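The distribution-based columns of Table 4 follow from simple arithmetic on the baseline score distribution. The sketch below shows one way to compute them, assuming the conventional definition SEM = SD × sqrt(1 − reliability); the article states only that test–retest reliability estimates were used, so the formula, the function name, and the example inputs are assumptions made for illustration.

```python
from math import sqrt
from statistics import stdev


def distribution_based_estimates(scores, reliability):
    """Distribution-based criteria for one HRQOL scale at one time point.

    scores: scale scores (0 to 100) across patients.
    reliability: test-retest reliability coefficient for the scale.
    Assumption: SEM is computed as SD * sqrt(1 - reliability).
    """
    sd = stdev(scores)
    return {
        "0.2 SD": 0.2 * sd,
        "0.3 SD": 0.3 * sd,
        "0.5 SD": 0.5 * sd,
        "1 SEM": sd * sqrt(1 - reliability),
    }


# Hypothetical scores and reliability, for illustration only.
example = distribution_based_estimates([20, 40, 60, 80, 100], reliability=0.91)
```

Because the three SD criteria are fixed multiples of the same SD, they always keep the ratio 2 : 3 : 5, whereas the SEM also depends on the reliability of the scale, which is why the 1 SEM column in Table 4 does not move in step with the SD columns.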

based on both the correlation strength and the clinical plausibility. When available, multiple anchors were used per HRQOL scale to provide some reassurance about the plausibility of the estimated MIDs. Despite the modest correlation between anchor and scale change scores, most MID estimates from multiple anchors were in a narrow range (often <5 points) and were always in the expected direction according to the anchor change category.

In agreement with recent findings (5–9), our estimates varied by HRQOL scale and direction of change (improvement vs deterioration). Similar to Maringwa et al. (8, 9) and Musoro et al. (7), no systematic differences were observed in the magnitude of change between deteriorating and improving scores. However, other studies reported that estimates for deterioration tended to be larger than those for improvement (6, 20).

We distinguished between MIDs for interpreting the degree of change within a group (obtained from the mean change method) and MIDs for interpreting the degree of differences between groups in within-group change (obtained from linear regression). Interestingly, estimates from both approaches were often in the same range. For many scales, the MIDs were within the range of 5–10 points that was suggested by Osoba et al. (4) and also observed by Cocks et al. (5, 6), Musoro et al. (7), and Maringwa et al. (8, 9). However, similar to Cocks et al. (5, 6), we noticed that the thresholds for some scales were much lower. For example, MIDs of 4 points were observed for the CF scale. Musoro et al. (7) also reported MIDs that were as low as 3 points for the CF scale in patients with malignant melanoma. On the other hand, similar to Musoro et al. (7), we observed a much bigger threshold of 18 points for the AP scale. This reinforces the evidence that there is no single global standard for clinically meaningful change, and scale-specific MIDs should therefore be selected with more caution.

Most often, investigators seeking MIDs would desire simple guidelines. However, as shown in this article, results are often varied as a consequence of there being numerous anchors, various distribution-based criteria, and multiple HRQOL scales. Results shown in Figure 1 and Table 3 represent this diversity because the range of MIDs varies by the different anchors. We acknowledge that end users may find such a range of options confusing. So, to provide a single MID value per scale, we further simplified this by calculating a correlation-weighted average across multiple anchors. End users can choose to work with either the ranges provided in Table 3 or the single values provided in Table 4, whichever they feel most comfortable with. Most of the anchor-based estimates were closer to 0.3 SD and 1 SEM than to the commonly reported 0.5 SD (21).

A limitation of this study is that suitable anchors were not always available; hence, anchor-based MIDs could not be estimated for seven of the EORTC QLQ-C30 scales, which were omitted in this article. Furthermore, the available anchors (PS or CTCAE grades) relied exclusively on clinical observations or interpretations. Because the two trials that were used in this study evaluated chronic delivery of cytotoxic chemotherapy, clinical anchors such as CTCAE nausea, CTCAE vomiting, and CTCAE fatigue were reasonable and relevant. The availability of a pretreatment baseline assessment also allows detecting persistent effects such as alopecia. However, such anchors might not be relevant in other settings, treatments, or subtypes of breast cancer. The available anchors were also not necessarily suitable in all situations. For example, although CTCAE fatigue met the requirements of a plausible clinical relationship with the QLQ-C30 fatigue scale, the resulting correlations were too low (<0.1) to be retained. The low correlation can be explained by the discrete nature of the CTCAE scale where only a few high-grade events were scored. Moreover, because of the subjective nature of fatigue, there is likely also misrepresentation by physicians compared to patients’ ratings as already reported by Basch et al. (22). This might also explain the potentially inflated MID estimates for the AP scale. Also, anchors that are based on patients’ perspective of change (eg, subjective significance questionnaires) were not available in our study. Nonetheless, it is reassuring to notice the considerable overlap between our findings and those of Osoba et al. (4), which used patients’ ratings of change as the anchor. Patients’ self-assessed ratings across the different QLQ-C30 scales and across different disease sites are rarely available from retrospective data sources and would need to be planned as future research to complement our findings.

Another limitation is that our data originate from two controlled clinical trials, each with specific selection and treatment criteria. Although results were consistent between the trials, extrapolation beyond their specific setting should be made with caution. A number of articles are available that provide general guidelines for selecting MIDs for the QLQ-C30 scales (5, 6, 11). For instance, Cocks et al. (6) published MIDs for interpreting QLQ-C30 change scores over time for all 15 scales using published results from multiple cancer sites. The MID values obtained for the eight scales considered in this study were comparable to those presented by Cocks et al. (6). These increasingly robust guidelines advocate a more nuanced approach to clinical relevance beyond a single threshold.

In conclusion, our findings can help clinicians and researchers interpret the clinical relevance of group-level change of QLQ-C30 scores over time in patients with advanced breast cancer. The fact that MIDs can vary by QLQ-C30 scale and anchor suggests that we cannot rely on global standards for defining clinically meaningful change. Finally, our results will also inform more accurate sample size calculations for clinical trials in advanced breast cancer with endpoints that are based on EORTC QLQ-C30 scales.

Funding

This study was funded by the EORTC Quality of Life Group.

Notes

Affiliations of authors: European Organisation for Research and Treatment of Cancer, Brussels, Belgium (JZM, CC, AB); Department of Medical Oncology, University Hospital of Nîmes, France (FF); Institut de Recherche en Cancérologie de Montpellier, France (FF); University of Montpellier, France (FF); Maria Sklodowska-Curie Institute-Oncology Center, Warsaw, Poland (PK); Breast Unit, Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal (FC); Netherlands Cancer Institute, Amsterdam, The Netherlands (NSR); University of Sydney, Faculty of Science, School of Psychology, Sydney, NSW, Australia (MTK); Department of Health Sciences, University of York, York, UK (KC); Adelphi Values, Bollington, Cheshire, UK (KC); Department of Medical Psychology, Amsterdam University Medical Centers, Academic Medical Center, University of Amsterdam, Cancer Center Amsterdam, The Netherlands (MAS); Department of Public Health, University of Copenhagen, and Bispebjerg Hospital, Copenhagen, Denmark (MG); Leeds Institute of Cancer and Pathology, University of Leeds, St James’s Hospital, Leeds, UK (GV); Clinic for Child and Adolescent Psychiatry and Psychotherapy, University of Magdeburg, Magdeburg, Germany (H-HF).

We thank the EORTC breast disease group members and their clinical investigators and all the patients who participated in the trials that we used for this analysis.

The authors declare no conflicts of interest.

References

1. Bottomley A, Flechtner H, Efficace F. Health related quality of life outcomes in cancer clinical trials. Eur J Cancer. 2005;41(12):1697–1709.
2. Schünemann HJ, Guyatt GH. Goodbye M(C)ID! Hello MID, where do you come from? Health Serv Res. 2005;40(2):593–597.
3. King MT. The interpretation of scores from the EORTC quality of life questionnaire QLQ-C30. Qual Life Res. 1996;5(6):555–567.
4. Osoba D, Rodrigues G, Myles J, et al. Interpreting the significance of changes in health related quality-of-life scores. J Clin Oncol. 1998;16(1):139–144.
5. Cocks K, King MT, Velikova G, et al. Evidence-based guidelines for determination of sample size and interpretation of the European Organisation for the Research and Treatment of Cancer Quality of Life Questionnaire Core 30. J Clin Oncol. 2011;29(1):89–96.
6. Cocks K, King MT, Velikova G, et al. Evidence-based guidelines for interpreting change scores for the European Organisation for the Research and Treatment of Cancer Quality of Life Questionnaire Core 30. Eur J Cancer. 2012;48(11):1713–1721.
7. Musoro ZJ, Bottomley A, Coens C, et al. Interpreting European Organisation for Research and Treatment for Cancer Quality of Life Questionnaire core 30 scores as minimally importantly different for patients with malignant melanoma. Eur J Cancer. 2018;104(0):169–181.
8. Maringwa JT, Quinten C, King M, et al. Minimal important differences for interpreting health-related quality of life scores from the EORTC QLQ-C30 in lung cancer patients participating in randomized controlled trials. Support Care Cancer. 2011;19(11):1753–1760.
9. Maringwa J, Quinten C, King M, et al. Minimal clinically meaningful differences for the EORTC QLQ-C30 and EORTC QLQ-BN20 scales in brain cancer patients. Ann Oncol. 2011;22(9):2107–2112.
10. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–109.
11. Cocks K, King MT, Velikova G, et al. Quality, interpretation and presentation of European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 data in randomised controlled trials. Eur J Cancer. 2008;44(13):1793–1798.
12. Musoro ZJ, Hamel J-F, Ediebah DE, et al. Establishing anchor-based minimally important differences (MID) with the EORTC quality of life measures: a meta-analysis protocol. BMJ Open. 2018;8(1):e019117.
13. Therasse P, Mauriac L, Welnicka-Jaskiewicz M, et al. Final results of a randomized phase III trial comparing cyclophosphamide, epirubicin, and fluorouracil with a dose-intensified epirubicin and cyclophosphamide + filgrastim as neoadjuvant treatment in locally advanced breast cancer: an EORTC-NCIC-SAKK multicenter study. J Clin Oncol. 2003;21(5):843–850.
14. Biganzoli L, Cufer T, Bruning P, et al. Doxorubicin and paclitaxel versus doxorubicin and cyclophosphamide as first-line chemotherapy in metastatic breast cancer: the European Organization for Research and Treatment of Cancer 10961 Multicenter Phase III Trial. J Clin Oncol. 2002;20(14):3114–3121.
15. Fayers P, Aaronson NK, Bjordal K, et al. EORTC QLQ-C30 Scoring Manual (Third edition). Brussels, Belgium: EORTC Quality of Life Group; 2001.
16. Cohen J. Statistical Power Analysis for the Behavioural Sciences (2nd Edition). Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
17. Liang KY, Zeger SL. Regression analysis for correlated data. Annu Rev Public Health. 1993;14(1):43–68.
18. Hjermstad MJ, Fossa SD, Bjordal K, Kaasa S. Test/retest study of the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire. J Clin Oncol. 1995;13(5):1249–1254.
19. SAS Institute Inc. Base SAS 9.4 Procedures Guide. Cary, NC: SAS Institute Inc; 2013.
20. Ringash J, O'Sullivan B, Bezjak A, Redelmeier DA. Interpreting clinically significant changes in patient-reported outcomes. Cancer. 2007;110(1):196–202.
21. Ousmen A, Touraine C, Deliu N, et al. Distribution- and anchor-based methods to determine the minimally important difference on patient-reported outcome questionnaires in oncology: a structured review. Health Qual Life Outcomes. 2018;16(1):228.
22. Basch E, Dueck AC, Rogak LJ, et al. Feasibility assessment of patient reporting of symptomatic adverse events in multicenter cancer clinical trials. JAMA Oncol. 2017;3(8):1043–1050.
