Minimally Important Differences For Interpreting
doi: 10.1093/jncics/pkz037
First published online June 4, 2019
Article
Abstract
Background: We aimed to estimate the minimally important difference (MID) for interpreting group-level change over time,
both within a group and between groups, for the European Organisation for Research and Treatment of Cancer Quality of Life
Questionnaire core 30 (EORTC QLQ-C30) scores in patients with advanced breast cancer.
Methods: Data were derived from two published EORTC trials. Clinical anchors (eg, performance status [PS]) were selected
using correlation strength and clinical plausibility of their association with a particular QLQ-C30 scale. Three change status
groups were formed: deteriorated by one anchor category, improved by one anchor category, and no change. Patients with
greater anchor changes were excluded. The mean change method was used to estimate MIDs for within-group change, and
linear regression was used to estimate MIDs for between-group differences in change over time. For a given QLQ-C30 scale,
MID estimates from multiple anchors were triangulated to a single value via a correlation-based weighted average.
Results: MIDs varied by QLQ-C30 scale, direction (improvement vs deterioration), and anchor. MIDs for within-group change
ranged from 5 to 14 points (improvement) and −14 to −4 points (deterioration), and MIDs for between-group change over
time ranged from 4 to 11 points and from −18 to −4 points. Correlation-weighted MIDs for most QLQ-C30 scales ranged from
4 to 10 points in absolute values.
Conclusions: Our findings aid interpretation of changes in EORTC QLQ-C30 scores over time, both within and between groups,
and can inform more accurate sample size calculations for clinical trials in advanced breast cancer.
Received: March 5, 2019; Revised: April 24, 2019; Accepted: May 20, 2019
© The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact [email protected].
JNCI Cancer Spectrum, 2019, Vol. 3, No. 3

Patient-reported outcomes such as health-related quality of life (HRQOL) are increasingly assessed as important endpoints in cancer clinical trials. As a result, there is growing interest to improve the interpretation of HRQOL data in cancer clinical trials (1). It is recognized that interpreting HRQOL scores merely via statistical significance might be misleading because small differences in mean scores can be statistically significant, even when clinical relevance is absent. The minimally important difference (MID) approach aids interpreting differences and changes in HRQOL scores as clinically meaningful (2–7). MID can be defined as the smallest change in a HRQOL score that is perceived as “important” by a patient or by a third party (eg, a clinician), which may indicate a change in the patient’s management (2).

MIDs are commonly estimated using anchor-based and distribution-based methods (7). Anchor-based methods express differences or change in HRQOL scores using other familiar variables that have clinical relevance (3,7–9) or to patient and/or physician-derived ratings of change in the specific domain (4–6). Distribution-based methods use the statistical distribution of HRQOL scores (eg, SD criteria or SEM) and are considered as supportive evidence to anchor-based methods (10).

This study focused on interpreting the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30 (EORTC QLQ-C30) in patients with advanced breast cancer. Guidelines for interpreting the QLQ-C30

Methods

Data Description

Data were derived from two published phase III EORTC trials. Trial 1 assessed the clinical benefit of a dose-intensive anthracycline-based regimen compared with standard treatment in women with locally advanced breast cancer and enrolled 448 patients (13). Trial 2 compared a combination of doxorubicin and paclitaxel vs doxorubicin and cyclophosphamide as first-line chemotherapy in advanced (metastatic) breast cancer and enrolled 275 patients (14). Both trials assessed HRQOL using the QLQ-C30 at baseline, during treatment, and at several follow-up time points after the end of treatment.

The EORTC QLQ-C30

The EORTC QLQ-C30 comprises 30 items, 24 of which are aggregated into 9 multi-item scales: 5 functioning scales (physical, role, cognitive [CF], emotional, and social); 3 symptom scales (fatigue, pain, and nausea and/or vomiting); and 1 global health status scale. The remaining six single items assess symptoms of dyspnea, appetite loss (AP), sleep disturbance, constipation, diarrhea, and financial impact. Both trials used version 2 of the QLQ-C30, with standard scoring applied to the scales (15). For consistency in signs, all scales were scored such that 0 represents the worst possible score and 100, the best possible score. Financial impact was omitted from the analysis because suitable anchors were not available.

Clinical Anchor

Anchors were selected from variables that were available in the trial datasets (eg, physician examinations and common terminology criteria for adverse events [CTCAE]). Anchors were selected for each HRQOL scale based on correlation strength. Spearman rank, polyserial, or polychoric correlation was used, depending on the distribution of the pair of variables. Anchors

Anchor-Based Methods

Change scores of HRQOL scale and anchor pairs were computed across all pairwise time points and combined to provide sufficient data for examining clinically important changes. For example, for a subject measured at time points ta, tb, and tc, change scores were computed between ta and tb; ta and tc; and tb and tc. Hence, a subject can contribute multiple change scores, and given their change scores, subjects can contribute to multiple CCGs. Only subjects with HRQOL and anchor data for a given pair of time points contributed to the calculation of change scores. Data from the two trials were pooled to estimate MIDs.

The mean change method was used to estimate MIDs for within-group change over time. MIDs for improvement and deterioration were computed as the mean HRQOL change scores for the improvement and deterioration CCGs, respectively. This is relevant for interpreting change within a single group of patients, and it is similar to the mean HRQOL change score over time for a treatment group in a trial. Effect sizes (ESs) were computed within each CCG by dividing the mean of the HRQOL change scores (derived from all the pairwise time point differences) by the SD of the HRQOL change scores over all time points. Only mean changes with an ES no less than 0.2 and less than 0.8 were considered appropriate for inclusion as MIDs. This was based on Cohen’s (16) recommendations that an ES of 0.2 is small, 0.5 is moderate, and no less than 0.8 is large. The
J. Z. Musoro et al. | 3 of 7
Table 1. Baseline demographic and clinical characteristics of the patients by study (all patients had advanced breast cancer)

Characteristic              Study 10921, No. (%)   Study 10961, No. (%)   Total, No. (%)
                            (N = 448)              (N = 275)              (N = 723)
Performance status
  0                         394 (87.9)             119 (43.3)             513 (71.0)
  1                         54 (12.1)              133 (48.4)             187 (25.9)
  2                         0 (0.0)                22 (8.0)               22 (3.0)
  Unknown                   0 (0.0)                1 (0.4)                1 (0.1)
Number of positive nodes
  N0–N1                     250 (55.8)             144 (52.4)             394 (54.5)
  N2                        176 (39.3)             26 (9.5)               202 (27.9)
  N4+                       0 (0.0)                51 (18.5)              51 (7.1)
  Nx                        9 (2.0)                41 (14.9)              50 (6.9)
  N3                        13 (2.9)               13 (4.7)               26 (3.6)
Country
rationale was that ESs less than 0.2 reflect changes that are clinically unimportant, and those no less than 0.8 are obviously more than minimally important. The difference in change scores between the improvement (or deterioration) CCG and no change CCG was compared using analysis of variance (ANOVA).

A linear regression was used to estimate MIDs for differences between groups in change over time. For a given HRQOL scale and anchor pair, the outcome variable was the HRQOL change score, and the covariate was a binary anchor variable (coded as stable = 0 and improvement = 1 when modeling improvement [deteriorated observations were excluded], and stable = 0 and deterioration = 1 when modeling deterioration [improved observations were excluded]).

Because change scores were computed across all pairwise time points, some patients contributed change scores to more than one CCG and more than one change score to a particular CCG. We corrected for the association between multiple change scores contributed by the same patients by specifying a suitable covariance structure using the generalized estimating equations (17). The slope parameters for the “improved” and “deteriorated” covariates correspond to the MID for improvement and deterioration, respectively. This approach is similar to comparing the mean HRQOL change score over time in a treatment group to a control group in a trial, which is why these MIDs are useful for interpreting changes over time between two distinct groups of patients. Furthermore, we compared the two trials by adding a “trial” effect in a linear regression model, separately for improving and deteriorating HRQOL scores. This was based on the data with PS as the anchor.

Both within-group and between-group MID estimates for a given HRQOL scale, from multiple anchors, were triangulated to a single value via a correlation-based weighted average.

Distribution-Based Methods

The SEM, 0.2 SD, 0.3 SD, and 0.5 SD were applied to HRQOL scores at two time points common to both trials: 1) start of treatment (t1), time point before or on the first day of treatment, and 2) end of treatment (t2), last day of protocol treatment. Test–retest reliability estimates to compute SEM for the QLQ-C30 were based on Hjermstad et al. (18). All analyses were performed using the SAS software (19).

Results

Table 1 summarizes the demographic and clinical characteristics of patients at baseline. The median follow-up time (in months) for HRQOL was 5.3 (16.9) for trial 1 and 1.6 (2.8) for trial 2. An overview of the flow of patients through this study is presented in Supplementary Figure 1 (available online). Cross-sectional correlations ranged from 0.20 to 0.62 in absolute value, with a majority of the correlation coefficients being above the 0.30 threshold (7) (Table 2). Correlations between the change scores ranged from 0.14 to 0.51. At least one suitable anchor was constructed for 8 of the 14 QLQ-C30 scales that were considered for this study. The distribution of patients and the number of change observations across the categories of suitable anchors are summarized in Supplementary Table 1 (available online).
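To make the anchor-based steps from the Methods concrete, the following is a minimal Python sketch, not the authors' SAS analysis. It forms all pairwise change scores per patient, assigns each to a change group (CCG) from a one-category anchor change, applies the effect size window of 0.2 to less than 0.8 to the mean change, and computes the between-group estimate as a difference in mean change. All function names and the patient data are hypothetical. The sketch assumes an anchor coded so that a higher category is worse (as with PS), takes the ES as the mean divided by the SD of the change scores in the group, and does not reproduce the generalized estimating equations correction for repeated observations; with a single 0/1 covariate, the ordinary least squares slope reduces to the difference in group means used here.

```python
from itertools import combinations
from statistics import mean, stdev


def pairwise_changes(visits):
    """visits: time-ordered list of (hrqol_score, anchor_category).
    Returns (hrqol_change, anchor_change) for every pair of time points."""
    return [(late[0] - early[0], late[1] - early[1])
            for early, late in combinations(visits, 2)]


def change_group(anchor_change):
    """Assumption: a higher anchor category is worse (as with PS).
    Changes of more than one category return None and are excluded."""
    return {-1: "improved", 0: "stable", 1: "deteriorated"}.get(anchor_change)


def group_changes(patients):
    """Pool HRQOL change scores from all patients by change group."""
    groups = {"improved": [], "stable": [], "deteriorated": []}
    for visits in patients.values():
        for hrqol_change, anchor_change in pairwise_changes(visits):
            label = change_group(anchor_change)
            if label is not None:
                groups[label].append(hrqol_change)
    return groups


def mean_change_mid(changes):
    """Within-group estimate: mean change, kept only if 0.2 <= |ES| < 0.8.
    ES is taken here as mean / SD of the group's change scores."""
    m = mean(changes)
    es = m / stdev(changes)
    return (m if 0.2 <= abs(es) < 0.8 else None), es


def between_group_mid(changed, stable):
    """Slope of an ordinary least squares fit on a 0/1 covariate,
    which equals the difference in mean change (changed minus stable)."""
    return mean(changed) - mean(stable)


# Hypothetical data: patient -> [(HRQOL score, PS category), ...]
patients = {
    "p1": [(60, 1), (70, 0), (65, 0)],
    "p2": [(80, 0), (70, 1), (75, 1)],
    "p3": [(50, 2), (55, 2), (90, 0)],  # two-category PS changes are dropped
}
groups = group_changes(patients)
improvement_vs_stable = between_group_mid(groups["improved"], groups["stable"])
deterioration_vs_stable = between_group_mid(groups["deteriorated"], groups["stable"])

# Hypothetical change scores for one CCG, used to show the ES window
within_mid, es = mean_change_mid([10, -5, 20, 0, 5])
```

In this toy data set a patient contributes to more than one group, which is the situation the covariance correction in the article is meant to handle; a faithful reanalysis would need a GEE-capable routine rather than the plain difference in means.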
Table 2. Correlations over all time points of the EORTC QLQ-C30 scale scores with suitable anchors, and correlations between change scores of
the EORTC QLQ-C30 scales and anchors
*n1 (n1R) and n2 (n2R) can vary by anchor and EORTC QLQ-C30 scale. AP = appetite loss; CF = cognitive functioning; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30; FA = fatigue; n1 = number of patients with at least 1 matched EORTC QLQ-C30 and an anchor form; n1R = number of repeated anchor and HRQOL matched forms across all subjects; n2 = number of patients with at least 2 matched EORTC QLQ-C30 and an anchor form (at least 2 forms are needed to compute change scores); n2R = number of repeated EORTC QLQ-C30 scale and anchor change scores across all subjects; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.
The MIDs varied according to the scale, direction of change scores (improvement vs deterioration), and anchor (Figure 1). Estimates were always in the expected direction according to the anchor (ie, positive vs negative change scores within the improvement vs deterioration CCG, respectively). Statistically significant differences (ANOVA P < .05) were observed between the HRQOL change scores for all improvement and deterioration CCGs vs no change CCG.

MIDs for within-group change (based on the mean-change method) ranged from 5 to 14 points (improvement) and −14 to −4 points (deterioration), and MIDs for between-group change (based on the linear regression) ranged from 4 to 11 points and from −18 to −4 points (Table 3). For the majority of the QLQ-C30 scales, the estimated MIDs ranged from 4 to 10 points in absolute values.

Table 3. Range of anchor-based MID estimates from the mean change method and linear regression

         Mean change method*               Linear regression†
Scale    Improvement    Deterioration      Improvement    Deterioration
PF       7 to 10        −11 to −10         7 to 9         −10 to −8
RF       No MID         −6                 No MID         −4
SF       7 to 9         −9 to −5           6 to 7         −11 to −5
CF       5              −4                 4              −4
QL       10 to 14       −11 to −5          8 to 11        −13 to −6
FA       8              −9 to −7           8              −8 to −6
NV       No MID         −12                No MID         −14
AP       No MID         −14                No MID         −18
*The mean change method is useful for interpreting within-group change over time. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100 the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. All of the ESs for the no change group were <0.2. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.
†The linear regression is useful for interpreting between-group differences in change over time.

Adding a trial effect to the regression models showed no statistically significant differences in change scores between the two trials, hence, supporting the combination of the two trials.

The MIDs in Table 3 are summarized to single MID values per scale in Table 4 and ranged from 4 to 10 points in absolute values for most HRQOL scales. Table 4 also compares the anchor-based estimates to the distribution-based estimates at t1. The distribution-based estimates at t2 for each HRQOL scale were similar to t1, mostly within a less-than-1-point range. All anchor-based estimates were no less than 0.2 SD, with most estimates being less than 0.5 SD. The anchor-based estimates tended to be closer to both the 0.3 SD and the 1 SEM.
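The reduction of several anchor-specific estimates to one value per scale can be illustrated with a short Python sketch. The article states only that a correlation-based weighted average was used; the exact weighting below (weights proportional to the absolute scale and anchor correlation, normalised to sum to 1) is an assumption, and the function name and input values are hypothetical.

```python
def triangulate_mid(estimates):
    """estimates: list of (mid_estimate, correlation) pairs, one per anchor.
    Assumed scheme: weights are |correlation| normalised to sum to 1."""
    total_weight = sum(abs(r) for _, r in estimates)
    return sum(mid * abs(r) for mid, r in estimates) / total_weight


# Hypothetical deterioration estimates for one scale from two anchors
single_mid = triangulate_mid([(-11.0, -0.40), (-5.0, -0.20)])
```

Under this scheme an anchor that correlates more strongly with the scale pulls the summary value toward its own estimate, so a weakly related anchor cannot dominate the result.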
Table 4. Summary of anchor-based MIDs for within- and between-group changes compared with distribution-based estimates
*The within-group MIDs (from the mean change method) and the between-group MIDs (from the linear regression) were summarized via weighted averages based on scale and anchor pair correlation. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100, the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; n = number of patients; NV = nausea/vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning; t1 = time points for the start of treatment.
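The distribution-based benchmarks that Table 4 sets against the anchor-based MIDs can be sketched in Python as follows. The sketch uses the conventional definition SEM = SD × sqrt(1 − reliability); the scores, the reliability value, and the helper names are hypothetical and are not taken from the trial data or from Hjermstad et al.

```python
from math import sqrt
from statistics import stdev


def distribution_benchmarks(scores, reliability):
    """Return the SEM and the 0.2, 0.3, and 0.5 SD criteria for one scale.
    reliability: test-retest reliability coefficient (between 0 and 1)."""
    sd = stdev(scores)
    return {
        "SEM": sd * sqrt(1 - reliability),
        "0.2 SD": 0.2 * sd,
        "0.3 SD": 0.3 * sd,
        "0.5 SD": 0.5 * sd,
    }


def closest_benchmark(mid, benchmarks):
    """Name of the benchmark nearest in size to an anchor-based MID."""
    return min(benchmarks, key=lambda name: abs(abs(mid) - benchmarks[name]))


# Hypothetical scale scores at the start of treatment and reliability
benchmarks = distribution_benchmarks([50, 60, 70, 80, 90], reliability=0.84)
nearest = closest_benchmark(-5, benchmarks)
```

Comparing on absolute values keeps the check usable for both improvement and deterioration estimates.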
based on both the correlation strength and the clinical plausibility. When available, multiple anchors were used per HRQOL scale to provide some reassurance about the plausibility of the estimated MIDs. Despite the modest correlation between anchor and scale change scores, most MID estimates from multiple anchors were in a narrow range (often <5 points) and were always in the expected direction according to the anchor change category.

In agreement with recent findings (5–9), our estimates varied by HRQOL scale and direction of change (improvement vs deterioration). Similar to Maringwa et al. (8, 9) and Musoro et al. (7), no systematic differences were observed in the magnitude of change between deteriorating and improving scores. However, other studies reported that estimates for deterioration tended to be larger than those for improvement (6, 20).

We distinguished between MIDs for interpreting the degree of change within a group (obtained from the mean change method) and MIDs for interpreting the degree of differences between groups in within-group change (obtained from linear regression). Interestingly, estimates from both approaches were often in the same range. For many scales, the MIDs were within the range of 5–10 points that was suggested by Osoba et al. (4)
and also observed by Cocks et al. (5, 6), Musoro et al. (7), and Maringwa et al. (8, 9). However, similar to Cocks et al. (5, 6), we noticed that the thresholds for some scales were much lower. For example, MIDs of 4 points were observed for the CF scale. Musoro et al. (7) also reported MIDs that were as low as 3 points for the CF scale in patients with malignant melanoma. On the other hand, similar to Musoro et al. (7), we observed a much bigger threshold of 18 points for the AP scale. This reinforces the evidence that there is no single global standard for clinically meaningful change, and scale-specific MIDs should therefore be selected with more caution.

Most often, investigators seeking MIDs would desire simple guidelines. However, as shown in this article, results are often varied as a consequence of there being numerous anchors, vari-

interpreting QLQ-C30 change scores over time for all 15 scales using published results from multiple cancer sites. The MID values obtained for the eight scales considered in this study were comparable to those presented by Cocks et al. (6). These increasingly robust guidelines advocate a more nuanced approach to clinical relevance beyond a single threshold.

In conclusion, our findings can help clinicians and researchers interpret the clinical relevance of group-level change of QLQ-C30 scores over time in patients with advanced breast cancer. The fact that MIDs can vary by QLQ-C30 scale and anchor suggests that we cannot rely on global standards for defining clinically meaningful change. Finally, our results will also inform more accurate sample size calculations for clinical trials in advanced breast cancer with endpoints that are based on
7. Musoro ZJ, Bottomley A, Coens C, et al. Interpreting European Organisation for Research and Treatment for Cancer Quality of Life Questionnaire core 30 scores as minimally importantly different for patients with malignant melanoma. Eur J Cancer. 2018;104(0):169–181.
8. Maringwa JT, Quinten C, King M, et al. Minimal important differences for interpreting health-related quality of life scores from the EORTC QLQ-C30 in lung cancer patients participating in randomized controlled trials. Support Care Cancer. 2011;19(11):1753–1760.
9. Maringwa J, Quinten C, King M, et al. Minimal clinically meaningful differences for the EORTC QLQ-C30 and EORTC QLQ-BN20 scales in brain cancer patients. Ann Oncol. 2011;22(9):2107–2112.
10. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–109.
11. Cocks K, King MT, Velikova G, et al. Quality, interpretation and presentation of European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 data in randomised controlled trials. Eur J Cancer. 2008;44(13):1793–1798.
12. Musoro ZJ, Hamel J-F, Ediebah DE, et al. Establishing anchor-based minimally
14. Biganzoli L, Cufer T, Bruning P, et al. Doxorubicin and paclitaxel versus doxorubicin and cyclophosphamide as first-line chemotherapy in metastatic breast cancer: the European Organization for Research and Treatment of Cancer 10961 Multicenter Phase III Trial. J Clin Oncol. 2002;20(14):3114–3121.
15. Fayers P, Aaronson NK, Bjordal K, et al. EORTC QLQ-C30 Scoring Manual (Third edition). Brussels, Belgium; EORTC Quality of Life Group; 2001.
16. Cohen J. Statistical Power Analysis for the Behavioural Sciences (2nd Edition). Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
17. Liang KY, Zeger SL. Regression analysis for correlated data. Annu Rev Public Health. 1993;14(1):43–68.
18. Hjermstad MJ, Fossa SD, Bjordal K, Kaasa S. Test/retest study of the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire. J Clin Oncol. 1995;13(5):1249–1254.
19. SAS Institute Inc. Base SAS 9.4 Procedures Guide. Cary, NC: SAS Institute Inc; 2013.
20. Ringash J, O’Sullivan B, Bezjak A, Redelmeier DA. Interpreting clinically significant changes in patient-reported outcomes. Cancer. 2007;110(1):196–202.