Why Most Gene Expression Signatures of Tumors Have Not Been Useful in The Clinic
CANCER
Serge Koscielny
Published 13 January 2010; Volume 2 Issue 14 14ps2
Omics technologies are expected to enhance our understanding of a variety of diseases and to open the door to patient-specific personalized medicine. Despite the extensive literature on the use of gene expression arrays to predict prognosis in cancer patients, little progress has been made in translating gene expression signatures into clinical use. Breast cancer provides a ripe arena for an analysis of why such signatures have failed to fulfill their promise.
Department of Clinical and Translational Research, Institute Gustave-Roussy, and Unit of Cancer Epidemiology (Unit 605), National Institute of Health and Medical Research, Villejuif, France.
www.ScienceTranslationalMedicine.org
The first reference in PubMed containing the words "gene" and "microarray" was published in 1995 (1). About 10 years elapsed between this first report of differential expression measurements of 45 Arabidopsis genes and the marketing of pangenomic cDNA microarrays that allow analysis of more than 40,000 genes simultaneously. As of December 2009, more than 28,000 peer-reviewed articles containing these two words can be found in PubMed. The cost of microarray technology has declined, and genomics is considered to be the most mature of all the omic technologies (2). These and all future related technologies strive to enhance our knowledge in a variety of arenas and to open the door to patient-specific personalized medicine. Before dreaming of new arrays of opportunities (3), however, one should critically consider how useful gene expression signatures have been in the clinic. For such an analysis, I use breast cancer as an example.
TUMORS ARE HETEROGENEOUS
This recurring statement is present in most clinical papers about DNA microarray analyses. One of the main arguments in the quest for genomic signatures is that patients with the same clinicopathological parameters can have markedly different clinical courses (4). This observation has evolved into the expectation that differences between tumors at the gene level should explain everything. Gene expression profiles are currently used to classify tumors according to two strategies: (i) unsupervised analyses that classify tumors according to gene expression and (ii) supervised analyses that classify tumors according to clinical characteristics.
Unsupervised classifications are based on clustering algorithms. In the case of breast cancer, this strategy was used by Perou et al. (5), who identified four main categories: luminal, basal-like, normal-like, and HER2-enriched tumors. These four categories were initially defined according to the expression of a so-called intrinsic gene set that comprised 496 genes (5). In subsequent publications, the number of genes rose to 1300 (6), then dropped to 37 (7), and, finally, the same team recently proposed that these four categories be defined according to the expression of 50 genes (8). The names of the four categories are used by clinicians to classify tumors. However, instead of using gene expression profiles, clinicians use clinical surrogates based on hormone receptor expression, HER2 status, and immunohistochemical staining of tumor cells with specific antibodies. There are two reasons why gene expression is not used to classify tumors. First, the resources required to acquire these measures (people, analytical devices, money) are not available; second, even if they were, none of the algorithms used to define tumor categories from gene expression data have been published, and thus none are publicly available. Because the four tumor classification categories are currently used by physicians, one may argue that the gene expression clustering algorithms have been useful in the clinic. However, this classification has been successful largely because there is an approximate coincidence between the computer-generated gene clusters in the four tumor types and tumor classification based on preexisting clinical knowledge: HER2-enriched tumors are mostly tumors with a HER2-positive status, and basal-like tumors (also named triple-negative tumors) have negative hormone receptor and HER2 statuses. If this overlap with the clinical classification had not existed, the gene cluster names would not have been used. And given the overlap, why choose a complicated, time-consuming, and expensive method that requires the acquisition and analysis of complex gene expression data instead of a simple one based on clinical parameters that are routinely available? The four tumor categories differ in terms of their prognosis and response to treatments; these observations confirm what we knew decades ago: the hormone receptor and HER2 statuses of tumors are important prognostic and predictive factors in breast cancer.
Supervised classifications answer specific questions. For instance, what is the difference between patients who relapse and those who do not? This question was first addressed in the pioneering work on node-negative breast cancers performed by van 't Veer et al. (9). The method used was remarkably simple. Gene expression was assessed in tumor tissue from 34 patients who relapsed within 5 years of initial treatment and from 44 patients who did not. The expression level of each gene in patients who relapsed was compared with that of the same gene in patients who did not. Genes were ranked according to the P value of the test that compared their expression levels in the two groups. The researchers selected the 70 genes with the smallest P values, and thus the strongest differences between the two groups. The 70-gene signature corresponded to the mean expression values of these 70 genes in the group of patients who did not relapse.
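The supervised procedure is simple enough to sketch in a few lines. The Python below is a minimal illustration under stated assumptions, not the published MammaPrint implementation: expression data are a hypothetical dict mapping each gene to one value per patient; because the two-group test is not specified here, it is replaced by Welch's t statistic with a normal approximation to its P value; and the classification step uses the Pearson correlation with the good-prognosis centroid and the 0.4 threshold of the original study (9). All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided P value comparing the means of samples a and b.

    Assumption: Welch's t statistic with a normal approximation to its
    distribution; the original study's exact test may differ.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: evidence only if the means differ.
        return 1.0 if mean(a) == mean(b) else 0.0
    t = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def build_signature(expr, relapsed, n_genes=70):
    """Keep the n_genes genes with the smallest P values and return them
    with their mean expression in patients who did not relapse.

    expr: hypothetical dict gene -> list of values, one per patient.
    relapsed: list of booleans in the same patient order.
    """
    pvals = {}
    for gene, values in expr.items():
        bad = [v for v, r in zip(values, relapsed) if r]
        good = [v for v, r in zip(values, relapsed) if not r]
        pvals[gene] = welch_p_value(bad, good)
    genes = sorted(pvals, key=pvals.get)[:n_genes]  # smallest P first
    centroid = [
        mean([v for v, r in zip(expr[g], relapsed) if not r]) for g in genes
    ]
    return genes, centroid


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classify(profile, genes, centroid, threshold=0.4):
    """'good' if the tumor's values for the signature genes correlate with
    the good-prognosis centroid above the threshold, else 'poor'."""
    x = [profile[g] for g in genes]
    return "good" if pearson(x, centroid) > threshold else "poor"
```

With real data, the genes would be selected on one set of patients and the rule checked on patients who played no part in that selection; scoring the same patients used to pick the genes overstates accuracy.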
They then used this signature to define a rule by which to classify patients: individuals for whom the correlation between the good-prognosis 70-gene signature and the tumor's expression of the same 70 genes was >0.4 were classified as good-prognosis patients, and the other patients were classified as poor-prognosis cases. This simplistic analytical strategy, developed in 2002, is still the cream of the crop in genomic analyses, and more complicated strategies have not outperformed it. In 2008, Haibe-Kains et al. (10) showed in a validation setting that complex models are not better prognosis predictors than simpler ones. What we have learned from all of these supervised classifications is that the proliferation ability of tumor cells is a common denominator of many existing prognostic gene signatures (10, 11). Again, this realization confirms what we knew decades ago: the ability of tumors to proliferate is an important prognostic and predictive factor in breast cancer.
GAUGING GENE SIGNATURE PERFORMANCE
A crucial step in the translation of gene signatures is validation in a clinical setting (12, 13). Ideally, validation of an experimental gene signature should be performed in an independent patient population, by an independent research team. The original validation of van 't Veer's signature was published in the same article that defined the signature (9) and was performed in 19 patients from the same patient population who had not been included in the experiments that yielded the data used to define the signature. Seventeen of the 19 patients (89%) had an accurate prognosis prediction. However, this excellent result was not matched in subsequent validations. Van de Vijver et al. (14) studied a consecutive series of 295 breast cancer patients, including both node-positive and node-negative individuals. A major criticism of this validation was that it included 61 node-negative patients who had participated in the original study by van 't Veer et al. (15). This lack of independence between the two populations led to an overestimation of the performance of the signature that became perceptible in subsequent evaluations. When the signature was evaluated in 180 patients who were not involved in the original study and for whom it was known whether they had had a metastasis within the first 5 years of follow-up, the sensitivity (that is, the proportion of patients classified as high-risk with the 70-gene signature among the patients who relapsed) was equal to 93% (95% confidence interval: 81–99%), and the specificity (that is, the proportion of patients correctly classified as low-risk with the 70-gene signature among the patients who did not relapse) was only 53% (44–61%). Similar results were obtained in 307 node-negative breast cancer patients, with a sensitivity of 90% (78–95%) and a specificity of 42% (36–48%) (16).
These successive validations illustrate how inadequate validation leads to overestimation of a signature's performance. The 70-gene signature is probably the only one that has been so extensively validated, and this is also remarkable. The gene microarray literature is polluted with many signatures that have inadequate validation or even no validation at all (Fig. 1). This is not acceptable for several reasons: (i) the findings will not be reproduced; (ii) the studies are cited by many scientists, so the research problems they address are believed to be solved; (iii) inadequate information is never removed from the databases; and (iv) thus, the total number of signatures is artificially inflated and the proportion of potentially useful ones decreased. The time of clinicians who are expected to use genomic signatures is too precious to be wasted on judging the relevance and quality of gene microarray signatures; this validation must be done by scientists, so that robust signatures can be delivered to the clinic. The number of published papers with inadequate validation casts doubt over the entire body of gene microarray literature, and these papers should be expunged from bibliographic databases. I suggest that an adaptable dictionary of published gene expression signatures be created that contains a critical analysis of every declared possible use of each signature, as well as comments on their statistical and clinical validity. A key bottleneck to such a project is the ability to guarantee that those in charge of the critical analyses have no real or perceived conflicts of interest.
Fig. 1. Too much in, nothing out. The gene microarray literature is polluted with many gene expression signatures that have inadequate validation or no validation at all.
CREDIT: C. BICKEL/SCIENCE
CLINICAL TRIALS ARE THE BEST WAY TO TEST CLINICAL UTILITY
Two clinical trials have been launched to test the clinical usefulness of two prognostic signatures that are currently used by thousands of physicians in many countries to identify patients with a low risk of relapse. These patients are expected to derive no survival benefit from chemotherapy, while being exposed to its serious side effects. In Europe, the MINDACT (Microarray In Node-negative and 1 to 3 positive lymph node Disease may Avoid ChemoTherapy) trial is a randomized study designed to compare the ability of MammaPrint (Agendia, Amsterdam, the Netherlands), the commercial version of the 70-gene signature, to identify women with a low risk of relapse with that of a clinicopathological classification procedure (17). The TAILORx trial [Trial Assigning Individualized Options for Treatment (Rx)] was launched in the United States in 2006. In TAILORx, chemotherapy is assigned or randomized according to the recurrence score (RS) estimated with the Oncotype DX test (Genomic Health, Redwood City, CA): patients with an RS less than or equal to 10 do not receive chemotherapy, patients with an RS above 25 receive chemotherapy systematically, and patients with an intermediate RS between 11 and 25 are randomized between chemotherapy and no-chemotherapy groups (18). The 10-year results of these two trials will not be available before 2020.
Let us assume that MammaPrint ends up being better than the clinicopathological classification. Studies comparing the 70-gene signature results and the 21-gene prognostic score, which forms the basis of Oncotype DX, have shown that the overall concordance is 82% (19). Among the patients with an intermediate RS, who constitute the very population in which the decision to give chemotherapy is in question, about 50% will have a good prognosis and 50% a poor prognosis according to MammaPrint (20, 21). These patients with an intermediate RS represent 37% of the Oncotype DX target population (22). If TAILORx shows that patients with an intermediate RS have to be treated with chemotherapy, then, if these patients are treated according to MammaPrint, half will not receive chemotherapy. Conversely, if TAILORx shows that these patients should not be treated with chemotherapy, then, if these patients are treated according to MammaPrint, half will receive chemotherapy. This means that for 37% of patients, the treatment will be defined by which genomic test is used.
CONCLUSIONS
Gene microarrays have brought little progress to the clinical management of cancer since Schena et al.'s 1995 publication (1). Van 't Veer et al. (9) gave us a proof of concept when they showed that gene microarray information could be used to predict prognosis. Unfortunately, these predictions of prognosis are not very accurate and have not improved since 2002. This state of affairs is extremely disappointing given the potential of the technology. We still do not know how to read the messages within the genome. New technologies are on the horizon, and competition between these and gene microarrays might be the end of the latter (23). However, these new technologies will generate increasingly large databases that will be more and more difficult to analyze. The field urgently needs a breakthrough in the way we analyze such data, or we will end up with a collection of data sufficient to explain everything but unable to predict anything.
REFERENCES AND NOTES
1. M. Schena, D. Shalon, R. W. Davis, P. O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
2. O. Gevaert, B. De Moor, Prediction of cancer outcome using DNA microarray technology: Past, present and future. Expert Opin. Med. Diagnost. 3, 157–165 (2009).
3. L. Flintoft, Milestone 21 (1995). The microarray revolution: An array of opportunities. Nat. Rev. Genet. 6, S18 (2005).
4. C. Sotiriou, M. J. Piccart, Taking gene-expression profiling to the clinic: When will molecular signatures become relevant to patient care? Nat. Rev. Cancer 7, 545–553 (2007).
5. C. M. Perou, T. Sørlie, M. B. Eisen, M. van de Rijn, S. S. Jeffrey, C. A. Rees, J. R. Pollack, D. T. Ross, H. Johnsen, L. A. Akslen, O. Fluge, A. Pergamenschikov, C. Williams, S. X. Zhu, P. E. Lønning, A. L. Børresen-Dale, P. O. Brown, D. Botstein, Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).
6. Z. Hu, C. Fan, D. S. Oh, J. S. Marron, X. He, B. F. Qaqish, C. Livasy, L. A. Carey, E. Reynolds, L. Dressler, A. Nobel, J. Parker, M. G. Ewend, L. R. Sawyer, J. Wu, Y. Liu, R. Nanda, M. Tretiakova, A. Orrico, D. Dreher, J. P. Palazzo, L. Perreard, E. Nelson, M. Mone, H. Hansen, M. Mullins, J. F. Quackenbush, M. J. Ellis, O. I. Olopade, P. S. Bernard, C. M. Perou, The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 7, 96 (2006).
7. L. Perreard, C. Fan, J. F. Quackenbush, M. Mullins, N. P. Gauthier, E. Nelson, M. Mone, H. Hansen, S. S. Buys, K. Rasmussen, A. R. Orrico, D. Dreher, R. Walters, J. Parker, Z. Hu, X. He, J. P. Palazzo, O. I. Olopade, A. Szabo, C. M. Perou, P. S. Bernard, Classification and risk stratification of invasive breast carcinomas using a real-time quantitative RT-PCR assay. Breast Cancer Res. 8, R23 (2006).
8. J. S. Parker, M. Mullins, M. C. Cheang, S. Leung, D. Voduc, T. Vickery, S. Davies, C. Fauron, X. He, Z. Hu, J. F. Quackenbush, I. J. Stijleman, J. Palazzo, J. S. Marron, A. B. Nobel, E. Mardis, T. O. Nielsen, M. J. Ellis, C. M. Perou, P. S. Bernard, Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
9. L. J. van 't Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, S. H. Friend, Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
10. B. Haibe-Kains, C. Desmedt, C. Sotiriou, G. Bontempi, A comparative study of survival models for breast cancer prognostication based on microarray data: Does a single gene beat them all? Bioinformatics 24, 2200–2208 (2008).
11. P. Wirapati, C. Sotiriou, S. Kunkel, P. Farmer, S. Pradervand, B. Haibe-Kains, C. Desmedt, M. Ignatiadis, T. Sengstag, F. Schütz, D. R. Goldstein, M. Piccart, M. Delorenzi, Meta-analysis of gene expression profiles in breast cancer: Toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Res. 10, R65 (2008).
12. A. Dupuy, R. M. Simon, Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J. Natl. Cancer Inst. 99, 147–157 (2007).
13. S. Michiels, S. Koscielny, C. Hill, Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365, 488–492 (2005).
14. M. J. van de Vijver, Y. D. He, L. J. van 't Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, M. Parrish, D. Atsma, A. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis, E. T. Rutgers, S. H. Friend, R. Bernards, A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).
15. D. F. Ransohoff, Rules of evidence for cancer molecular-marker discovery and validation. Nat. Rev. Cancer 4, 309–314 (2004).
16. M. Buyse, S. Loi, L. van 't Veer, G. Viale, M. Delorenzi, A. M. Glas, M. S. d'Assignies, J. Bergh, R. Lidereau, P. Ellis, A. Harris, J. Bogaerts, P. Therasse, A. Floore, M. Amakrane, F. Piette, E. Rutgers, C. Sotiriou, F. Cardoso, M. J. Piccart; TRANSBIG Consortium, Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J. Natl. Cancer Inst. 98, 1183–1192 (2006).
17. J. Bogaerts, F. Cardoso, M. Buyse, S. Braga, S. Loi, J. A. Harrison, J. Bines, S. Mook, N. Decker, P. Ravdin, P. Therasse, E. Rutgers, L. J. van 't Veer, M. Piccart; TRANSBIG consortium, Gene signature evaluation as a prognostic tool: Challenges in the design of the MINDACT trial. Nat. Clin. Pract. Oncol. 3, 540–551 (2006).
18. J. A. Sparano, TAILORx: Trial assigning individualized options for treatment (Rx). Clin. Breast Cancer 7, 347–350 (2006).
19. C. Fan, D. S. Oh, L. Wessels, B. Weigelt, D. S. Nuyten, A. B. Nobel, L. J. van 't Veer, C. M. Perou, Concordance among gene-expression-based predictors for breast cancer. N. Engl. J. Med. 355, 560–569 (2006).
20. Supplemental Table 1, http://content.nejm.org/cgi/data/355/6/560/DC1/1.
21. S. Koscielny, Critical review of microarray-based prognostic tests and trials in breast cancer. Curr. Opin. Obstet. Gynecol. 20, 47–50 (2008).
22. G. Palmer, J. Vaughn, D. J. Schneider, B. Haack, The distribution of recurrence scores in Europe and Middle East (EME) compared with the US. Eur. J. Cancer 7, 148 (2009).
23. H. Ledford, The death of microarrays? Nature 455, 847 (2008).
24. I thank E. Benhamou, J.-M. Guinebretière, C. Hill, and V. Ribrag for their critical comments and suggestions, and L. Saint-Ange for editing.
10.1126/scitranslmed.3000313
Citation: S. Koscielny, Why most gene expression signatures of tumors have not been useful in the clinic. Sci. Transl. Med. 2, 14ps2 (2010).