
Bulletin of Experimental Biology and Medicine, Vol. 157, No. 4, August, 2014 GENERAL PATHOLOGY AND PATHOPHYSIOLOGY

On Statistical Relationship between ADRA2A Expression and the Risk of Breast Cancer Relapse

M. Yu. Shkurnikov, V. V. Galatenko*, A. E. Lebedev**, V. E. Podol’skii**, E. A. Tonevitskii**, and D. V. Mal’tseva**

Translated from Byulleten’ Eksperimental’noi Biologii i Meditsiny, Vol. 157, No. 4, pp. 450-453, April, 2014. Original article submitted October 3, 2013.

A search for novel parameters predicting the risk of relapse in breast cancer (BC) was conducted. A significant correlation between the risk of relapse and α-2A adrenergic receptor (ADRA2A) expression was revealed using public microarray datasets. This relationship was confirmed by validation on an independent microarray dataset. It was found that, in assessing the risk of BC relapse, the accuracy of prediction based solely on the expression of the ADRA2A gene is close to that of the OncotypeDX and MammaPrint test systems. Moreover, adding only one or two supplemental prognostic markers (for instance, expression of the SQLE gene or of the SQLE and DSCC1 genes) to ADRA2A yields prediction accuracy not inferior to that of these test systems.

Key Words: gene expression; breast cancer; risk of relapse; α-2A adrenergic receptor

Breast cancer (BC) is a heterogeneous disease. The molecular classification of BC proposed by Perou et al. [10] is based on genetic analysis and divides BC into several biological tumor subtypes. The most common subtype, luminal A, is found in 30-45% of BC cases. These tumors are estrogen-dependent and, as a result, are highly sensitive to hormone therapy (tamoxifen, aromatase inhibitors) [9]. Moreover, luminal A tumors are characterized by the absence of Her-2/neu overexpression and show low proliferative activity (Ki-67 expression is less than 14%). This group is characterized by low relapse rates and a high overall survival rate in comparison with other groups [15].
However, despite the rather favorable biological characteristics of the luminal A subtype, patients of this group can develop lymphogenous and hematogenous dissemination, and different outcomes are possible. Administration of adjuvant polychemotherapy (PCT) to patients with luminal type A BC stages I and II is one of the most controversial issues in the treatment of the luminal form of BC. Nowadays, adjuvant PCT is recommended to most women with BC stages I and II, although this therapy does not prolong the 10-year BC-free interval in all patients [8]. The accuracy of predictive algorithms based on the “classical factors” can be significantly refined using molecular diagnostic test systems based on the evaluation of mRNA expression in the tumor sample (OncotypeDX, MammaPrint, etc.). At the same time, the predictive accuracy of these methods is still in doubt. For example, the diagnostic sensitivity and specificity of the OncotypeDX test system are 70-80% and 60-65%, respectively [4]. The aim of our study was to search for novel parameters predicting the risk of metastasis development in BC.

Research Institute of General Pathology and Pathophysiology, Russian Academy of Medical Sciences; *M. V. Lomonosov Moscow State University; **Scientific and Technical Center BioKlinikum, Moscow, Russia. Address for correspondence: [email protected]. V. V. Galatenko

0007-4888/14/15740454 © 2014 Springer Science+Business Media New York

MATERIALS AND METHODS

To search for informative indicators predicting the risk of metastasis development, the publicly available microarray datasets GSE17705, GSE6532, and GSE12093 were used. They contain the results of genome-wide transcriptome analysis of biopsies performed on Human Genome U133A arrays (Affymetrix) according to the manufacturer’s protocol. The selected microarray data were preprocessed jointly, and expression levels for the samples in these datasets were then evaluated using an implementation of RMA [6], as in [13].

Statistical analysis of the logarithms of expression levels was performed using the Bioconductor limma package [11]. A modification of Student’s t test (moderated t-statistics [12]) was used. The Benjamini–Hochberg algorithm was applied for multiple testing adjustment. DAVID bioinformatics resources were used for functional analysis of the sets of differentially expressed genes [5].

Dataset GSE17705 was used as the training sample. It included two groups: group 1, patients without relapse for 7 years after surgery; group 2, patients with relapse within 5 years after mastectomy. The remaining data, forming an intermediate (“gray”) area, were excluded from the analysis. The number of specimens was 159 and 42 in groups 1 and 2, respectively.

The test sample was composed of the GSE6532 and GSE12093 sets. To facilitate interpretation of the indicators of prognostic reliability obtained with the selected genetic markers, the “gray area” (patients with relapse occurring more than 5 years after surgery and patients without documented relapse but with follow-up of less than 7 years) was also excluded from the test sample. Patients with stage III disease and ER-negative or PGR-negative subjects were also excluded. Information on the occurrence of relapse and the interval between surgery and relapse for the GSE6532 set was taken from the dmfs (distant metastasis free survival) fields, t.dmfs and e.dmfs. The total number of specimens in the test sample was 31 in the relapse group and 106 in the relapse-free group.
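The Benjamini–Hochberg adjustment named in the statistical analysis above can be illustrated with a short sketch. The study itself relied on the limma package; the function below is only a minimal stand-alone illustration of the standard step-up procedure, and the function name and the input p values are hypothetical, not taken from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p values for four hypothetical genes (not data from the study).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
print(adjusted_p)
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values grow. With thousands of probes on an array, this is why an adjusted p value can be far larger than the raw one.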

To validate the prognostic value of the revealed genes, for each tested set of 1 to 3 genes a classifier of the form $\sum_j \alpha_j \cdot \mathrm{l\_expr}_j + \gamma$ was adjusted on the training sample (the source of the genes) and then applied to the data from the test sample. Here the summation runs over all genes in the tested set, $\mathrm{l\_expr}_j$ is the expression of the j-th gene on a logarithmic scale, and the classification result is determined by the sign of the expression. To adjust the classifier (i.e., to select the parameter values $\alpha_j$ and $\gamma$), LIBSVM [3] was used with a soft-margin support vector machine (C-SVM classification; linear kernel; penalty coefficients of 6 and 1 for the different groups, respectively).
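How such a linear classifier is applied to one specimen can be sketched as follows. This is an illustration only: the coefficient values, the sample values, and the convention that a positive sign means high risk are all assumptions made for the example, not the fitted parameters or the sign convention of the study.

```python
from typing import Sequence


def decision_value(log_expr: Sequence[float],
                   alpha: Sequence[float],
                   gamma: float) -> float:
    """Linear decision value: sum_j alpha_j * l_expr_j + gamma."""
    if len(log_expr) != len(alpha):
        raise ValueError("one coefficient per gene is required")
    return sum(a * x for a, x in zip(alpha, log_expr)) + gamma


def classify(log_expr: Sequence[float],
             alpha: Sequence[float],
             gamma: float) -> str:
    """Assign a risk group by the sign of the decision value (convention assumed)."""
    return "high risk" if decision_value(log_expr, alpha, gamma) > 0 else "low risk"


# Hypothetical coefficients for a two-gene set; NOT the values fitted in the study.
ALPHA = [-1.0, 0.5]
GAMMA = 2.0

# Made-up log-scale expression values for two specimens.
sample_a = [6.0, 7.0]
sample_b = [4.0, 9.0]

print(classify(sample_a, ALPHA, GAMMA))
print(classify(sample_b, ALPHA, GAMMA))
```

Fitting the coefficients is the job of the support vector machine. Unequal penalty coefficients for the two groups make errors in one group costlier during fitting, a common way to compensate for unequal group sizes.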

RESULTS

Despite the similarity of the compared groups (specimens from patients with the same diagnosis, divided into groups by the presence or absence of relapse within 5-7 years after sampling), statistical analysis of the training sample revealed genes whose expression levels were significantly related to relapse. The most significant relationship was found for the ADRA2A gene (α-2A adrenergic receptor): adjusted p=1.3×10⁻⁵. The next most significant genes, MSRA, DSCC1, and SQLE (adjusted p=2.4×10⁻⁴ for each), had adjusted p values more than one order of magnitude higher. The adjusted p value did not exceed 0.001 for 13 more genes (E2F8, CDKN3, CCNE2, CTTN, S100P, CX3CR1, WDR67, CEP55, NCAPG, DLGAP5, PRC1, LRP12, and RNF139). The total number of genes with an adjusted p value not exceeding 0.01 was 179. Functional analysis of this set of genes identified a considerable number of overrepresented functional groups, including mitosis (19 genes, p=1.9×10⁻⁸), nuclear division (19 genes, p=1.9×10⁻⁸), cell cycle (23 genes, p=2.5×10⁻⁸), and acetylation (59 genes, p=7.0×10⁻⁹). Clustering of the identified groups showed that the most enriched cluster (Enrichment score [5] 8.81) was associated with cell division and cell cycle (constituent categories were cell cycle, mitosis, nuclear division, chromosome segregation, etc.).

Fig. 1. Expression of the ADRA2A gene in the specimens of the training sample. a) Mean values and standard deviations of the logarithms of expression levels; b) normalized logarithms of expression levels (normalization consists of subtracting the mean and dividing by the standard deviation, where the mean and standard deviation are calculated over all specimens of the training sample).

Fig. 2. Results of applying the classifiers adjusted on the training sample to the test sample: classifier using the expression of only the ADRA2A gene (a, b) and classifier using the expression of the ADRA2A, SQLE, and DSCC1 genes (c, d). a, c) Values of the classifier; b, d) Kaplan–Meier curves for patients in the high-risk group (lower curve) and low-risk group (upper curve).

For a considerable part of the identified most differentially expressed genes, results describing the relationship between the expression level of the gene and cancer pathologies, including BC, are known. At the same time, such results are currently rare for the ADRA2A gene [1,14], which the present analysis identified as the gene most significantly associated with the risk of relapse (p=1.3×10⁻⁵; Fig. 1). To validate the prognostic value of ADRA2A, a classifier for a set consisting only of this gene was adjusted on the training sample to divide the patients into groups with high and low risk of relapse, and was then applied

TABLE 1. Results of Applying the Classifiers Adjusted on the Training Sample to the Test Sample

Set of genes           Sensitivity, %   Specificity, %   Relapse-free in low-risk group, %   Relapse in high-risk group, %
ADRA2A                 74.2             53.8             87.7                                31.9
ADRA2A, MSRA           80.6             54.7             90.6                                34.2
ADRA2A, DSCC1          80.6             52.8             90.3                                33.3
ADRA2A, SQLE           83.9             56.6             92.3                                36.1
ADRA2A, SQLE, DSCC1    83.9             58.5             92.5                                37.1
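As a consistency check, the sensitivity and specificity in Table 1 can be combined with the test-sample group sizes given in Materials and Methods (31 relapse, 106 relapse-free) to recover the implied patient counts. The sketch below assumes the percentages were rounded to one decimal place; the function names are illustrative. Under that assumption, the last two columns of Table 1 are reproduced, to within rounding, by the share of relapse-free patients in the low-risk group and the share of relapses in the high-risk group.

```python
N_RELAPSE = 31        # relapse group size in the test sample
N_RELAPSE_FREE = 106  # relapse-free group size in the test sample


def implied_counts(sensitivity_pct, specificity_pct):
    """Recover (tp, fn, fp, tn), assuming percentages rounded to one decimal."""
    tp = round(sensitivity_pct / 100 * N_RELAPSE)        # relapses called high risk
    tn = round(specificity_pct / 100 * N_RELAPSE_FREE)   # relapse-free called low risk
    return tp, N_RELAPSE - tp, N_RELAPSE_FREE - tn, tn


def group_shares(sensitivity_pct, specificity_pct):
    """Return (% relapse-free in low-risk group, % relapse in high-risk group)."""
    tp, fn, fp, tn = implied_counts(sensitivity_pct, specificity_pct)
    relapse_free_in_low_risk = 100 * tn / (tn + fn)
    relapse_in_high_risk = 100 * tp / (tp + fp)
    return relapse_free_in_low_risk, relapse_in_high_risk


# Sensitivity and specificity as listed in Table 1.
TABLE_1 = {
    "ADRA2A": (74.2, 53.8),
    "ADRA2A, MSRA": (80.6, 54.7),
    "ADRA2A, DSCC1": (80.6, 52.8),
    "ADRA2A, SQLE": (83.9, 56.6),
    "ADRA2A, SQLE, DSCC1": (83.9, 58.5),
}

for genes, (sens, spec) in TABLE_1.items():
    low, high = group_shares(sens, spec)
    print(f"{genes}: relapse-free in low-risk {low:.1f}%, relapse in high-risk {high:.1f}%")
```

For the single-gene ADRA2A classifier, for example, the implied counts are 23 of 31 relapses assigned to the high-risk group and 57 of 106 relapse-free patients assigned to the low-risk group.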


Fig. 3. Results of applying the classifiers adjusted on the training sample to the test sample. a, b) Classifier using the expression of the ADRA2A and MSRA genes; c, d) classifier using the expression of the ADRA2A and DSCC1 genes; e, f) classifier using the expression of the ADRA2A and SQLE genes. a, c, e) Values of the classifiers; b, d, f) Kaplan–Meier curves for patients assigned by the classifier to the high-risk group (lower curve) and low-risk group (upper curve).

to the data from the test sample (see Materials and Methods). A similar validation procedure was applied to the sets comprising, along with ADRA2A, one of the MSRA, DSCC1, and SQLE genes, whose expression was also significantly associated with the risk of relapse according to the statistical analysis of the training sample. Furthermore, this procedure was applied to the set comprising three genes: ADRA2A, SQLE, and DSCC1. The results of applying the resulting classifiers are presented in Table 1 and in Figs. 2 and 3. The validation showed that, in assessing the risk of BC relapse, the accuracy of prediction based solely on the expression of the ADRA2A gene is close to that of the OncotypeDX and MammaPrint test systems [2,4]. Moreover, adding only one or two supplemental prognostic markers (for example, expression levels of the SQLE gene or of the SQLE and DSCC1 genes) to ADRA2A


yields prediction accuracy not inferior to that of these test systems.

The main result of the study is the identification of a significant correlation between the risk of relapse in BC and ADRA2A expression levels, and the validation of this result, both based on the analysis of publicly available microarray data. In the future, we plan to conduct additional validation by real-time PCR using a specially selected set of appropriate reference genes to improve reliability [7]. Furthermore, in assessing the risk of BC relapse, prognostic reliability not inferior to that of the OncotypeDX and MammaPrint test systems may be provided by classifiers using the expression levels of only three or even two genes (ADRA2A, SQLE). The study also shows that new meaningful results can be obtained from the analysis of publicly available data when the objective and/or the methods of analysis differ significantly from those of the primary study that provided the data.

The work was supported by the Federal Target Program “Research and Development in Priority Fields of Scientific and Technological Complex of Russia for 2007-2013” (“Development of Methods and Algorithms of Individual Genotyping on a Set of Intellectual Activity Results without Performing the Complete Assembly of the Genome”, State Contract No. 14.514.11.4025 from August 10, 2012, and “Development of Multiparameter Diagnostic Test Systems for the Detection of Individual Molecular Genetic Features of Breast Cancer”, State Contract No. 16.522.11.2004 from March 26, 2012), by the Government of the Russian Federation (Contract No. 11.G34.31.0054 from November 01, 2011), and by the Russian Foundation for Basic Research (grant No. 11-01-00354-a).

REFERENCES

1. A. Bruzzone, C. P. Pinero, P. Rojas, et al., Curr. Cancer Drug. Targets, 11, No. 6, 763-774 (2011).
2. M. Buyse, S. Loi, L. van’t Veer, et al., J. Natl. Cancer. Inst., 98, No. 17, 1183-1192 (2006).
3. C. C. Chang and C. J. Lin, ACM Trans. Intell. Syst. and Techn., 2, No. 3, doi: 10.1145/1961189.1961199 (2011).
4. Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group. Recommendations from the EGAPP Working Group: can tumor gene expression profiling improve outcomes in patients with breast cancer? Genet. Med., 11, No. 1, 66-73 (2009).
5. W. Da Huang, B. T. Sherman, and R. A. Lempicki, Nat. Protoc., 4, No. 1, 44-57 (2009).
6. R. A. Irizarry, D. Hobbs, F. Collin, et al., Biostatistics, 4, No. 2, 249-264 (2003).
7. D. V. Maltseva, N. A. Khaustova, N. N. Fedotov, et al., J. Clin. Bioinforma., 3, doi: 10.1186/2043-9113-3-13 (2013).
8. O. Metzger-Filho, Z. Sun, G. Viale, et al., J. Clin. Oncol., 31, No. 25, 3083-3090 (2013).
9. J. S. Parker, M. Mullins, M. C. Cheang, et al., J. Clin. Oncol., 27, No. 8, 1160-1167 (2009).
10. C. M. Perou, T. Sorlie, M. B. Eisen, et al., Nature, 406, No. 6797, 747-752 (2000).
11. G. K. Smyth, Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Eds. R. Gentleman, et al., New York (2005), pp. 397-420.
12. G. K. Smyth, Stat. Appl. Genet. Mol. Biol., 3, No. 1, doi: 10.2202/1544-6115.1027 (2004).
13. A. G. Tonevitsky, D. V. Maltseva, A. Abbasi, et al., BMC Physiology, 13, doi: 10.1186/1472-6793-13-9 (2013).
14. S. M. Vazquez, A. G. Mladovan, C. Perez, et al., Cancer Chemother. Pharmacol., 58, No. 1, 50-61 (2006).
15. B. Weigelt, A. Mackay, R. A’hern, et al., Lancet Oncol., 11, No. 4, 339-349 (2010).