
Application of artificial neural networks to link genetic and environmental factors to DNA methylation in colorectal cancer

Aims: We applied artificial neural networks (ANNs) to understand the connections among polymorphisms of genes involved in folate metabolism, clinico-pathological features and promoter methylation levels of MLH1, APC, CDKN2A INK4A, MGMT and RASSF1A in 83 sporadic colorectal cancer (CRC) tissues, and to link dietary and lifestyle factors with gene promoter methylation.
Materials & Methods: Promoter methylation was assessed by means of methylation-sensitive high-resolution melting and genotyping by PCR-RFLP technique. Data were analyzed with the Auto Contractive Map, a special kind of ANN able to define the strength of the association of each variable with all the others and to visually show the map of the main connections.
Results: We observed a strong connection between the low methylation levels of the five CRC genes and the MTR 2756AA genotype. Several other connections were revealed, including those between dietary and lifestyle factors and the methylation levels of CRC genes.
Conclusion: ANNs revealed the complexity of the interconnections among factors linked to DNA methylation in CRC.
Keywords: APC • artificial neural networks • CDKN2A • colorectal cancer • DNA methylation • folate • MGMT • MLH1 • polymorphisms • RASSF1A

Colorectal cancer (CRC) evolves through a stepwise accumulation of mutations and epigenetic modifications that transform normal colonic cells into cancer cells [1,2]. Among epigenetic mechanisms, DNA methylation has gained particular interest in cancer studies because it has been linked to the silencing of tumor suppressor genes and DNA repair genes [3]. Several genes are frequently hypermethylated in sporadic CRC, including MLH1, APC, CDKN2A, MGMT and RASSF1A [4–14]. Common polymorphisms of folate metabolic genes have been widely investigated as CRC genetic risk factors, mainly because folate metabolism (one-carbon metabolism) is required for DNA synthesis and methylation (Figure 1), but literature data in this field are conflicting and often insufficient to clarify their contribution to DNA methylation and CRC risk [3]. This is likely due to the complexity of this metabolic pathway (Figure 1), and to the fact that traditional

10.2217/EPI.14.77 © 2015 Future Medicine Ltd

statistical algorithms are often unsuitable for dissecting the relationships among a high number of variables, owing to the nonlinearity and complexity of their interactions [3]. In addition to genetic factors, dietary habits and lifestyles, such as drinking and smoking, are among the environmental factors suspected to impair DNA methylation [15–17]. In the present pilot study we applied artificial neural networks (ANNs) to identify genetic and dietary/lifestyle factors linked to MLH1, APC, CDKN2A INK4A, MGMT and RASSF1A promoter methylation in sporadic CRC. ANNs aim to understand natural processes and recreate those processes using automated models, and have been used successfully in gastroenterology and cancer studies to understand nonlinear relationships among variables [18–20]. In particular, we applied the Auto Contractive Map (Auto-CM) algorithm, a particular kind of ANN able to define the strength of the associations of

Epigenomics (2015) 7(2), 175–186

Fabio Coppedè*,1,2,3, Enzo Grossi4,5, Angela Lopomo1,6, Roberto Spisni7, Massimo Buscema5,8 & Lucia Migliore1,2,3
1 Department of Translational Research & New Technologies in Medicine & Surgery, Division of Medical Genetics, University of Pisa, Medical School, Via Roma 55, 56126 Pisa, Italy
2 Istituto Toscano Tumori (ITT), Florence, Italy
3 Interdepartmental Research Center Nutrafood ‘Nutraceuticals & Food for Health’, Pisa, Italy
4 Bracco Foundation, Milan, Italy
5 Semeion Research Center, Rome, Italy
6 Doctoral School in Genetics, Oncology & Clinical Medicine, University of Siena, Siena, Italy
7 Department of Surgery, Medical, Molecular & Critical Area Pathology, University of Pisa, Pisa, Italy
8 Department of Mathematical & Statistical Sciences, University of Colorado at Denver, CO, USA
*Author for correspondence: Tel.: +39 050 221 8544; fabio.coppede@med.unipi.it


ISSN 1750-1911


[Figure 1: schematic of the folate metabolic pathway. Diagram labels: cell membrane, blood folates, RFC, 5-MTHF, THF, DHF, 5,10-methyleneTHF, 10-formylTHF, MTHFR, MTR, MTRR, B12, TYMS, dUMP, dTMP, Met, Hcy, SAM, SAH, DNMT, CH3, unmethylated DNA, methylated DNA, purine synthesis, pyrimidine synthesis.]

Figure 1. Overview of the folate metabolic pathway. The figure shows the pivotal role of folate metabolism in intracellular processes. Folates provide the one-carbon units required for the synthesis of DNA and RNA precursors necessary for DNA replication and repair, the synthesis of amino acids and the synthesis of SAM, required for methylation reactions. Cofactor: B12.
5-MTHF: 5-methyltetrahydrofolate; 5,10-MTHF: 5,10-methylenetetrahydrofolate; B12: Vitamin B12; DHF: Dihydrofolate; dTMP: Deoxythymidine monophosphate; dUMP: Deoxyuridine monophosphate; Hcy: Homocysteine; Met: Methionine; SAH: S-adenosylhomocysteine; SAM: S-adenosylmethionine; THF: Tetrahydrofolate.

each variable with all the others and to visually show the map of the main connections of the variables and the basic semantics of their ensemble [21,22]. Auto-CM has previously been applied successfully to several medical datasets. Examples include its application to detect the main connections between folate metabolism and chromosome damage [23], Alzheimer’s disease [24] or brain atrophy [25], and to reveal genetic risk factors of small effect for sporadic amyotrophic lateral sclerosis [26], the connections between immunological markers in multiple sclerosis [27], and the factors linked to bone mineral density in patients with Type 1 diabetes [28]. To the best of our knowledge, ANNs have not previously been applied to investigate the relationship between genetic or environmental factors and DNA methylation in CRC. Therefore, we applied ANNs to a previously described CRC dataset [9], in order to investigate


whether this mathematical approach can increase our knowledge of the connections among common polymorphisms of major enzymes involved in the folate metabolic pathway (MTHFR; MTR; MTRR; TYMS; RFC1; DNMTs), age, gender, tumor stage and location, and promoter methylation levels of MLH1, APC, CDKN2A INK4A, MGMT and RASSF1A in sporadic CRC tissues. Moreover, in a subset of the samples with available data, we applied ANNs to link dietary and lifestyle factors with gene promoter methylation.

Materials & methods

Study population

The analysis was based on 83 sporadic CRC cases for whom all the following information was available from a previously described dataset [9] : CRC stage according to the TNM system; age; gender; CRC location (right colon, left colon, or sigma/rectum); genotype


for the MTHFR 677C>T polymorphism (rs1801133); genotype for the MTHFR 1298A>C polymorphism (rs1801131); genotype for the MTRR 66A>G polymorphism (rs1801394); genotype for the MTR 2756A>G polymorphism (rs1805087); genotype for the SLC19A1 (RFC1) 80G>A polymorphism (rs1051266); genotype for the TYMS 28-bp repeats polymorphism (rs34743033); genotype for the TYMS 1494 6-bp ins/del polymorphism (rs34489327); genotype for the DNMT3B -149C>T polymorphism (rs2424913); genotype for the DNMT3B -579G>T polymorphism (rs1569686); promoter methylation levels of the APC gene; promoter methylation levels of the MLH1 gene; promoter methylation levels of the CDKN2A INK4A gene; promoter methylation levels of the MGMT gene; and promoter methylation levels of the RASSF1A gene. As detailed elsewhere [9], CRC diagnosis was performed by medical doctors at the Department of Surgery, Medical, Molecular and Critical Area Pathology, University of Pisa, which provided the tissue specimens. Staging was assessed after pathological examination of the specimens based on the TNM classification. Family history of CRC was ascertained, and none of the subjects included in the present study had a family history of the disease [9]. The distribution of the studied variables among CRC subjects is shown in Table 1. All the samples were coded and the data were processed blind by the operators. Written informed consent for inclusion in the database was collected from each patient. The study was performed in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Pisa University Hospital.

Genetic & epigenetic data collection

The genotypes for all nine studied polymorphisms and the promoter methylation status of APC, MLH1, CDKN2A INK4A, MGMT and RASSF1A had previously been obtained by means of validated PCR/RFLP and Methylation Sensitive-High Resolution Melting (MS-HRM) techniques, respectively, as detailed elsewhere [9]. In particular, for the DNA methylation analysis we applied MS-HRM protocols that have been checked for inter- and intra-assay variability and validated by means of pyrosequencing [9,29]. For the promoter methylation data we set a cut-off value of 10% promoter methylation for considering each gene to be methylated. Samples showing 0% or less than 10% promoter methylation were considered to be hypomethylated, and samples showing >10% promoter methylation were considered to be methylated. The cut-off value was chosen based on our recent findings showing that only promoter methylation levels >10% were linked to MLH1 gene silencing resulting in absence of MLH1 immunohistochemical nuclear


staining in CRC tissues, and that methylation levels below 10% were often observed also in the healthy colonic mucosa for most of the five studied genes [9,29]. Similarly, seasonal fluctuations in the range below 10% of promoter methylation were reported in the blood of healthy donors for RASSF1A and MGMT genes [30].

Collection of data concerning dietary habits & lifestyles

In a subgroup of 37 patients (22 males and 15 females, mean age 65 ± 13.6 years) the following information on dietary/lifestyle factors was available: weekly consumption of green vegetables, legumes, potatoes, carrots, red meat (including processed meat) and white meat; weekly drinking of red or white wine; coffee consumption; smoking habits. We used self-reported questionnaires to assess the average food intake over the period preceding CRC diagnosis. Specifically, participants were asked once to fill in a questionnaire on the weekly frequency of consumption of certain foods, expressed in commonly used units or portion sizes. Values ranged from 0 (never or less than once per month) to 14 (2 portions every day) in the case of foods, and from 0 (never or less than once per month) to 28 (4 cups of coffee or 4 glasses of wine every day) in the case of drinks. As for alcoholic beverages other than wine, patients reported that they did not consume them on a weekly basis. Furthermore, individuals taking vitamins, drugs, or supplements known to interfere with circulating levels of folate were not included in the database [9].

Semantic connectivity map

In order to graphically show the most important links among variables we used an artificial adaptive system called the Auto Contractive Map (Auto-CM) algorithm [21,22], a special kind of ANN that develops weights proportional to the strength of the association of each variable with every other. The weights are then transformed into physical distances, so that pairs of variables with higher connection weights lie nearer to each other, and vice versa. After the training phase, the weights matrix of the Auto-CM represents the warped landscape of the dataset. Subsequently, a simple filter, the Minimum Spanning Tree (MST), was applied to the weights matrix of the Auto-CM system to obtain a map of the main connections between the variables of the dataset and the basic semantics of their similarities, called the connectivity map, as detailed elsewhere [21,22]. We also generated the Maximally Regular Graph (MRG), which is the graph of highest complexity among all the graphs generated during calculation of the MST. The MRG includes a number of new cyclic microstructures of fundamental links between the variables that were


Table 1. Demographic characteristics of the study population and distribution of genotypes.

Total subjects: 83
Age (mean ± SD): 70.14 ± 13.62
Gender: M: 47; F: 36
Stage (TNM): Adenoma: 7; I: 14; II: 28; III: 23; IV: 11
Location: CR: 42; CL: 16; SR: 25
Methylated genes†: APC: 29 (34.9%); MGMT: 30 (36.1%); CDKN2A: 18 (21.7%); hMLH1: 7 (8.4%); RASSF1A: 13 (15.6%)
Polymorphisms:
  RFC 80G>A: GG (32.5%), GA (48.2%), AA (19.3%)
  MTHFR 677C>T: CC (37.3%), CT (44.6%), TT (18.1%)
  MTHFR 1298A>C: AA (51.8%), AC (38.6%), CC (9.6%)
  MTRR 66A>G: AA (43.4%), AG (40.9%), GG (15.7%)
  MTR 2756A>G: AA (69.9%), AG (30.1%), GG (0.0%)
  TYMS 1494 6bp ins/del: +/+ (32.5%), +/- (51.8%), -/- (15.7%)
  TYMS 28bp repeats‡: 3R3R (34.9%), 3R2R (44.6%), 2R2R (16.9%)
  DNMT3B -149C>T: CC (38.6%), CT (49.4%), TT (12.0%)
  DNMT3B -579G>T: GG (44.6%), GT (45.8%), TT (9.6%)

† Methylation level >10%.
‡ For three patients the 28bp repeats were >3R.
CL: Left colon; CR: Right colon; SD: Standard deviation; SR: Sigma and rectal region; TNM: Staging system based on the extent of the tumor (T), the extent of spread to the lymph nodes (N), and the presence of metastasis (M).

previously eliminated from the MST, representing the intrinsic complexity of the dataset. The theory behind the MRG was previously detailed by us [26]. Data were stratified into different classes for gender (males and females), adenomas or TNM stage (adenoma, stage I, stage II, stage III and stage IV), CRC location (right colon, left colon, sigma-rectum), age (<60, 60–70, >70 years), smoking habits (smokers, ex-smokers, never-smokers), meat cooking procedures (undercooked, medium-cooked and well-cooked) and % of DNA methylation of each gene (<10%, >10%). Genotyping data were coded using a binary number system previously detailed by us [23]. Numerical variables, such as weekly portions of meat and vegetables and weekly glasses/cups of wine and coffee, were each scaled from 0 to 1 in the database. For example, the variable red meat consumption has natural values ranging from 0 (no weekly consumption) to 14 (two portions of red meat every day). After the transformation, 0 (the lowest value) becomes 0 and 14 (the highest value) becomes 1, and all the other values are scaled within this new range; a value of 7 (a daily portion) becomes 0.5, and so on. The projection of the variable in the map shows the position of red meat consumption according to its high values. This preprocessing scaling is necessary to allow a proportional comparison among all the variables and is commonly applied to data in ANN analyses [24,25,27]. Buscema designed and developed the Auto-CM software (AutoCM - Auto Contractive Map, Semeion software #46, version 6.0; Modular Auto-Associative ANN, Semeion software #51, version 18.1).

Results

The semantic connectivity map obtained with Auto-CM from the whole dataset of patients is shown in Figure 2. Variables showing the maximal number of connections with other variables are called ‘hubs.’ A numerical value is assigned to each edge of the graph. This value derives from the original weight developed by Auto-CM during the training phase, scaled from 0 (not connected) to 1 (highly connected), and is proportional to the strength of the connection between two variables. Results clearly indicate a strong connection among low methylation levels (<10%) of all five studied genes (strength of the associations, ‘s.a.’, ranging from 0.99 to 1.0), which is further highlighted by the Maximally Regular Graph (Figure 3). In addition, the semantic connectivity map shows that the wild-type (wt) genotype for MTR 2756A>G is highly connected to RASSF1A low methylation levels (s.a. = 1.0) (Figure 2), and to the low methylation levels of all five genes under investigation (Figure 3).
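The class stratification and 0-to-1 rescaling described in the Methods can be illustrated with a minimal Python sketch. It is not the Semeion preprocessing code: the function names `min_max_scale` and `methylation_class` are hypothetical, the input values are illustrative rather than patient data, and the handling of a methylation value exactly equal to 10% is an assumption, since the text does not specify it.

```python
from typing import Sequence


def min_max_scale(values: Sequence[float], lowest: float, highest: float) -> list[float]:
    """Rescale values from the range [lowest, highest] to the range [0, 1]."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    return [(v - lowest) / (highest - lowest) for v in values]


def methylation_class(percent: float, cutoff: float = 10.0) -> str:
    """Assign a promoter methylation percentage to one of the two classes."""
    # Assumption: a value exactly at the cutoff falls in the low class;
    # the article only defines "<10%" and ">10%".
    return "meth>10%" if percent > cutoff else "meth<10%"


# Weekly portions of red meat on the questionnaire scale 0..14 (illustrative values).
red_meat = [0, 7, 14]
scaled = min_max_scale(red_meat, lowest=0, highest=14)
print(scaled)  # [0.0, 0.5, 1.0]

# Illustrative methylation percentages, not measured data.
classes = [methylation_class(p) for p in (0.0, 5.0, 25.0)]
print(classes)  # ['meth<10%', 'meth<10%', 'meth>10%']
```

Scaling against the fixed questionnaire bounds (0 and 14), rather than the observed minimum and maximum, reproduces the worked example in the text, where a daily portion (7) maps to 0.5.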


[Figure 2: graph image. Nodes are the genotype classes (wt, het, mut) of the nine polymorphisms, age class, gender, tumor location, TNM stage and the methylation class (<10% or >10%) of each gene; the edge labels range from 0.59 to 1.00.]

Figure 2. Minimum spanning tree showing a map of the main connections between the variables of the dataset. Semantic connectivity map obtained with the Auto-CM system. The numbers on the arcs of the graph refer to the strength of the association between two adjacent nodes. The range of this value is from 0 (not linked) to 1 (highly linked).
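As a rough illustration of the MST filtering step (not of Auto-CM training itself), the sketch below takes an already computed symmetric weight matrix, converts each weight w into a distance 1 - w, and extracts the minimum spanning tree with Prim's algorithm. The conversion d = 1 - w, the choice of Prim's algorithm, the function name `minimum_spanning_tree`, the variable names and the toy weights are all assumptions made for the example; they are not taken from the Semeion software or from the study data.

```python
def minimum_spanning_tree(names, weights):
    """Prim's algorithm on distances d = 1 - w.

    names   : list of variable labels
    weights : symmetric matrix (list of lists) of connection strengths in [0, 1]
    Returns the tree edges as (label_a, label_b, weight) in the order they are added.
    """
    n = len(names)
    in_tree = {0}  # start from the first variable
    edges = []
    while len(in_tree) < n:
        best = None  # (distance, index inside tree, index outside tree)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                distance = 1.0 - weights[i][j]  # assumed weight-to-distance mapping
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        in_tree.add(j)
        edges.append((names[i], names[j], weights[i][j]))
    return edges


# Toy weight matrix for four hypothetical variables (not results from the study).
names = ["var_A", "var_B", "var_C", "var_D"]
weights = [
    [0.00, 0.99, 0.60, 0.20],
    [0.99, 0.00, 0.90, 0.30],
    [0.60, 0.90, 0.00, 0.85],
    [0.20, 0.30, 0.85, 0.00],
]

tree = minimum_spanning_tree(names, weights)
for a, b, w in tree:
    print(f"{a} -- {b}  (strength {w:.2f})")
```

With n variables the tree keeps only n - 1 edges, which is why weaker but still informative links disappear from the MST and reappear only in the Maximally Regular Graph.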

Concerning MLH1 promoter methylation, high methylation levels (>10%) are linked to high CDKN2A INK4A promoter methylation and right colon location (Figure 2), while low MLH1 methylation levels, apart from being linked to the low methylation profiles of all four other genes and the MTR 2756A>G wt genotype (Figure 3), are highly linked to male gender (s.a. = 0.99) as well as to certain RFC1 or TYMS 28bp repeat genotypes (Figure 2). Similarly, CDKN2A INK4A hypermethylation is linked to MLH1 hypermethylation and right colon location (Figure 2), and low CDKN2A INK4A methylation levels are linked to the low methylation levels of the other four genes and the MTR 2756A>G wt genotype (Figure 3). In addition, some of the genotypes generated by the TYMS 28bp repeat polymorphism are also linked to CDKN2A INK4A methylation levels (Figure 2). Low methylation levels of APC are linked to right colon location (s.a. = 0.99) (Figure 2), low methylation levels of the other four genes, and the MTR 2756A>G wt genotype (Figure 3), while APC hypermethylation is clustered in a set of variables, including certain DNMT3B and MTRR genotypes, and connected to advanced age (>70 years) (Figure 2). Within this cluster of variables, increasing age (>70 years) and adenoma stage are also connected with high MGMT promoter methylation (Figure 2). By contrast, low MGMT promoter methylation levels are found in younger individuals (60–70 years) and linked to the central cluster characterized by low methylation profiles of the studied genes and the MTR 2756A>G wt genotype (Figure 3). Promoter methylation levels of RASSF1A are highly connected to the MTR 2756A>G genotype, with the wild-type genotype linked to low methylation and the heterozygous one to hypermethylation (Figure 2). In order to confirm the observed connection between the MTR 2756A>G genotype and low methylation levels of the five genes under investigation, we compared the mean number of hypermethylated genes in carriers of the MTR 2756AA genotype with that observed in carriers of the MTR 2756AG one (no subject in our cohort was a carrier of the rare MTR 2756GG genotype), and the difference was statistically significant (Figure 4). When the analysis was restricted to the subset of individuals with known dietary and lifestyle habits we still obtained a central cluster characterized by highly connected low methylation profiles of the five studied genes (Figures 5 & 6). In particular, the MST shown in Figure 5 revealed a cluster characterized by hypermethylation of the CDKN2A, MLH1 and MGMT genes, in which MGMT hypermethylation was connected to nonsmoking status and the consumption of undercooked meat. We then observed a central cluster characterized by low methylation levels of the five genes and, within this cluster, RASSF1A hypomethylation was connected to nonsmoking status, MLH1 hypomethylation with both coffee and white meat consumption, and APC hypomethylation with


[Figure 3: graph image. Nodes are the genotype classes (wt, het, mut) of the nine polymorphisms, age class, gender, tumor location, TNM stage and the methylation class (<10% or >10%) of each gene.]

Figure 3. Maximally regular graph representing the connections between DNA methylation and genetic polymorphisms. Semantic connectivity map obtained with the Auto-CM system.

white meat consumption. Outside this central cluster, RASSF1A hypermethylation was connected to well-cooked meat consumption, and APC hypermethylation was clustered in a set of variables characterized by high consumption of green vegetables, legumes, carrots and red wine (Figure 5). However, when we generated the MRG most of the environmental factors, and particularly nonsmoking status, white meat and potato consumption, and to a lesser extent also red meat,

[Figure 4: plot. Y axis: number of methylated genes (axis ticks from 0.8 to 2.0); X axis: MTR 2756A>G genotype (AA, AG); P = 0.03.]

Figure 4. Mean level of methylated genes according to MTR 2756A>G genotype. The figure shows the comparison of the mean number of methylated genes between carriers of the MTR 2756AA genotype and carriers of the MTR 2756AG one. Student’s t-test revealed a significantly reduced mean number of methylated genes in carriers of the MTR 2756AA genotype (p = 0.03).


green vegetable, legume and carrot consumption, showed a complex network of connections with the low promoter methylation profiles of the five studied genes (Figure 6).

Discussion

To the best of our knowledge, the present study represents the first attempt to apply artificial intelligence systems to understand the complexity of the interconnections among several factors contributing to the promoter methylation levels of five key CRC genes. One of the most interesting observations of the study is the strong connection among the low methylation profiles of the five studied genes and the MTR 2756A>G wt (AA) genotype (Figure 3). The MTR gene codes for methionine synthase, the protein required for the production of methionine that, once converted to S-adenosylmethionine (SAM), serves as the methyl donor compound for DNA methylation reactions [3]. Several previous studies have linked the MTR 2756A>G polymorphism with increased CRC risk, particularly among smokers and heavy drinkers [31]. In addition, it has recently been observed that healthy donors carrying the MTR 2756GG genotype have higher levels of global DNA methylation in blood cells than MTR 2756AA individuals [32]. We previously observed an association of the MTR 2756AA genotype with RASSF1A hypomethylation in CRC tissues and with APC hypomethylation in the healthy colonic mucosa of CRC patients [9]. The present analysis performed with ANNs revealed a more complex and tight connection among the MTR 2756AA genotype and hypomethylation


[Figure 5: graph image of the minimum spanning tree for the subset of patients with dietary and lifestyle data. Node labels include under-cooked meat, medium-cooked meat, white wine, smoker, non smoker, and the methylation classes (<10% or >10%) of hMLH1, MGMT, CDKN2A and APC.]

[…]T wild type and MTHFR 1298A>C wild type genotypes are linked to RASSF1A hypomethylation, while hypermethylation is linked to the MTR 2756AG genotype (Figure 2). A correlation between the MTR 2756A>G polymorphism and RASSF1A methylation levels was also observed by means of traditional statistical approaches [9]. Hypermethylation of RASSF1 is believed to be involved in invasion and metastasis [14]. Indeed, it was not linked to the hypermethylation of the other genes, such as APC, CDKN2A INK4A, MLH1 or MGMT (Figure 2), which often occurs in the earliest phases of colorectal carcinogenesis. Concerning lifestyle and dietary factors, we observed a complex and intricate network of interactions among several factors (Figures 5 & 6). In particular, although the analysis was restricted to only 37 individuals, ANNs still revealed a central and highly connected cluster characterized by the low methylation levels of the five studied genes, and several environmental factors showed complex interactions with this central cluster (Figure 6). Among them, nonsmoking status was connected with low methylation profiles of all five genes, and this is consistent with several literature reports suggesting that smoking might be linked


to hypermethylation of several genes in CRC, while smoking cessation for over four decades seems to be protective against CIMP-high CRCs [41,42]. However, the MST shows that nonsmoking status is also linked to MGMT hypermethylation (