Structural Variation, Functional Differentiation, and

0 downloads 0 Views 3MB Size Report
Sep 13, 2018 - Ginseng is one of the most important medicinal herbs and has ... correlation of the cytochrome P450 gene superfamily revealed in ginseng.
Published online September 13, 2018 The

Plant Genome

o r i g i n a l r es e a rc h

Structural Variation, Functional Differentiation, and Activity Correlation of the Cytochrome P450 Gene Superfamily Revealed in Ginseng Yi Wang, Xiangyu Li, Yanping Lin, Yanfang Wang, Kangyu Wang, Chunyu Sun, Tiancheng Lu, and Meiping Zhang* Y. Wang, Y. Lin, K. Wang, C. Sun, T. Lu, MP. Zhang, College of Life Science, Jilin Agricultural Univ., 2888 Xincheng Street, Changchun, Jilin 130118, China; X. Li, College of Life Science, Jilin Agricultural Univ., 2888 Xincheng Street, Changchun, Jilin 130118, China (current address, State Forestry Administration Key Open Lab. on the Science and Technology of Bamboo and Rattan, International Center for Bamboo and Rattan, Beijing 100102, China); Y. Wang, K. Wang, C. Sun, MP. Zhang, Research Center of Ginseng Genetic Resources Development and Utilization, 2888 Xincheng Street, Changchun, Jilin 130118, China; YF. Wang, College of Chinese Medicinal Materials, Jilin Agricultural Univ., Changchun, Jilin 130118, China.

ABSTRACT Ginseng (Panax ginseng C.A. Mey.) is one of the most important medicinal herbs for human health and medicine, for which ginsenosides are the major bioactive components. The cytochrome P450 genes, CYP, form a large gene superfamily; however, only three CYP genes have been identified from ginseng and shown to be involved in ginsenoside biosynthesis, indicating the importance of the CYP gene superfamily in the process. Here we report genome-wide identification and systems analysis of the CYP genes in ginseng, defined as PgCYP genes. We identified 414 PgCYP genes, including the three published PgCYP genes. These PgCYP genes formed a superfamily consisting of 41 gene families, with a substantial diversity in phylogeny and dramatic variation in spatiotemporal expression. Gene ontology (GO) analysis categorized the PgCYP gene superfamily into 12 functional subcategories distributing among all three primary functional categories, suggesting its functional differentiation. Nevertheless, the majority of its gene members expressed correlatively and tended to form a coexpression network and some of them were commonly regulated in expression across tissues and developmental stages. These results have led to genome-wide identification of PgCYP genes useful for genome-wide identification of the PgCYP genes involved in ginsenoside biosynthesis in ginseng and provided the first insight into how a gene superfamily functionally differentiates and acts correlatively in plants. Abbreviations: BP, biological process; CC, cellular component; GO, gene ontology; MF, molecular function; NBS, nucleotide binding site; NCBI, National Center for Biotechnology Information; ORF, open-reading frame; PERF, proton-transfer groove; RLK, receptor-like kinase.

core ideas

• • • • •

Most genes exist in a form of family or superfamily in higher organisms Cytochrome P450 (CYP) gene superfamily is involved in ginsenoside biosynthesis Genome-wide identification of members of the CYP gene superfamily in ginseng Assessed its origin, evolution, and spatiotemporal expression Examined its functional differentiation and action correlation and regulation

G

inseng is one of the most important medicinal herbs and has been cultivated for millennia in Asia, especially in China, Korea, and Japan. Its closely related species, American ginseng (P. quinquefolius L.), is also cultivated as a medicinal herb in North America. Studies have documented that ginseng has numerous bioactive compounds that are remarkable for human health and medicine; nevertheless, ginsenosides, a diverse group of triterpene saponins

Citation: Wang, Y., X. Li, Y. Lin, Y. Wang, K. Wang, C. Sun, T. Lu, and M. Zhang. 2018. Structural variation, functional differentiation, and activity correlation of the cytochrome P450 gene superfamily revealed in ginseng. Plant Genome 11:170106. doi: 10.3835/plantgenome2017.11.0106 Received 29 Nov. 2017. Accepted 17 July 2018. *Corresponding author ([email protected]). This is an open access article distributed under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Copyright © Crop Science Society of America 5585 Guilford Rd., Madison, WI 53711 USA

the pl ant genome



vol . 11, no . 3



november 2018

1

of

11

present in all tissues of ginseng, are recognized as the most valuable (Choi, 2008; Zhang et al., 2014). The identified bioactivities of ginsenosides for human health and medicine include, but are not limited to, immunomodulation (Kenarova et al., 1990), anti-inflammation (Matsuda et al., 1990), antitumor (Sato et al., 1994; Mochizuki et al., 1995), antihyperlipidemia (Cho et al., 2006), and antiamnestic and antiaging activities (Cheng et al., 2005). The genes encoding cytochrome P450, CYP, are known to exist in a large gene superfamily. They are present in all organisms including plants, animals, and microbes (Nelson and Werck-Reichhart, 2011). As of 2013, ~7512 CYP450 genes have been identified in plants (Renault et al., 2014), but only three CYP genes have been so far identified from ginseng (Han et al., 2011, 2012, 2013). The CYP genes are involved in a variety of metabolisms such as endogenous oxidation reactions, detoxification of xenobiotics, and biosynthesis of secondary metabolites (Matsui et al., 1996; Xiang et al., 2005; Guengerich, 2007; Pinot and Beisson, 2011; McLean et al., 2012). Importantly, the CYP genes have been also shown to be involved in biosynthesis of ginsenosides in ginseng (Han et al., 2011, 2012, 2013), suggesting the importance of the CYP gene superfamily in the process. Therefore, genomewide identification and characterization of the CYP genes present in ginseng is of significance for advanced ginsenoside biosynthesis research and breeding. Furthermore, the CYP gene superfamily is also an excellent system for studies of functional differentiation and expression correlation and regulation among the members of a gene family or superfamily. Genome sequence analysis has revealed that most protein-encoding genes exist in multiple copies or a form of family in the genomes of higher organisms (Schnable et al., 2009). For instance, maize (Zea mays L.) was identified to have 32,540 protein-encoding genes, of which there are 11,892 gene families (Schnable et al., 2009). Therefore, systems studies of gene families or superfamilies in a genome are of significance for comprehensive understanding of many biological processes including the ginsenoside biosynthesis. Several gene families of biological and economical importance, including the nucleotide binding site (NBS)-encoding gene family, receptor-like kinase (RLK) gene family, and CYP gene superfamily, have been extensively studied in several plant species such as Arabidopsis thaliana (L.) Heynh. (Meyers et al., 2003; Shiu et al., 2004; Paquette et al., 2000; Nelson et al., 2008), rice (Oryza sativa L.) (Shiu et al., 2004; Nelson et al., 2008; Zhang et al., 2010), soybean [Glycine max (L.) Merr.] (Guttikonda et al., 2010; Zhang et al., 2010), and cotton (Gossypium spp.) (Zhang et al., 2010). Nevertheless, these studies were largely focused on identification, phylogeny, evolution, and spatiotemporal expressions of gene families. Little is known about functional differentiation and expression correlation and regulation among the members of a gene family or superfamily. Systems analysis of the CYP gene superfamily will provide an insight into these aspects of gene families in plants. Therefore, in the present study we conducted a genome-wide investigation of the CYP gene superfamily 2

of

11

present in ginseng with an emphasis on the identification and systems analysis of functional differentiation and expression correlation among the members of the gene superfamily. We identified 414 CPY genes, including the three published CYP genes (Han et al., 2011, 2012, 2013), from a ginseng transcriptome database recently developed from 14 tissues of a 4-yr-old ginseng plant (Wang et al., 2015). These CPY genes were defined PgCYP001 through PgCYP414. Moreover, we examined, as done in the studies of the CYP gene superfamily in other plant species (Paquette et al., 2000; Nelson et al., 2008; Guttikonda et al., 2010), the phylogeny, evolution, and spatiotemporal expressions of the gene superfamily in ginseng and, importantly, also conducted a systems analysis of the functional differentiation and network among the members in the PgCYP gene superfamily. These results provide gene resources useful and essential for genome-wide identification of PgCYP genes involved in the biosynthesis of ginsenosides and also provide a first insight into the functional differentiation and expression correlations among the members of a gene family in plants.

MATERIALS AND METHODS Ginseng Transcriptome Databases Three transcriptome databases of ginseng, including transcript sequences and expressions, were used for this study. The first one was developed from 14 tissues of a 4-yr-old ginseng plant (Wang et al., 2015; accession numbers SRX1445566–SRX1445579), the second from the roots of 5-, 12-, 18-, and 25-yr-old ginseng plants (Wang et al., 2015; accession numbers SRX1445580– SRX1445583), and the third from the 4-yr-old roots of 42 ginseng farmers’ cultivars (Yin et al., 2017). The genotypes, origin, and ages of the plant materials used for the generation of these three databases are presented in Supplemental Table S1. Identification of PgCYP Genes Because the CYP gene superfamily consists of a large number of gene members that are diverse, as low as 16% at the amino acid sequence level (Werck-Reichhart and Feyereisen, 2000), we used a four-step approach to identify the CYP genes expressed in ginseng, PgCYP genes, from the ginseng transcriptome database developed from 14 tissues of a 4-yr-old ginseng plant (Wang et al., 2015). First, we extracted >6000 plant CYP genes from National Center for Biotechnology Information (NCBI) nucleotide database and used them as queries to blast the ginseng 14-tissue transcriptome database with a cutoff value of 1.0 ´ 10−6 (Johnson et al., 2008). Second, the ginseng transcripts identified from the first step were used as queries for further search against the 14-tissue transcriptome database at a cutoff value of 1.0 ´ 10−6 (Johnson et al., 2008). These BLAST searches helped maximize identification of the CYP genes from the ginseng database. Third, these identified transcripts were aligned and annotated with the NCBI nonredundant protein database at a cut-off the pl ant genome



vol . 11, no . 3



november 2018

value of 1.0 ´ 10−6. The transcripts that were annotated into CYPs were selected for further analysis (Supplemental Table S2). Finally, the transcripts derived from a single gene and identified with the same gene ID defined by the Trinity software (see Wang et al., 2015) were subjected to alignment analysis. Only those having an identity of 100% in a length of at least 200 nucleotides—the size of CYP transcript with minimal length—were selected as the alternatively spliced transcripts of the gene.

Identification of Conservative Motifs Contained in PgCYP Genes Previous study showed that the proteins of CYP genes have three extremely conservative amino acid motifs, Heme-binding (FXXGXXXCXG), K-helix (EXXR), and the proton-transfer groove (PERF) (Hasemann et al., 1995). To further confirm and characterize the PgCYP genes, we identified their conserved motifs with representative genes randomly selected from the PgCYP genes using the MEME program (Bailey and Elkan, 1995). A cutoff P-value of £1.0 ´ 10−50 was used for the analysis. Classification and Phylogenetic Analysis of PgCYP Genes We classified and phylogenetically analyzed the PgCYP genes based on the amino acid sequences of their putative proteins. Therefore, we translated the transcripts of the PgCYP genes into protein sequences and identified their putative open-reading frames (ORFs) using ORFfinder (http://www.ncbi.nlm.nih.gov/projects/gorf/). Moreover, the predicted ORFs of the PgCYP transcripts were verified by BLASTp against the NCBI nonredundant protein sequence database. The verified PgCYP transcripts that contained complete ORFs were classified based on the similarity of their putative protein sequences as described by Nelson et al. (1996). Briefly, if the protein sequence similarity of two PgCYP transcripts was ³40%, they were assigned to the same CYP family; otherwise, they were assigned to different families. If the protein similarity of two PgCYP transcripts was between 40 and 55%, they were assigned to two different subfamilies in the same family. If the protein similarity of two PgCYP transcripts was >55%, they were assigned to a same subfamily. The nomenclature of the PgCYP genes followed Nelson and Werck-Reichhart (2011). The phylogeny of the PgCYP gene superfamily was established also using the verified transcripts of the PgCYP genes that contained complete ORFs based on their putative protein sequences. One hundred and five CYP genes previously identified from A. thaliana (At), Siberian-ginseng [Eleutherococcus senticosus (Rupr. & Maxim.) Maxim.] (Es), sanchi ginseng [P. notoginseng (Burkill) F. H. Chen ex C. Y. Wu & K. M. Feng] (Pn), American ginseng (Pq), tomato (Solanum lycopersicum L.) (Sl), and potato (S. tuberosum L.) (St) (Supplemental Table S3) were used as outgroup to estimate the evolution of the PgCYP gene superfamily. The protein sequence alignment of these genes was accomplished using wang et al .

ClustalX2 (Larkin et al., 2007). The clustering algorithm of phylogeny followed the neighbor-joining method with 1000 bootstrap replications (Larkin et al., 2007). The neighbor-joining method has been widely used to construct the phylogenies of gene families (e.g., Meyers et al., 2003; Shiu et al., 2004). The resultant phylogenetic tree was displayed using EvolView (Zhang et al., 2012).

Expressions of PgCYP Transcripts Because previous studies showed that the multiexonic genes in plants are frequently subjected to RNA alternative splicing, forming different transcripts or multiple transcripts per gene and with each transcript having a different biological function (Syed et al., 2012), we quantified the expression of individual transcripts of each PgCYP gene in different tissues, at different developmental stages, and in different genotypes. We extracted the expression of each PgCYP transcript from the above three transcriptome databases: the transcriptome database developed from 14 tissues of a 4-yr-old ginseng plant (Wang et al., 2015); the transcriptome database developed from the roots of 5-, 12-, 18-, and 25-yr-old ginseng plants (Wang et al., 2015); and the transcriptome database developed from the 4-yr-old roots of 42 ginseng farmers’ cultivars (Yin et al., 2017). The detailed quantification of the expression of each transcript by the RSEM program bundled with the Trinity software was previously described (Wang et al., 2015). Estimation of Functional Differentiation of the PgCYP Gene Superfamily Gene ontology analysis is a sequence-similarity-based bioinformatic tool that has been widely used to in silico functionally categorize genes. Therefore, we estimated the functional differentiation of the PgCYP gene superfamily using the 440 transcripts of the 414 PgCYP genes by GO analysis with the Blast2GO software (Ashburner et al., 2000) followed by enrichment analysis of functional subcategories. Coexpression Network and Expression Heatmap Analysis Coexpression network analysis has been widely used to determine the functional and expression relationships among genes (Theocharidis et al., 2009), while heatmap is widely used to examine the expression regulation of a group of genes. Therefore, we conducted the coexpression network analysis of the PgCYP genes in different tissues and different genotypes using the BioLayout Express3D (Theocharidis et al., 2009) and their expression heatmaps as previously described (Wang et al., 2015).

RESULTS Identification, Conservative Motifs, Classification, and Phylogeny of the PgCYP Genes We identified 440 PgCYP transcripts, including all three published ginsenoside biosynthesis genes: CYP716A53v2_1, CYP716A52v2 _3, and CYP716A47_1 3

of

11

Fig. 1. The conserved motifs of PgCYP genes and their distribution in the genes. The PgCYP genes were randomly selected as examples from the PgCYP genes identified in this study. (a) The conserved motifs of PgCYP genes. (b) Distribution of the conserved motifs along PgCYP genes.

(Han et al., 2011, 2012, 2013), from the ginseng transcriptome database derived from 14 tissues of a 4-yrold plant. These 440 transcripts were derived from 414 PgCYP genes (accession nos. GGRM01000001 - GGRM01000440) as defined by the Trinity software (Grabherr et al., 2011; Haas et al., 2013). These PgCYP genes were named PgCYP001 through PgCYP414, with PgCYP 137 corresponding to CYP716A53v2_1, PgCYP311 to CYP716A52v2 _3, and PgCYP339 to CYP716A47_1 previously cloned by Han et al. (2011, 2012, 2013). The multiple transcripts derived from a single PgCYP gene, such as PgCYP261, were suffixed with an Arabic number, such as PgCYP261-1 and PgCYP261-2 (Supplemental Table S4). These PgCYP transcripts were from 201- to 3766-bp long with an average length of 969 bp. Then, we performed conservative motif analysis of the PgCYP transcripts to further confirm and characterize the PgCYP genes (Fig. 1). The result showed that all the analyzed PgCYP transcripts contained the three conservative motifs that characterize the plant CYP genes, Hemebinding, K-Helix, and PERF (Fig. 1a). This result further confirmed the above identification of the PgCYP genes. In addition, we found that the PgCYP transcript proteins 4

of

11

were also characterized by four additional conservative motifs defined as Motif 1, 2, 3, and 4 (Fig. 1a). A majority of the analyzed PgCYP transcripts contained five or six of these seven conservative motifs, and they are distributed along the PgCYP transcripts in the same order (Fig. 1b). Next, we characterized the PgCYP genes by systematic classification and phylogenetic analysis. The PgCYP genes were classified based on the classification system of CYP genes established in plants (Nelson and WerckReichhart, 2011). Since the classification and phylogenetic analysis were both based on putative protein sequences of the genes, we first in silico translated the transcripts into putative proteins. The ORF analysis identified 138 PgCYP transcripts that have complete ORFs and these 138 PgCYP transcripts were derived from 114 PgCYP genes (Supplemental Table S5). Therefore, we classified the PgCYP genes using these 138 PgCYP transcripts as representatives. The PgCYP genes formed a gene superfamily that was classified into nine of the 11 clans and 41 families identified for plant CYP genes (Nelson and Werck-Reichhart, 2011). Clans CYP710 and CYP746 were not found in the PgCYP gene superfamily. Of these nine clans, four (CYP71, CYP72, CYP85, and CYP86) each the pl ant genome



vol . 11, no . 3



november 2018

Fig. 2. Phylogeny of the PgCYP gene superfamily. (a) The phylogenetic position of Panax ginseng in the dicot plant phylogenetic tree inferred according to Lee et al. (2011). The CYP genes of the non–P. ginseng species were used as the outgroup for the evolutionary analysis of the PgCYP gene superfamily. (b) Phylogeny of the PgCYP gene superfamily. A total of 138 PgCYP transcripts selected from PgCYP genes, shown in blue color in the circle, that have complete ORFs were used as representatives to construct the phylogeny of the PgCYP gene superfamily (Supplemental Table S5). A total of 105 published CYP genes, shown in orange color in the circle, that were identified from Arabidopsis thaliana (At), Eleutherococcus senticosus (Es), P. notoginseng (Pn), P. quinquefolius (Pq), Solanum lycopersicum (Sl), and S. tuberosum (St) were used as outgroup (Supplemental Table S3). The clans to which the CYP genes belong are indicated by colors. The numbers for the branches of the tree are bootstrap confidence out of 1000 replications. Different transcripts derived from a single PgCYP gene were shown as v1, v2, and so on.

consist of multiple gene families and five (CYP51, CYP74, CYP97, CYP711, and CYP727) each consist of a single gene family (Supplemental Table S5). Phylogenetic analysis clustered the PgCYP gene superfamily represented by 138 PgCYP transcripts (Supplemental Table S5) and 105 CYP genes identified from A. thaliana (At), Siberian-ginseng (Es), sanchi ginseng (Pn), American ginseng (Pq), tomato (Sl), and potato (St) (Supplemental Table S3; Fig. 2a) into 10 clusters (Fig. 2b). A vast majority of branches of the tree had a bootstrap confidence of 500 (50%) or larger out of the 1000 replications performed. Of the 10 clusters, nine contained one or more PgCYP genes, while the other cluster consisted of only AtCYP, SlCYP, and StCYP genes from the outgroup species. Therefore, the PgCYP genes were clustered into nine clusters. This phylogeny of the PgCYP genes, as expected, was consistent with the classification of the PgCYP genes (Fig. 2b). Every PgCYP clan, except for the 727clan that contained only one CYP gene, PgCYP727A1, contained one or more AtCYP genes from A. thaliana. Because A. thaliana is the most distantly related to P. ginseng among the wang et al .

six outgroup species, this result suggested that the PgCYP gene superfamily and at least eight of its nine clans originated before split between genera Arabidopsis and Panax. When the nine clans of the PgCYP gene phylogenetic tree were compared, it appeared that the 711clan, 51clan, 71clan, 74clan, 727clan, and 710clan originated earlier than 97clan, 72clan, and 86clan.

Functional Differentiation of the PgCYP Gene Superfamily Moreover, we examined the functional differentiation of the PgCYP gene superfamily by in silico functionally categorizing the 440 transcripts, accounting for 414 gene members of the PgCYP gene superfamily by GO analysis. As a result, 434 of the 440 PgCYP transcripts were categorized into one or more GO terms (Supplemental Table S2). These PgCYP transcripts were categorized into all the three primary functional categories, biological process (BP), molecular function (MF), and cellular component (CC) (Fig. 3a) and 12 functional subcategories at Level 2 (Fig. 3b). These 12 subcategories included four CC subcategories: cell, membrane, 5

of

11

Fig. 3. In silico functional categorization and gene ontology (GO) term enrichment of the PgCYP gene transcripts. (a) Venn diagram of numbers of the PgCYP transcripts involved in the biological process (BP), molecular function (MF), and cellular component (CC) categories. (b) Subcategories in which the PgCYP transcripts are involved at Level 2 and their enrichments. The GO terms of the transcripts expressed in 14 tissues of the 4-yr-old plant used for identification of the PgCYP genes as the background control for the enrichment analysis. Double asterisk (**), significant at P £ 0.01; the remaining GO terms are not significant at P £ 0.05.

cell part and organelle; two MF subcategories: catalytic activity and binding; and six BP subcategories: singleorganism process, response to stimulus, cellular process, metabolic process, developmental process and biological regulation (Fig. 3b). The number of subcategories into which the PgCYP gene superfamily was categorized accounted for one-third of the 36 subcategories of the transcripts derived from 14 tissues of a 4-yr-old ginseng plant (Wang et al., 2015), suggesting that the functions of the members of the gene superfamily have substantially differentiated. Nevertheless, it was surprising that in spite of the diversity of the PgCYP gene superfamily, the number of its subcategories was fewer than the 36 subcategories of the transcripts of the P. ginseng NBS (PgNBS) gene family (Yin et al., 2017) and the 23 subcategories of the transcripts of the P. ginseng RLK (PgRLK) gene family (Lin et al., 2018). These results indicated that the PgCYP gene superfamily substantially differentiated in functionality since it was originated, but the magnitude is much smaller than those of the PgNBS and PgRLK gene families in P. ginseng. Enrichment analysis showed that of the 12 functional subcategories of the PgCYP gene superfamily, only three—catalytic activity, binding, and cell parts—are upenriched in number of the PgCYP transcripts, supporting the basic functions of the CYP gene superfamily in catalytic activity and binding. The number of PgCYP transcripts of these two functional subcategories accounted for >37% of the total GO terms into which the PgCYP gene superfamily was categorized. Of the nine remaining subcategories of the PgCYP gene superfamily, seven were down-enriched and two were not significantly different from those of the whole-genome background (Fig. 3b). 6

of

11

Spatiotemporal Expressions of the PgCYP Genes Furthermore, to determine how the members of the PgCYP gene superfamily and how the transcripts alternatively spliced from a PgCYP gene expressed spatially, temporally, and in different genetic backgrounds, we examined the expressions of the PgCYP genes in 14 tissues of a 4-yr-old plant, the roots of plants aged from 5 to 25 yr, and the 4-yrold roots of 42 famers’ cultivars to further characterize the gene superfamily (Supplemental Table S1, S6). Figure 4 shows the expressions of four families of the PgCYP gene superfamily in different tissues and at different developmental stages. In summary, the expression of a PgCYP gene varied dramatically across cultivars, tissues, and developmental stages (Fig. 4; Supplemental Fig. S1). On the other hand, different transcripts alternatively spliced from a PgCYP gene also expressed differentially in the same genotype, in the same tissue, or at the same developmental stage (Fig. 4; Supplemental Table S6), for instance, those of PgCYP348-1 and PgCYP348-2 in the 5-yr-old root (Fig. 4b). Supplemental Fig. S1 shows the expression consistency of the transcripts of the PgCYP gene superfamily across genetic backgrounds, tissues, and developmental stages. Among the 42 cultivars, only 17.3% of the 440 PgCYP transcripts expressed in the roots of all 42 cultivars, whereas ~5.5% were cultivar specific in expression in the 4-yr-old roots and from 0.2 to 3.0% expressed in the roots of 2 through 41 cultivars (Supplemental Fig. S1a). Among the 14 tissues of the 4-yr-old plant, only 18.6% expressed in all 14 tissues, whereas 26.8% of the 440 PgCYP transcripts expressed in only one tissue and the remaining transcripts expressed in 2 to 13 tissues (Supplemental Fig. S1b). For the roots of different aged plants, only 126 of the PgCYP transcripts expressed through all four the pl ant genome



vol . 11, no . 3



november 2018

Fig. 4. Spatiotemporal expression patterns of the PgCYP transcripts classified into four families of the PgCYP gene superfamily in different tissues of (a) a 4-yr-old plant and (b) in the roots of 5-, 12-, 18-, and 25-yr-old plants. The PgCYP gene families, CYP704, CYP707, CYP74, and CYP97. wang et al .

7

of

11

developmental stages spanning 25 yr and 5 through 16 were age specific (Supplemental Fig. S1c).

Functional Correlation and Expression Regulation among the Members of the PgCYP Gene Superfamily Finally, given that the members of the PgCYP gene superfamily have functionally substantially differentiated, we examined whether different PgCYP genes are somehow related in functionality. To answer this question, we constructed the coexpression networks of the 440 PgCYP transcripts in different tissues of a 4-yr-old ginseng plant (Fig. 5a; Supplemental Table S7) and the 4-yr-old roots of 42 cultivars (Supplemental Fig. S2a; Supplemental Table S8), respectively, at a cutoff value of P £ 1.0 ´ 10−2. We found that 416 (94.5%) of the 440 transcripts formed a coexpression network consisting of 416 nodes, 9201 edges (Fig. 5a), and 26 clusters (Fig. 5b) in different tissues of the 4-yr-old plant and that 274 (62.3%) of the 440 transcripts formed a coexpression network made of 274 nodes, 1506 edges (Supplemental Fig. S2a), and 21 clusters (Supplemental Fig. S2b) in the 4-yr-old roots of different cultivars. Statistical analysis showed that the PgCYP transcripts more significantly tended to form coexpression networks than those randomly selected unknown ginseng transcripts in both different tissues of the 4-yr-old plant (Fig. 5c-f) and 4-yr-old roots of different cultivars (Supplemental Fig. S2c-f). For the 4-yr-old roots of the 42 cultivars, although the networks of the randomly selected unknown ginseng transcripts had more nodes than the PgCYP transcripts at P £ 5.0 ´ 10−2 and P £ 1.0 ´ 10−2, they had fewer edges than the PgCYP transcripts at P-values from 5.0 ´ 10−2 to 1.0 ´ 10−8. These results suggested that the members of the PgCYP gene superfamily are still expression correlated and thus likely functionally correlated, even though they dramatically diverged at the amino acid sequence level since the PgCYP gene superfamily originated (Fig. 2; Supplemental Table S5). We also investigated whether the genes from the PgCYP gene superfamily were commonly regulated, with an attempt to identify expression signatures that are associated with a particular developmental stage or a tissue through expression heatmap analysis. It was apparent that common regulations existed in different tissues of a 4-yr-old plant (Fig. 6a) and at different developmental stages of roots (Fig. 6b). For instance, PgCYP093, PgCYP232, PgCYP102, and PgCYP242 were likely commonly regulated in different tissues, and PgCYP093, PgCYP102, and PgCYP352-2 were likely commonly regulated in the roots at different developmental stages. Nevertheless, the transcript expression common regulation did not seem relevant to their origins of the PgCYP genes and gene families. The expression patterns of these gene transcripts were well distinguished among the different tissues of a 4-yr-old plant (Fig. 6a) and among the roots of differently aged plants (Fig. 6b). The unique expression patterns of these transcripts, therefore, provide useful signatures for ginseng research and applications. 8

of

11

DISCUSSION

The CYP genes participate in numerous biological processes important in plants, including ginsenoside biosynthesis in P. ginseng (Han et al., 2011, 2012, 2013). This study has identified 414 PgCYP genes from P. ginseng, including the three published ginsenoside biosynthesis PgCYP genes (Han et al., 2011, 2012, 2013). Therefore, 411 of the 414 PgCYP genes are newly identified from the species. This result suggests that P. ginseng has a large number of PgCYP genes. Classification and phylogenetic analyses have classified the PgCYP genes into a gene superfamily consisting of 41 families. Moreover, as the PgCYP genes were consistently clustered into the same clades or branches of the gene superfamily tree with the CYP genes identified from other plant species including A. thaliana, the PgCYP gene superfamily originated before Panax split from Arabidopsis and highly diverged. Furthermore and importantly, this study reveals that the members of the PgCYP gene superfamily, though having a common ancestor, have dramatically differentiated in functionality as indicated by their categorization into 12 GO subcategories at Level 2. However, the magnitude of their functional differentiation is much smaller than those observed for the PgNBS gene family (Yin et al., 2017) and the PgRLK gene family (Lin et al., 2018) in the ginseng species. The transcripts of the PgCYP genes that are involved in catalytic activity, binding and then cell part have been significantly up-enriched, relative to those of the whole transcriptome, suggesting their roles in these processes. Companioned with the process of functional differentiation, the expressions of the members of the PgCYP gene superfamily have also been diversified across tissues, developmental stages, and genotypes, which may be necessary to meet the needs of their differentiated functions. Numerous transcripts of the PgCYP genes have been shown to express specifically or more actively in one of the tissues or in the roots at one of the developmental stages. Surprisingly, systems analysis shows that despite of their dramatic diversity in structure, functionality, and expression activity, the majority of the members of the PgCYP gene superfamily remain to express correlatively and form a coexpression network. Since the formation of the networks is significantly stronger than that of the randomly selected unknown ginseng gene transcripts in both different tissues and different genotypes, the coexpression of the transcripts is likely a reflection of the functional correlation of the genes. A significant number of them are shown to have consistent expression levels and patterns across tissues and developmental stages. These results suggest that a number of genes in the PgCYP gene superfamily are likely commonly regulated or have the same or related functions in different tissues or at different developmental stages. Since three of the previously cloned PgCYP genes— CYP716A53v2_1 (PgCYP 137), CYP716A52v2 _3 (PgCYP311), and CYP716A47_1 (PgCYP339) (Han et al., 2011, 2012, 2013)—were shown to be involved in the ginsenoside biosynthesis in ginseng, it is likely that some of the the pl ant genome



vol . 11, no . 3



november 2018

Fig. 5. Network of the PgCYP transcripts expressed in 14 tissues of a 4-yr-old ginseng plant. (a) The coexpression network constructed from 440 PgCYP transcripts at P £ 1.0 ´ 10 −2. It consists of 416 nodes and 9201 edges. (b) Twenty-six clusters in the network. (c) Tendency that PgCYP genes form a network using the randomly selected ginseng unknown genes as a control; variation in number of nodes. (d) Tendency that PgCYP genes form a network using the randomly selected ginseng unknown genes as a control; variation in number of edges. The networks shown in (c and d) were constructed from 440 transcripts with no replicate. (e) Statistics of variation in number of nodes in the PgCYP network. (f) Statistics of variation in number of edges in the PgCYP network. Capital letters indicate significance at P £ 0.01; Error bar, the standard deviation for 20 bootstrap replications.

PgCYP genes identified in this study are also involved in the process. Considering the importance of the ginsenosides in human health and medicine, the PgCYP genes identified in this study provide gene resource that is useful for genomewide identification of the genes involved in ginsenoside biosynthesis, thus enhanced ginseng research and breeding. wang et al .

CONCLUSION Ginseng is one of the most important and widely used medicinal herbs for human health and medicine for which its ginsenosides are known to be the major bioactive components. The PgCYP gene superfamily has been shown to be actively involved in the biosynthesis of 9

of

11

Fig. 6. Expression heatmaps of the PgCYP gene transcripts of the CYP704 (highlighted in red), CYP707 (blue), CYP74 (light blue), and CYP97 (black) families, the PgCYP gene superfamily. (a) Expression heatmap of the PgCYP gene transcripts of the CYP704, CYP707, CYP74, and CYP97 families in 14 tissues of the 4-yr-old plant. (b) Expression heatmap of the PgCYP gene transcripts of the CYP704, CYP707, CYP74, and CYP97 families in the roots of 5-, 12-, 18-, and 25-yr-old ginseng. Note that not all transcripts shown in Fig. 6a expressed in the differently aged roots.

ginsenosides. This study has genome-wide identified the genes of the PgCYP gene superfamily and characterized it not only in evolution, phylogeny, and expression—as traditionally done for gene families in different species— but also in functional differentiation and activity correlation among different members of the gene superfamily. The PgCYP genes and findings resulted from this study provide gene resources essential for genome-wide identification and research of the PgCYP genes involved in ginsenoside biosynthesis and a genome-wide insight into how the members of a gene family or superfamily functionally differentiate and correlate in plants in general and in ginseng in particular.

Supplemental Information Available Supplemental information is available with the online version of this manuscript. Conflict of Interest The authors declare that there is no conflict of interest. ACKNOWLEDGMENTS

This work was supported by an award from China 863 Project (2013AA102604-3), the Bureau of Science and Technology of Jilin Province (20170101010JC), and the Development and Reform Commission of Jilin Province (2016C064).

10

of

11

REFERENCES

Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, and J.T. Eppig. 2000. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25:25–29. doi:10.1038/75556 Bailey, T.L., and C. Elkan. 1995. The value of prior knowledge in discovering motifs with MEME. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3:21–29. Cheng, Y., L.H. Shen, and J.T. Zang. 2005. Anti‐amnestic and anti‐ aging effects of ginsenoside Rg1 and Rb1 and its mechanism of action. Acta Pharmacol. Sin. 26:143–149. doi:10.1111/j.17457254.2005.00034.x Cho, W.C., W.S. Chung, S.K. Lee, A.W. Leung, C.H. Cheng, and K.K. Yue. 2006. Ginsenoside Re of Panax ginseng possesses significant antioxidant and antihyperlipidemic efficacies in streptozotocininduced diabetic rats. Eur. J. Pharmacol. 550:173–179. doi:10.1016/j. ejphar.2006.08.056 Choi, K.T. 2008. Botanical characteristics, pharmacological effects and medicinal components of Korean Panax ginseng C A Meyer. Acta Pharmacol. Sin. 29:1109–1118. doi:10.1111/j.1745-7254.2008.00869.x Grabherr, M.G., B.J. Haas, M. Yassour, J.Z. Levin, D.A. Thompson, I. Amit, X. Adiconis, L. Fan, R. Raychowdhury, and Q. Zeng. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29:644–652. doi:10.1038/ nbt.1883 Guengerich, F.P. 2007. Cytochrome p450 and chemical toxicology. Chem. Res. Toxicol. 21:70–83. doi:10.1021/tx700079z Guttikonda, S.K., J. Trupti, N.C. Bisht, H. Chen, Y.Q. An, S. Pandey, D. Xu, and O. Yu. 2010. Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases. BMC Plant Biol. 10:243. doi:10.1186/1471-222910-243 the pl ant genome



vol . 11, no . 3



november 2018

Haas, B.J., A. Papanicolaou, M. Yassour, M. Grabherr, P.D. Blood, J. Bowden, M.B. Couger, D. Eccles, B. Li, M. Lieber, M.D. MacManes, M. Ott, J. Orvis, N. Pochet, F. Strozzi, N. Weeks, R. Westerman, T. William, C.N. Dewey, R. Henschel, R.D. LeDuc, N. Friedman, and A. Regev. 2013. De novo transcript sequence reconstruction from RNA-Seq using the Trinity for reference generation and analysis. Nat. Protoc. 8:1494–1512. doi:10.1038/nprot.2013.084 Han, J.Y., H.S. Hwang, S.W. Choi, H.J. Kim, and Y.E. Choi. 2012. Cytochrome P450 CYP716A53v2 catalyzes the formation of protopanaxatriol from protopanaxadiol during ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol. 53:1535–1545. Han, J.Y., M.J. Kim, Y.W. Ban, H.S. Hwang, and Y.E. Choi. 2013. The involvement of b-amyrin 28-oxidase (CYP716A52v2) in oleananetype ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol. 54:2034–2046. doi:10.1093/pcp/pct141 Han, J.Y., H.J. Kim, Y.S. Kwon, and Y.E. Choi. 2011. The Cyt P450 enzyme CYP716A47 catalyzes the formation of protopanaxadiol from dammarenediol-II during ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol. 52:2062–2073. doi:10.1093/pcp/pcr150 Hasemann, C.A., R.G. Kurumbail, S.S. Boddupalli, J.A. Peterson, and J. Deisenhofer. 1995. Structure and function of cytochromes P450: A comparative analysis of three crystal structures. Structure 3:41–62. doi:10.1016/S0969-2126(01)00134-4 Johnson, M., I. Zaretskaya, Y. Raytselis, Y. Merezhuk, S. McGinnis, and T.L. Madden. 2008. NCBI BLAST: A better web interface. Nucleic Acids Res. 36:5–9. doi:10.1093/nar/gkn201 Kenarova, B., H. Neychev, C. Hadjiivanova, and V.D. Petkov. 1990. Immunomodulating activity of ginsenoside Rg1 from Panax ginseng. Jpn. J. Pharmacol. 54:447–454. doi:10.1254/jjp.54.447 Larkin, M.A., G. Blackshields, N. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson, and D.G. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948. doi:10.1093/ bioinformatics/btm404 Lee, E.K., A. Cibrian-Jaramillo, S.O. Kolokotronis, M.S. Katari, A. Stamatakis, M. Ott, J.C. Chiu, D.P. Little, D.W. Stevenson, W.R. McCombie, R.A. Martienssen, G. Coruzzi, and R. DeSalle. 2011. A functional phylogenomic view of the seed plants. PLoS Genet. 7:e1002411. doi:10.1371/journal.pgen.1002411 Lin, Y., K. Wang, X. Li, C. Sun, R. Yin, F. Wang, Y. Wang, and M.P. Zhang. 2018. Evolution, functional differentiation, and co-expression of the RLK gene family revealed in Jilin ginseng, Panax ginseng C.A. Meyer. Mol. Genet. Genomics 293:845–849. doi:10.1007/ s00438-018-1425-6 Matsuda, H., K.I. Samukawa, and M. Kubo. 1990. Anti-inflammatory activity of ginsenoside Ro1. Planta Med. 56:19–23. doi:10.1055/s-2006-960875 Matsui, K., M. Shibutani, T. Hase, and T. Kajiwara. 1996. Bell pepper fruit fatty acid hydroperoxide lyase is a cytochrome P450 (CYP74B). FEBS Lett. 394:21–24. doi:10.1016/0014-5793(96)00924-6 McLean, K.J., M. Hans, and A.W. Munro. 2012. Cholesterol, an essential molecule: Diverse roles involving cytochrome P450 enzymes. Biochem. Soc. Trans. 40:587–593. doi:10.1042/BST20120077 Meyers, B.C., A. Kozik, A. Griego, H. Kuang, and R.W. Michelmore. 2003. Genome-wide analysis of NBS-LRR–encoding genes in Arabidopsis. Plant Cell 15:809–834. doi:10.1105/tpc.009308 Mochizuki, M., Y. Yoo, K. Matsuzawa, K. Sato, I. Saiki, S. Tonooka, K. Samukawa, and I. Azuma. 1995. Inhibitory effect of tumor metastasis in mice by saponins, ginsenoside-Rb2, 20 (R)-and 20 (S)ginsenoside-Rg3, of red ginseng. Biol. Pharm. Bull. 18:1197–1202. doi:10.1248/bpb.18.1197 Nelson, D.R., L. Koymans, T. Kamataki, J.J. Stegeman, R. Feyereisen, D.J. Waxman, M.R. Waterman, O. Gotoh, M.J. Coon, and R.W. Estabrook. 1996. P450 superfamily: Update on new sequences, gene

wang et al .

mapping, accession numbers and nomenclature. Pharmacogenetics 6:1–42. doi:10.1097/00008571-199602000-00002 Nelson, D.R., R. Ming, M. Alam, and A.M. Schuler. 2008. Comparison of cytochrome P450 genes from six plant genomes. Trop. Plant Biol. 1:216–235. Nelson, D., and D. Werck‐Reichhart. 2011. A P450‐centric view of plant evolution. Plant J. 66:194–211. doi:10.1111/j.1365-313X.2011.04529.x Paquette, S.M., S. Bak, and R. Feyereisen. 2000. Intron–exon organization and phylogeny in a large superfamily, the paralogous cytochrome P450 genes of Arabidopsis thaliana. DNA Cell Biol. 19:307–317. doi:10.1089/10445490050021221 Pinot, F., and F. Beisson. 2011. Cytochrome P450 metabolizing fatty acids in plants: Characterization and physiological roles. FEBS J. 278:195– 205. doi:10.1111/j.1742-4658.2010.07948.x Renault, H., J.E. Bassard, B. Hamberger, and D. Werck-Reichhart. 2014. Cytochrome P450-mediated metabolic engineering: Current progress and future challenges. Curr. Opin. Plant Biol. 19:27–34. doi:10.1016/j.pbi.2014.03.004 Sato, K., M. Mochizuki, I. Saiki, Y.C. Yoo, K.I. Samukawa, and I. Azuma. 1994. Inhibition of tumor angiogenesis and metastasis by a saponin of Panax ginseng, ginsenoside-Rb2. Biol. Pharm. Bull. 17:635–639. doi:10.1248/bpb.17.635 Schnable, P.S., D. Ware, R.S. Fulton, J.C. Stein, F. Wei, and S. Pasternak. 2009. The B73 maize genome: Complexity, diversity, and dynamics. Science 326:1112–1115. doi:10.1126/science.1178534 Shiu, S.H., W.M. Karlowski, R. Pan, Y.H. Tzeng, K.F. Mayer, and W.H. Li. 2004. Comparative analysis of the receptor-like kinase family in Arabidopsis and rice. Plant Cell 16:1220–1234. doi:10.1105/ tpc.020834 Syed, N.H., M. Kalyna, Y. Marquez, A. Barta, and J.W. Brown. 2012. Alternative splicing in plants: Coming of age. Trends Plant Sci. 17:616–623. doi:10.1016/j.tplants.2012.06.001 Theocharidis, A., S. van Dongen, A.J. Enright, and T.C. Freeman. 2009. Network visualization and analysis of gene expression data using BioLayout Express3D. Nat. Protoc. 4:1535–1550. doi:10.1038/ nprot.2009.177 Wang, K., S. Jiang, C. Sun, Y. Lin, R. Yin, Y. Wang, and M.P. Zhang. 2015. The spatial and temporal transcriptomic landscapes of Ginseng, Panax ginseng C.A. Meyer. Scientific Rep. 5:18283. Werck-Reichhart, D., and R. Feyereisen. 2000. Cytochromes P450: A success story. Genome Biol. 1:reviews3003.1–3003.9. doi:10.1186/gb2000-1-6-reviews3003 Xiang, W.S., X.J. Wang, T.R. Ren, and X.L. Ju. 2005. Expression of a wheat cytochrome P450 monooxygenase in yeast and its inhibition by glyphosate. Pest Manag. Sci. 61:402–406. doi:10.1002/ps.969 Yin, R., M. Zhao, K. Wang, Y. Lin, Y. Wang, C. Sun, Y. Wang, and M.P. Zhang. 2017. Functional differentiation and spatial-temporal coexpression networks of the NBS-encoding gene family in Jilin ginseng, Panax ginseng C.A. Meyer. PLoS One 12:e0181596. doi:10.1371/ journal.pone.0181596 Zhang, H., S. Gao, M.J. Lercher, S. Hu, and W.H. Chen. 2012. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees. Nucleic Acids Res. 40:569–572. doi:10.1093/nar/gks576 Zhang, M.P., Y.H. Wu, M.K. Lee, Y.H. Liu, Y. Rong, T.S. Santos, C. Wu, F. Xie, R.L. Nelson, and H.B. Zhang. 2010. Numbers of genes in the NBS and RLK families vary by more than four-fold within a plant species and are regulated by multiple factors. Nucleic Acids Res. 38:6513–6525. doi:10.1093/nar/gkq524 Zhang, Y.C., G. Li, C. Jiang, B. Yang, H.J. Yang, H.Y. Xu, and L.Q. Huang. 2014. Tissue-specific distribution of ginsenosides in different aged ginseng and antioxidant activity of ginseng leaf. Molecules 19:17381–17399. doi:10.3390/molecules191117381

11

of

11